rapidsai / cudf

cuDF - GPU DataFrame Library
https://docs.rapids.ai/api/cudf/stable/
Apache License 2.0

[BUG] Multi-column sorting on dask-cudf dataframe with nulls gives incorrect ordering #9255

Closed charlesbluca closed 2 years ago

charlesbluca commented 2 years ago

Describe the bug

Sorting a dask-cudf dataframe by multiple columns gives an incorrect ordering when the data contains nulls. In the example below, rows 3 to 6 come out ahead of rows 0 to 2:

import cudf
import dask_cudf

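# Column 'a' ends with five nulls; column 'b' counts down from 19 to 0.
gdf = cudf.DataFrame(
    {
        'a': list(range(15)) + [None] * 5,
        'b': list(reversed(range(20)))
    }
)
# Split the frame into five partitions so the sort runs across partitions.
gddf = dask_cudf.from_cudf(gdf, npartitions=5)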

gddf.sort_values(by=["a", "b"]).compute()
       a   b
3      3  16
4      4  15
5      5  14
6      6  13
0      0  19
1      1  18
2      2  17
7      7  12
8      8  11
9      9  10
10    10   9
11    11   8
12    12   7
13    13   6
14    14   5
19  <NA>   0
18  <NA>   1
17  <NA>   2
16  <NA>   3
15  <NA>   4

Expected behavior

The ordering should match what cudf returns for the same data:

gdf.sort_values(by=["a", "b"])
       a   b
0      0  19
1      1  18
2      2  17
3      3  16
4      4  15
5      5  14
6      6  13
7      7  12
8      8  11
9      9  10
10    10   9
11    11   8
12    12   7
13    13   6
14    14   5
19  <NA>   0
18  <NA>   1
17  <NA>   2
16  <NA>   3
15  <NA>   4
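
The difference between the two outputs above can be checked mechanically without a GPU. The following is a minimal sketch in plain Python, assuming ascending order with nulls placed last (as in the cudf output). The helpers `sort_key` and `is_sorted_nulls_last` are hypothetical names for illustration and are not part of cudf or dask-cudf; the row values are copied from the outputs in this report.

```python
def sort_key(row):
    """Key for ascending multi-column order with nulls (None) placed last."""
    # Each value becomes (is_null, value): False sorts before True, so nulls go last.
    return tuple((value is None, 0 if value is None else value) for value in row)


def is_sorted_nulls_last(rows):
    """Return True if rows are in ascending lexicographic order, nulls last."""
    keys = [sort_key(row) for row in rows]
    return all(left <= right for left, right in zip(keys, keys[1:]))


# Columns "a" and "b" in the order dask-cudf returned them.
a_dask = [3, 4, 5, 6, 0, 1, 2, 7, 8, 9, 10, 11, 12, 13, 14] + [None] * 5
b_dask = [16, 15, 14, 13, 19, 18, 17, 12, 11, 10, 9, 8, 7, 6, 5, 0, 1, 2, 3, 4]
dask_rows = list(zip(a_dask, b_dask))

# Re-sorting the same rows with the key reproduces the cudf ordering.
expected_rows = sorted(dask_rows, key=sort_key)

print(is_sorted_nulls_last(dask_rows))      # False
print(is_sorted_nulls_last(expected_rows))  # True
```

A check of this shape could be applied to the computed result of any partitioned sort to confirm whether the global ordering holds across partition boundaries.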

Environment details

     ***git***
     commit 079af458b55ea83d72293ddf5c2060c0b77d935f (HEAD -> branch-21.12, tag: v21.12.00a, upstream/branch-21.12, origin/branch-21.12)
     Author: Raymond Douglass 
     Date:   Thu Sep 16 16:47:59 2021 -0400

     DOC v21.12 Updates
     ***git submodules***

     ***OS Information***
     DISTRIB_ID=Ubuntu
     DISTRIB_RELEASE=18.04
     DISTRIB_CODENAME=bionic
     DISTRIB_DESCRIPTION="Ubuntu 18.04.5 LTS"
     NAME="Ubuntu"
     VERSION="18.04.5 LTS (Bionic Beaver)"
     ID=ubuntu
     ID_LIKE=debian
     PRETTY_NAME="Ubuntu 18.04.5 LTS"
     VERSION_ID="18.04"
     HOME_URL="https://www.ubuntu.com/"
     SUPPORT_URL="https://help.ubuntu.com/"
     BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
     PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
     VERSION_CODENAME=bionic
     UBUNTU_CODENAME=bionic
     Linux docker-desktop 5.10.16.3-microsoft-standard-WSL2 #1 SMP Fri Apr 2 22:23:49 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

     ***GPU Information***
./print_env.sh: line 25: nvidia-smi: command not found

     ***CPU***
     Architecture:        x86_64
     CPU op-mode(s):      32-bit, 64-bit
     Byte Order:          Little Endian
     CPU(s):              12
     On-line CPU(s) list: 0-11
     Thread(s) per core:  2
     Core(s) per socket:  6
     Socket(s):           1
     Vendor ID:           GenuineIntel
     CPU family:          6
     Model:               85
     Model name:          Intel(R) Xeon(R) Gold 6128 CPU @ 3.40GHz
     Stepping:            4
     CPU MHz:             3391.498
     BogoMIPS:            6782.99
     Virtualization:      VT-x
     Hypervisor vendor:   Microsoft
     Virtualization type: full
     L1d cache:           32K
     L1i cache:           32K
     L2 cache:            1024K
     L3 cache:            19712K
     Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology tsc_reliable nonstop_tsc cpuid pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves flush_l1d arch_capabilities

     ***CMake***
     /home/charlesbluca/dev/rapids/compose/etc/conda/cuda_11.2/envs/rapids/bin/cmake
     cmake version 3.21.2

     CMake suite maintained and supported by Kitware (kitware.com/cmake).

     ***g++***
     /usr/local/bin/g++
     g++ (Ubuntu 9.4.0-1ubuntu1~18.04) 9.4.0
     Copyright (C) 2019 Free Software Foundation, Inc.
     This is free software; see the source for copying conditions.  There is NO
     warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

     ***nvcc***
     /usr/local/bin/nvcc
     nvcc: NVIDIA (R) Cuda compiler driver
     Copyright (c) 2005-2021 NVIDIA Corporation
     Built on Sun_Feb_14_21:12:58_PST_2021
     Cuda compilation tools, release 11.2, V11.2.152
     Build cuda_11.2.r11.2/compiler.29618528_0

     ***Python***
     /home/charlesbluca/dev/rapids/compose/etc/conda/cuda_11.2/envs/rapids/bin/python
     Python 3.8.12

     ***Environment Variables***
     PATH                            : /home/charlesbluca/dev/rapids/compose/etc/conda/cuda_11.2/envs/rapids/bin:/home/charlesbluca/dev/rapids/compose/etc/conda/cuda_11.2/condabin:/home/charlesbluca/dev/rapids/compose/etc/conda/cuda_11.2/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/cuda/bin
     LD_LIBRARY_PATH                 : /home/charlesbluca/dev/rapids/compose/etc/conda/cuda_11.2/envs/rapids/lib:/home/charlesbluca/dev/rapids/compose/etc/conda/cuda_11.2/lib:/usr/lib/x86_64-linux-gnu:/usr/lib/i386-linux-gnu:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/cuda/lib64:/usr/local/lib:/home/charlesbluca/dev/rapids/rmm/build/release:/home/charlesbluca/dev/rapids/cudf/cpp/build/release:/home/charlesbluca/dev/rapids/raft/cpp/build/release:/home/charlesbluca/dev/rapids/cuml/cpp/build/release:/home/charlesbluca/dev/rapids/cugraph/cpp/build/release:/home/charlesbluca/dev/rapids/cuspatial/cpp/build/release
     NUMBAPRO_NVVM                   :
     NUMBAPRO_LIBDEVICE              :
     CONDA_PREFIX                    : /home/charlesbluca/dev/rapids/compose/etc/conda/cuda_11.2/envs/rapids
     PYTHON_PATH                     :

     ***conda packages***
     /home/charlesbluca/dev/rapids/compose/etc/conda/cuda_11.2/condabin/conda
     # packages in environment at /home/charlesbluca/dev/rapids/compose/etc/conda/cuda_11.2/envs/rapids:
     #
     # Name                    Version                   Build  Channel
     _libgcc_mutex             0.1                 conda_forge    conda-forge
     _openmp_mutex             4.5                       1_gnu    conda-forge
     abseil-cpp                20210324.2           h9c3ff4c_0    conda-forge
     adagio                    0.2.3              pyhd8ed1ab_0    conda-forge
     alabaster                 0.7.12                     py_0    conda-forge
     alembic                   1.4.1                      py_0    conda-forge
     alsa-lib                  1.2.3                h516909a_0    conda-forge
     antlr4-python3-runtime    4.9.2              pyhd8ed1ab_0    conda-forge
     appdirs                   1.4.4              pyh9f0ad1d_0    conda-forge
     argon2-cffi               20.1.0           py38h497a2fe_2    conda-forge
     arrow-cpp                 5.0.0           py38hdc1b314_6_cuda    conda-forge
     arrow-cpp-proc            3.0.0                      cuda    conda-forge
     asgiref                   3.4.1              pyhd8ed1ab_0    conda-forge
     asn1crypto                1.4.0              pyh9f0ad1d_0    conda-forge
     async_generator           1.10                       py_0    conda-forge
     attrs                     21.2.0             pyhd8ed1ab_0    conda-forge
     autoconf                  2.69            pl5320h36c2ea0_10    conda-forge
     automake                  1.16.2          pl5320ha770c72_3    conda-forge
     aws-c-cal                 0.5.11               h95a6274_0    conda-forge
     aws-c-common              0.6.2                h7f98852_0    conda-forge
     aws-c-event-stream        0.2.7               h3541f99_13    conda-forge
     aws-c-io                  0.10.5               hfb6a706_0    conda-forge
     aws-checksums             0.1.11               ha31a3da_7    conda-forge
     aws-sdk-cpp               1.8.186              hb4091e7_3    conda-forge
     babel                     2.9.1              pyh44b312d_0    conda-forge
     backcall                  0.2.0              pyh9f0ad1d_0    conda-forge
     backports                 1.0                        py_2    conda-forge
     backports.functools_lru_cache 1.6.4              pyhd8ed1ab_0    conda-forge
     backports.zoneinfo        0.2.1            py38h497a2fe_4    conda-forge
     beautifulsoup4            4.10.0             pyha770c72_0    conda-forge
     binutils_impl_linux-64    2.36.1               h193b22a_2    conda-forge
     black                     19.10b0                  py38_0    conda-forge
     bleach                    4.1.0              pyhd8ed1ab_0    conda-forge
     bokeh                     2.3.3            py38h578d9bd_0    conda-forge
     brotlipy                  0.7.0           py38h497a2fe_1001    conda-forge
     bzip2                     1.0.8                h7f98852_4    conda-forge
     c-ares                    1.17.2               h7f98852_0    conda-forge
     ca-certificates           2021.5.30            ha878542_0    conda-forge
     cachetools                4.2.2              pyhd8ed1ab_0    conda-forge
     cairo                     1.16.0            h6cf1ce9_1008    conda-forge
     certifi                   2021.5.30        py38h578d9bd_0    conda-forge
     cffi                      1.14.6           py38h3931269_1    conda-forge
     cfgv                      3.3.1              pyhd8ed1ab_0    conda-forge
     chardet                   4.0.0            py38h578d9bd_1    conda-forge
     charset-normalizer        2.0.0              pyhd8ed1ab_0    conda-forge
     ciso8601                  2.2.0            py38h497a2fe_0    conda-forge
     clang                     11.0.0               ha770c72_2    conda-forge
     clang-11                  11.0.0          default_ha5c780c_2    conda-forge
     clang-tools               11.0.0          default_ha5c780c_2    conda-forge
     clangxx                   11.0.0          default_ha5c780c_2    conda-forge
     click                     8.0.1            py38h578d9bd_0    conda-forge
     cloudpickle               2.0.0              pyhd8ed1ab_0    conda-forge
     cmake                     3.21.2               h8897547_0    conda-forge
     cmake-format              0.6.11             pyh9f0ad1d_0    conda-forge
     cmake_setuptools          0.1.3                      py_0    rapidsai
     colorama                  0.4.4              pyh9f0ad1d_0    conda-forge
     commonmark                0.9.1                      py_0    conda-forge
     configparser              5.0.2              pyhd8ed1ab_0    conda-forge
     coverage                  5.5              py38h497a2fe_0    conda-forge
     cryptography              3.4.7            py38ha5dfef3_0    conda-forge
     cudatoolkit               11.2.72              h2bc3f7f_0    nvidia
     cupy                      9.4.0            py38h7818112_0    conda-forge
     cyrus-sasl                2.1.27               h230043b_3    conda-forge
     cython                    0.29.24          py38h709712a_0    conda-forge
     cytoolz                   0.11.0           py38h497a2fe_3    conda-forge
     dask                      2021.9.0+18.g9fc5777f           dev_0    
     dask-cuda                 21.10.0a0+40.gaf0e678           dev_0    
     dask-glm                  0.2.0                      py_1    conda-forge
     dask-ml                   1.9.0              pyhd8ed1ab_0    conda-forge
     dask-sql                  0.3.10.dev18+gce9821a           dev_0    
     databricks-cli            0.12.1             pyhd8ed1ab_0    conda-forge
     dataclasses               0.8                pyhc8e2a94_3    conda-forge
     deap                      1.3.1            py38h1abd341_2    conda-forge
     debugpy                   1.4.1            py38h709712a_0    conda-forge
     decorator                 5.1.0              pyhd8ed1ab_0    conda-forge
     defusedxml                0.7.1              pyhd8ed1ab_0    conda-forge
     distlib                   0.3.2              pyhd8ed1ab_0    conda-forge
     distributed               2021.9.0+18.g05677bb2           dev_0    
     dlpack                    0.5                  h9c3ff4c_0    conda-forge
     docker-py                 5.0.2            py38h578d9bd_0    conda-forge
     docker-pycreds            0.4.0                      py_0    conda-forge
     docutils                  0.16             py38h578d9bd_3    conda-forge
     double-conversion         3.1.5                h9c3ff4c_2    conda-forge
     editdistance-s            1.0.0            py38h1fd1430_1    conda-forge
     entrypoints               0.3             py38h32f6830_1002    conda-forge
     execnet                   1.9.0              pyhd8ed1ab_0    conda-forge
     expat                     2.4.1                h9c3ff4c_0    conda-forge
     fastapi                   0.68.1             pyhd8ed1ab_0    conda-forge
     fastavro                  1.4.4            py38h497a2fe_0    conda-forge
     fastrlock                 0.6              py38h709712a_1    conda-forge
     filelock                  3.0.12             pyh9f0ad1d_0    conda-forge
     flake8                    3.8.3                      py_1    conda-forge
     flask                     2.0.1              pyhd8ed1ab_0    conda-forge
     fontconfig                2.13.1            hba837de_1005    conda-forge
     freetype                  2.10.4               h0708190_1    conda-forge
     fs                        2.4.11           py38h32f6830_2    conda-forge
     fsspec                    2021.8.1           pyhd8ed1ab_0    conda-forge
     fugue                     0.6.2                    pypi_0    pypi
     future                    0.18.2           py38h578d9bd_3    conda-forge
     gcc_impl_linux-64         11.2.0               h82a94d6_8    conda-forge
     gettext                   0.19.8.1          h73d1719_1006    conda-forge
     gflags                    2.2.2             he1b5a44_1004    conda-forge
     giflib                    5.2.1                h36c2ea0_2    conda-forge
     gitdb                     4.0.7              pyhd8ed1ab_0    conda-forge
     gitpython                 3.1.23             pyhd8ed1ab_1    conda-forge
     glog                      0.5.0                h48cff8f_0    conda-forge
     gmp                       6.2.1                h58526e2_0    conda-forge
     graphite2                 1.3.13            h58526e2_1001    conda-forge
     greenlet                  1.1.1            py38h709712a_0    conda-forge
     grpc-cpp                  1.40.0               h850795e_0    conda-forge
     gunicorn                  20.1.0           py38h578d9bd_0    conda-forge
     h11                       0.12.0             pyhd8ed1ab_0    conda-forge
     harfbuzz                  2.9.1                h83ec7ef_0    conda-forge
     heapdict                  1.0.1                      py_0    conda-forge
     huggingface_hub           0.0.17             pyhd8ed1ab_0    conda-forge
     hypothesis                6.21.5             pyhd8ed1ab_0    conda-forge
     icu                       68.1                 h58526e2_0    conda-forge
     identify                  2.2.14             pyhd8ed1ab_0    conda-forge
     idna                      3.1                pyhd3deb0d_0    conda-forge
     imagesize                 1.2.0                      py_0    conda-forge
     importlib-metadata        4.8.1            py38h578d9bd_0    conda-forge
     importlib_metadata        4.8.1                hd8ed1ab_0    conda-forge
     iniconfig                 1.1.1              pyh9f0ad1d_0    conda-forge
     intake                    0.6.3              pyhd8ed1ab_0    conda-forge
     intel-openmp              2021.3.0          h06a4308_3350    defaults
     ipykernel                 6.4.1            py38he5a9106_0    conda-forge
     ipython                   7.27.0           py38he5a9106_0    conda-forge
     ipython_genutils          0.2.0                      py_1    conda-forge
     isort                     5.7.0                    pypi_0    pypi
     itsdangerous              2.0.1              pyhd8ed1ab_0    conda-forge
     jbig                      2.1               h7f98852_2003    conda-forge
     jedi                      0.18.0           py38h578d9bd_2    conda-forge
     jinja2                    3.0.1              pyhd8ed1ab_0    conda-forge
     joblib                    1.0.1              pyhd8ed1ab_0    conda-forge
     jpeg                      9d                   h36c2ea0_0    conda-forge
     jpype1                    1.3.0            py38h1fd1430_0    conda-forge
     jsonschema                3.2.0            py38h32f6830_1    conda-forge
     jupyter_client            7.0.3              pyhd8ed1ab_0    conda-forge
     jupyter_core              4.8.1            py38h578d9bd_0    conda-forge
     jupyterlab_pygments       0.1.2              pyh9f0ad1d_0    conda-forge
     kernel-headers_linux-64   2.6.32              he073ed8_14    conda-forge
     krb5                      1.19.2               hcc1bbae_0    conda-forge
     lcms2                     2.12                 hddcbb42_0    conda-forge
     ld_impl_linux-64          2.36.1               hea4e1c9_2    conda-forge
     lerc                      2.2.1                h9c3ff4c_0    conda-forge
     libblas                   3.9.0            11_linux64_mkl    conda-forge
     libbrotlicommon           1.0.9                h7f98852_5    conda-forge
     libbrotlidec              1.0.9                h7f98852_5    conda-forge
     libbrotlienc              1.0.9                h7f98852_5    conda-forge
     libcblas                  3.9.0            11_linux64_mkl    conda-forge
     libclang-cpp11            11.0.0          default_ha5c780c_2    conda-forge
     libcurl                   7.79.0               h2574ce0_0    conda-forge
     libdeflate                1.7                  h7f98852_5    conda-forge
     libedit                   3.1.20191231         he28a2e2_2    conda-forge
     libev                     4.33                 h516909a_1    conda-forge
     libevent                  2.1.10               hcdb4288_3    conda-forge
     libffi                    3.4.2                h9c3ff4c_1    conda-forge
     libgcc-devel_linux-64     11.2.0               h0952999_8    conda-forge
     libgcc-ng                 11.2.0               h1d223b6_8    conda-forge
     libgfortran-ng            11.2.0               h69a702a_8    conda-forge
     libgfortran5              11.2.0               h5c6108e_8    conda-forge
     libglib                   2.68.4               h174f98d_1    conda-forge
     libgomp                   11.2.0               h1d223b6_8    conda-forge
     libiconv                  1.16                 h516909a_0    conda-forge
     liblapack                 3.9.0            11_linux64_mkl    conda-forge
     libllvm10                 10.0.1               he513fc3_3    conda-forge
     libllvm11                 11.0.1               hf817b99_0    conda-forge
     libnghttp2                1.43.0               h812cca2_0    conda-forge
     libntlm                   1.4               h7f98852_1002    conda-forge
     libpng                    1.6.37               h21135ba_2    conda-forge
     libpq                     13.3                 hd57d9b9_0    conda-forge
     libprotobuf               3.16.0               h780b84a_0    conda-forge
     librmm                    21.08.02        cuda11.2_g115bad2_0    rapidsai
     libsanitizer              11.2.0               he4da1e4_8    conda-forge
     libsodium                 1.0.18               h36c2ea0_1    conda-forge
     libssh2                   1.10.0               ha56f1ee_0    conda-forge
     libstdcxx-ng              11.2.0               he4da1e4_8    conda-forge
     libthrift                 0.15.0               he6d91bd_0    conda-forge
     libtiff                   4.3.0                hf544144_1    conda-forge
     libtool                   2.4.6             h9c3ff4c_1008    conda-forge
     libutf8proc               2.6.1                h7f98852_0    conda-forge
     libuuid                   2.32.1            h7f98852_1000    conda-forge
     libuv                     1.42.0               h7f98852_0    conda-forge
     libwebp-base              1.2.1                h7f98852_0    conda-forge
     libxcb                    1.13              h7f98852_1003    conda-forge
     libxgboost                1.4.2dev.rapidsai21.08      cuda11.2_0    rapidsai
     libxml2                   2.9.12               h72842e0_0    conda-forge
     lightgbm                  3.2.1            py38h709712a_0    conda-forge
     llvmlite                  0.36.0           py38h4630a5e_0    conda-forge
     locket                    0.2.0                      py_2    conda-forge
     lz4-c                     1.9.3                h9c3ff4c_1    conda-forge
     m4                        1.4.18            h516909a_1001    conda-forge
     make                      4.3                  hd18ef5c_1    conda-forge
     mako                      1.1.5              pyhd8ed1ab_0    conda-forge
     markdown                  3.3.4              pyhd8ed1ab_0    conda-forge
     markupsafe                2.0.1            py38h497a2fe_0    conda-forge
     matplotlib-inline         0.1.3              pyhd8ed1ab_0    conda-forge
     maven                     3.6.3                ha770c72_0    conda-forge
     mccabe                    0.6.1                      py_1    conda-forge
     mimesis                   4.0.0              pyh9f0ad1d_0    conda-forge
     mistune                   0.8.4           py38h497a2fe_1004    conda-forge
     mkl                       2021.3.0           h06a4308_520    defaults
     mlflow                    1.20.2           py38he918c71_0    conda-forge
     mock                      4.0.3            py38h578d9bd_1    conda-forge
     more-itertools            8.10.0             pyhd8ed1ab_0    conda-forge
     msgpack-python            1.0.2            py38h1fd1430_1    conda-forge
     multipledispatch          0.6.0                      py_0    conda-forge
     mypy                      0.782                      py_0    conda-forge
     mypy_extensions           0.4.3            py38h578d9bd_3    conda-forge
     nbclient                  0.5.4              pyhd8ed1ab_0    conda-forge
     nbconvert                 6.1.0            py38h578d9bd_1    conda-forge
     nbformat                  5.1.3              pyhd8ed1ab_0    conda-forge
     nbsphinx                  0.8.7              pyhd8ed1ab_0    conda-forge
     nccl                      2.10.3.1             hdc17891_0    conda-forge
     ncurses                   6.2                  h58526e2_4    conda-forge
     nest-asyncio              1.5.1              pyhd8ed1ab_0    conda-forge
     ninja                     1.10.2               h4bd325d_0    conda-forge
     nodeenv                   1.6.0              pyhd8ed1ab_0    conda-forge
     notebook                  6.4.4              pyha770c72_0    conda-forge
     numba                     0.53.1           py38h8b71fd7_1    conda-forge
     numpy                     1.21.2           py38he2449b9_0    conda-forge
     numpydoc                  1.1.0                      py_1    conda-forge
     nvtx                      0.2.3            py38h497a2fe_0    conda-forge
     olefile                   0.46               pyh9f0ad1d_1    conda-forge
     openjdk                   11.0.9.1             h5cc2fde_1    conda-forge
     openjpeg                  2.4.0                hb52868f_1    conda-forge
     openssl                   1.1.1l               h7f98852_0    conda-forge
     orc                       1.6.10               h58a87f1_0    conda-forge
     packaging                 21.0               pyhd8ed1ab_0    conda-forge
     pandas                    1.3.3            py38h43a58ef_0    conda-forge
     pandoc                    1.19.2                        0    conda-forge
     pandocfilters             1.5.0              pyhd8ed1ab_0    conda-forge
     parquet-cpp               1.5.1                         1    conda-forge
     parso                     0.8.2              pyhd8ed1ab_0    conda-forge
     partd                     1.2.0              pyhd8ed1ab_0    conda-forge
     pathspec                  0.9.0              pyhd8ed1ab_0    conda-forge
     pcre                      8.45                 h9c3ff4c_0    conda-forge
     perl                      5.32.1          0_h7f98852_perl5    conda-forge
     pexpect                   4.8.0            py38h32f6830_1    conda-forge
     pickleshare               0.7.5           py38h32f6830_1002    conda-forge
     pillow                    8.3.2            py38h8e6f84c_0    conda-forge
     pip                       20.2.4                     py_0    conda-forge
     pixman                    0.40.0               h36c2ea0_0    conda-forge
     pkg-config                0.29.2            h36c2ea0_1008    conda-forge
     pluggy                    1.0.0            py38h578d9bd_1    conda-forge
     pre-commit                2.15.0           py38h578d9bd_0    conda-forge
     pre_commit                2.15.0               hd8ed1ab_0    conda-forge
     prometheus_client         0.11.0             pyhd8ed1ab_0    conda-forge
     prometheus_flask_exporter 0.18.2             pyhd8ed1ab_0    conda-forge
     prompt-toolkit            3.0.20             pyha770c72_0    conda-forge
     prompt_toolkit            3.0.20               hd8ed1ab_0    conda-forge
     protobuf                  3.16.0           py38h709712a_0    conda-forge
     psutil                    5.8.0            py38h497a2fe_1    conda-forge
     psycopg2                  2.9.1            py38h497a2fe_0    conda-forge
     pthread-stubs             0.4               h36c2ea0_1001    conda-forge
     ptvsd                     4.3.2                    pypi_0    pypi
     ptyprocess                0.7.0              pyhd3deb0d_0    conda-forge
     py                        1.10.0             pyhd3deb0d_0    conda-forge
     py-cpuinfo                8.0.0              pyhd8ed1ab_0    conda-forge
     py-xgboost                1.4.2dev.rapidsai21.08  cuda11.2py38_0    rapidsai
     pyarrow                   5.0.0           py38hed47224_6_cuda    conda-forge
     pycodestyle               2.6.0              pyh9f0ad1d_0    conda-forge
     pycparser                 2.20               pyh9f0ad1d_2    conda-forge
     pydantic                  1.8.2            py38h497a2fe_0    conda-forge
     pydata-sphinx-theme       0.6.3              pyhd8ed1ab_0    conda-forge
     pyflakes                  2.2.0              pyh9f0ad1d_0    conda-forge
     pygments                  2.10.0             pyhd8ed1ab_0    conda-forge
     pyhive                    0.6.4              pyhd8ed1ab_0    conda-forge
     pynvml                    11.0.0             pyhd8ed1ab_0    conda-forge
     pyopenssl                 20.0.1             pyhd8ed1ab_0    conda-forge
     pyorc                     0.4.0                    pypi_0    pypi
     pyparsing                 2.4.7              pyh9f0ad1d_0    conda-forge
     pyrsistent                0.17.3           py38h497a2fe_2    conda-forge
     pysocks                   1.7.1            py38h578d9bd_3    conda-forge
     pytest                    6.2.5            py38h578d9bd_0    conda-forge
     pytest-benchmark          3.4.1              pyhd8ed1ab_0    conda-forge
     pytest-cov                2.12.1             pyhd8ed1ab_0    conda-forge
     pytest-forked             1.3.0              pyhd3deb0d_0    conda-forge
     pytest-xdist              2.3.0              pyhd8ed1ab_0    conda-forge
     python                    3.8.12          hb7a2778_0_cpython    conda-forge
     python-dateutil           2.8.2              pyhd8ed1ab_0    conda-forge
     python-editor             1.0.4                      py_0    conda-forge
     python_abi                3.8                      2_cp38    conda-forge
     pytorch                   1.9.0           cpu_py38h4bbe6ce_2    conda-forge
     pytz                      2021.1             pyhd8ed1ab_0    conda-forge
     pyyaml                    5.4.1            py38h497a2fe_1    conda-forge
     pyzmq                     22.3.0           py38h2035c66_0    conda-forge
     qpd                       0.2.5                    pypi_0    pypi
     querystring_parser        1.2.4                      py_0    conda-forge
     rapidjson                 1.1.0             he1b5a44_1002    conda-forge
     re2                       2021.09.01           h9c3ff4c_0    conda-forge
     readline                  8.1                  h46c0cb4_0    conda-forge
     recommonmark              0.7.1              pyhd8ed1ab_0    conda-forge
     regex                     2021.8.28        py38h497a2fe_0    conda-forge
     requests                  2.26.0             pyhd8ed1ab_0    conda-forge
     rhash                     1.4.1                h7f98852_0    conda-forge
     rmm                       21.08.02        cuda_11.2_py38_g115bad2_0    rapidsai
     s2n                       1.0.10               h9b69904_0    conda-forge
     sacremoses                0.0.43             pyh9f0ad1d_0    conda-forge
     sasl                      0.3.1            py38h709712a_0    conda-forge
     scikit-learn              0.24.2           py38hacb3eff_1    conda-forge
     scipy                     1.7.1            py38h56a6a73_0    conda-forge
     send2trash                1.8.0              pyhd8ed1ab_0    conda-forge
     setuptools                58.0.4           py38h578d9bd_0    conda-forge
     six                       1.16.0             pyh6c4a22f_0    conda-forge
     sleef                     3.5.1                h7f98852_1    conda-forge
     smmap                     3.0.5              pyh44b312d_0    conda-forge
     snappy                    1.1.8                he1b5a44_3    conda-forge
     snowballstemmer           2.1.0              pyhd8ed1ab_0    conda-forge
     sortedcontainers          2.4.0              pyhd8ed1ab_0    conda-forge
     soupsieve                 2.0.1            py38h32f6830_0    conda-forge
     spdlog                    1.8.5                h4bd325d_0    conda-forge
     sphinx                    4.2.0              pyh6c4a22f_0    conda-forge
     sphinx-copybutton         0.4.0              pyhd8ed1ab_0    conda-forge
     sphinx-markdown-tables    0.0.15             pyhd3deb0d_0    conda-forge
     sphinxcontrib-applehelp   1.0.2                      py_0    conda-forge
     sphinxcontrib-devhelp     1.0.2                      py_0    conda-forge
     sphinxcontrib-htmlhelp    2.0.0              pyhd8ed1ab_0    conda-forge
     sphinxcontrib-jsmath      1.0.1                      py_0    conda-forge
     sphinxcontrib-qthelp      1.0.3                      py_0    conda-forge
     sphinxcontrib-serializinghtml 1.1.5              pyhd8ed1ab_0    conda-forge
     sphinxcontrib-websupport  1.2.4              pyh9f0ad1d_0    conda-forge
     sqlalchemy                1.4.23           py38h497a2fe_0    conda-forge
     sqlite                    3.36.0               h9cd32fc_1    conda-forge
     sqlparse                  0.4.2              pyhd8ed1ab_0    conda-forge
     starlette                 0.14.2             pyhd8ed1ab_0    conda-forge
     stopit                    1.1.2                      py_0    conda-forge
     streamz                   0.6.2              pyh44b312d_0    conda-forge
     sysroot_linux-64          2.12                he073ed8_14    conda-forge
     tabulate                  0.8.9              pyhd8ed1ab_0    conda-forge
     tblib                     1.7.0              pyhd8ed1ab_0    conda-forge
     tenacity                  8.0.1              pyhd8ed1ab_0    conda-forge
     terminado                 0.12.1           py38h578d9bd_0    conda-forge
     testpath                  0.5.0              pyhd8ed1ab_0    conda-forge
     threadpoolctl             2.2.0              pyh8a188c0_0    conda-forge
     thrift                    0.13.0           py38h709712a_2    conda-forge
     thrift_sasl               0.4.3            py38h497a2fe_0    conda-forge
     tk                        8.6.11               h27826a3_1    conda-forge
     tokenizers                0.10.1           py38hb63a372_0    conda-forge
     toml                      0.10.2             pyhd8ed1ab_0    conda-forge
     toolz                     0.11.1                     py_0    conda-forge
     tornado                   6.1              py38h497a2fe_1    conda-forge
     tpot                      0.11.7             pyhd8ed1ab_1    conda-forge
     tqdm                      4.62.2             pyhd8ed1ab_0    conda-forge
     traitlets                 5.1.0              pyhd8ed1ab_0    conda-forge
     transformers              4.10.2             pyhd8ed1ab_0    conda-forge
     triad                     0.5.4              pyhd8ed1ab_0    conda-forge
     typed-ast                 1.4.3            py38h497a2fe_0    conda-forge
     typing-extensions         3.10.0.0             hd8ed1ab_0    conda-forge
     typing_extensions         3.10.0.0           pyha770c72_0    conda-forge
     tzdata                    2021a                he74cb21_1    conda-forge
     tzlocal                   3.0              py38h578d9bd_2    conda-forge
     update_checker            0.18.0             pyh9f0ad1d_0    conda-forge
     urllib3                   1.26.6             pyhd8ed1ab_0    conda-forge
     uvicorn                   0.15.0           py38h578d9bd_1    conda-forge
     virtualenv                20.4.7           py38h578d9bd_0    conda-forge
     wcwidth                   0.2.5              pyh9f0ad1d_2    conda-forge
     webencodings              0.5.1                      py_1    conda-forge
     websocket-client          0.57.0           py38h578d9bd_4    conda-forge
     werkzeug                  2.0.1              pyhd8ed1ab_0    conda-forge
     wheel                     0.37.0             pyhd8ed1ab_1    conda-forge
     xorg-fixesproto           5.0               h7f98852_1002    conda-forge
     xorg-inputproto           2.3.2             h7f98852_1002    conda-forge
     xorg-kbproto              1.0.7             h7f98852_1002    conda-forge
     xorg-libice               1.0.10               h7f98852_0    conda-forge
     xorg-libsm                1.2.3             hd9c2040_1000    conda-forge
     xorg-libx11               1.7.2                h7f98852_0    conda-forge
     xorg-libxau               1.0.9                h7f98852_0    conda-forge
     xorg-libxdmcp             1.1.3                h7f98852_0    conda-forge
     xorg-libxext              1.3.4                h7f98852_1    conda-forge
     xorg-libxfixes            5.0.3             h7f98852_1004    conda-forge
     xorg-libxi                1.7.10               h7f98852_0    conda-forge
     xorg-libxrender           0.9.10            h7f98852_1003    conda-forge
     xorg-libxtst              1.2.3             h7f98852_1002    conda-forge
     xorg-recordproto          1.14.2            h7f98852_1002    conda-forge
     xorg-renderproto          0.11.1            h7f98852_1002    conda-forge
     xorg-xextproto            7.3.0             h7f98852_1002    conda-forge
     xorg-xproto               7.0.31            h7f98852_1007    conda-forge
     xz                        5.2.5                h516909a_1    conda-forge
     yaml                      0.2.5                h516909a_0    conda-forge
     zeromq                    4.3.4                h9c3ff4c_1    conda-forge
     zict                      2.0.0                      py_0    conda-forge
     zipp                      3.5.0              pyhd8ed1ab_0    conda-forge
     zlib                      1.2.11            h516909a_1010    conda-forge
     zstd                      1.5.0                ha95c52a_0    conda-forge

Additional context This bug should've been caught in this test. However, for some reason dask.dataframe.assert_eq doesn't raise an error for these differently sorted dataframes:

import cudf
import dask_cudf
import dask.dataframe as dd

gdf = cudf.DataFrame(
    {
        'a': list(range(15)) + [None] * 5, 
        'b': list(reversed(range(20)))
    }
)
gddf = dask_cudf.from_cudf(gdf, npartitions=5)

got = gddf.sort_values(by=["a", "b"]).compute()
expect = gdf.sort_values(by=["a", "b"])

dd.assert_eq(got, expect)
True

beckernick commented 2 years ago

Oof, good find. Global order isn't preserved here.

I suspect this might be the problem in the testing utility: https://github.com/dask/dask/blob/9fc5777f3d83f1084360adf982da301ed4afe13b/dask/dataframe/utils.py#L553-L554

EDIT:

        a = _maybe_sort(a)
        b = _maybe_sort(b)
        tm.assert_frame_equal(a, b, check_dtype=check_dtype, **kwargs)

Once we're in pandas-land, both dataframes get sorted before the comparison, so they end up in the same global index order and the original ordering difference is lost.
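A minimal pandas-only sketch of that masking effect, assuming the helper's net effect is to re-sort both frames (approximated here with `sort_index()`, which is not necessarily exactly what `_maybe_sort` does). The frames and variable names are made up for illustration:

```python
import pandas as pd
import pandas.testing as tm

# Same rows and index labels, but `got` has a different row order.
expect = pd.DataFrame({"a": [0, 1, 2, 3], "b": [19, 18, 17, 16]})
got = expect.iloc[[3, 0, 1, 2]]

# A direct comparison notices the ordering difference.
try:
    tm.assert_frame_equal(got, expect)
    order_mismatch_detected = False
except AssertionError:
    order_mismatch_detected = True

# Re-sorting both frames first (assumed stand-in for _maybe_sort)
# makes them identical, so the comparison passes silently.
tm.assert_frame_equal(got.sort_index(), expect.sort_index())
```

Here the direct comparison raises, while the sorted comparison passes even though `got` was never in the right order.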

charlesbluca commented 2 years ago

Thanks for the quick find @beckernick! Looks like we can get around this behavior by setting check_index=False, but I'd imagine for cases where we want to compare the sorting and the index, it would be nice to have a kwarg like check_order that can be used to enable/disable the calls to _maybe_sort altogether.
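The `check_order` idea could look roughly like this in plain pandas. `assert_frames_equal` and its `check_order` parameter are hypothetical names used only for illustration, not an existing dask or cudf API:

```python
import pandas as pd
import pandas.testing as tm


def assert_frames_equal(a, b, check_order=True, **kwargs):
    """Hypothetical helper: compare two pandas frames.

    With check_order=False, both frames are sorted by index first,
    so row order is ignored. With check_order=True (the default),
    the frames are compared exactly as given.
    """
    if not check_order:
        a = a.sort_index()
        b = b.sort_index()
    tm.assert_frame_equal(a, b, **kwargs)


# Same rows, different row order.
expect = pd.DataFrame({"a": [0, 1, 2], "b": [2, 1, 0]})
got = expect.iloc[[2, 0, 1]]

assert_frames_equal(got, expect, check_order=False)  # passes: order ignored
```

With the default `check_order=True`, the same call would raise an `AssertionError`, which is the behavior a sorting test needs.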

For now, I think the best short-term option would be to replace any instances of

dd.assert_eq(got, expect)

with something like

from cudf.testing._utils import assert_eq

assert_eq(got.compute(), expect)