rapidsai / cudf

cuDF - GPU DataFrame Library
https://docs.rapids.ai/api/cudf/stable/
Apache License 2.0

[BUG] Off-by-1 error in `__floordiv__` binary op #17073

Open galipremsagar opened 1 month ago

galipremsagar commented 1 month ago

Describe the bug: Floor division (`//`) in libcudf appears to have an off-by-1 error. For some elements, the cudf result differs by 1 from the pandas result for the same inputs.

Steps/Code to reproduce bug: The pickle files `psr1` and `psr2` are in Archive.zip.

In [1]: import pandas as pd

In [2]: psr1 = pd.read_pickle("psr1")

In [3]: psr2 = pd.read_pickle("psr2")

In [4]: psr1 // psr2
Out[4]: 
0      100.0
1       99.0
2       99.0
3       99.0
4      100.0
       ...  
995     99.0
996     99.0
997    100.0
998    100.0
999     99.0
Length: 1000, dtype: float64

In [5]: import cudf

In [6]: gsr1 = cudf.from_pandas(psr1)

In [7]: gsr2 = cudf.from_pandas(psr2)

In [8]: gsr1 // gsr2
Out[8]: 
0      100.0
1       99.0
2      100.0
3      100.0
4      100.0
       ...  
995     99.0
996    100.0
997    100.0
998    100.0
999    100.0
Length: 1000, dtype: float64
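To make the disagreement easier to see, here is a small sketch that lists the rows where the two results differ. It uses only the ten values visible in Out[4] and Out[8] above; the elided rows are not included. The helper name `find_mismatches` is hypothetical. With the real data you would pass the full pandas result and the cudf result converted to host memory (for example with `to_pandas()`).

```python
from typing import List, Sequence, Tuple


def find_mismatches(
    labels: Sequence[int],
    expected: Sequence[float],
    actual: Sequence[float],
) -> List[Tuple[int, float, float]]:
    """Return (label, expected, actual) for every position where the values differ."""
    return [
        (label, want, got)
        for label, want, got in zip(labels, expected, actual)
        if want != got
    ]


# Values copied from the visible rows of Out[4] (pandas) and Out[8] (cudf).
labels = [0, 1, 2, 3, 4, 995, 996, 997, 998, 999]
pandas_rows = [100.0, 99.0, 99.0, 99.0, 100.0, 99.0, 99.0, 100.0, 100.0, 99.0]
cudf_rows = [100.0, 99.0, 100.0, 100.0, 100.0, 99.0, 100.0, 100.0, 100.0, 100.0]

mismatches = find_mismatches(labels, pandas_rows, cudf_rows)
for label, want, got in mismatches:
    print(f"row {label}: pandas={want} cudf={got} diff={got - want:+.1f}")
```

In the visible rows, every mismatch has cudf exactly 1 higher than pandas, never lower.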

Expected behavior: The cudf result should match pandas.
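This report does not identify a root cause. As an illustration only, the sketch below shows one well-known way that two floating-point floor-division implementations can differ by exactly 1: taking the floor of the already rounded quotient `a / b`, versus deriving the result from the remainder as Python's float `//` does. Whether libcudf actually uses the first approach is an assumption here, not something confirmed in this issue. The function names are hypothetical and the inputs are not taken from the pickle files.

```python
import math


def floordiv_via_true_division(a: float, b: float) -> float:
    """Floor of the rounded quotient. The division rounds before the floor is taken."""
    return float(math.floor(a / b))


def floordiv_via_remainder(a: float, b: float) -> float:
    """Python's built-in float floor division, which is computed from the remainder."""
    return a // b


# 0.1 is stored as a value slightly larger than one tenth, so the true quotient
# 1.0 / 0.1 is slightly below 10, but the rounded double result is exactly 10.0.
a, b = 1.0, 0.1
naive = floordiv_via_true_division(a, b)
exact = floordiv_via_remainder(a, b)
print(naive, exact)  # prints "10.0 9.0"
```

The first approach overshoots by 1 when the division rounds up to a whole number, which is the same direction as the mismatches shown in this report (cudf one higher than pandas).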


Environment details (output of the cudf/print_env.sh script):


     ***git***
     commit e100e2d4c27d221b0163b26edd3b0295f88af7d7 (HEAD -> numpy_random, origin/numpy_random)
     Author: galipremsagar 
     Date:   Fri Oct 11 22:38:17 2024 +0000

     fix issues
     ***git submodules***

     ***OS Information***
     DISTRIB_ID=Ubuntu
     DISTRIB_RELEASE=22.04
     DISTRIB_CODENAME=jammy
     DISTRIB_DESCRIPTION="Ubuntu 22.04.2 LTS"
     PRETTY_NAME="Ubuntu 22.04.2 LTS"
     NAME="Ubuntu"
     VERSION_ID="22.04"
     VERSION="22.04.2 LTS (Jammy Jellyfish)"
     VERSION_CODENAME=jammy
     ID=ubuntu
     ID_LIKE=debian
     HOME_URL="https://www.ubuntu.com/"
     SUPPORT_URL="https://help.ubuntu.com/"
     BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
     PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
     UBUNTU_CODENAME=jammy
     Linux dt07 5.15.0-107-generic #117-Ubuntu SMP Fri Apr 26 12:26:49 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

     ***GPU Information***
     Sat Oct 12 12:36:46 2024
     +---------------------------------------------------------------------------------------+
     | NVIDIA-SMI 535.161.08             Driver Version: 535.161.08   CUDA Version: 12.2     |
     |-----------------------------------------+----------------------+----------------------+
     | GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
     | Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
     |                                         |                      |               MIG M. |
     |=========================================+======================+======================|
     |   0  Tesla T4                       On  | 00000000:3B:00.0 Off |                    0 |
     | N/A   48C    P0              27W /  70W |   8104MiB / 15360MiB |      0%      Default |
     |                                         |                      |                  N/A |
     +-----------------------------------------+----------------------+----------------------+
     |   1  Tesla T4                       On  | 00000000:5E:00.0 Off |                    0 |
     | N/A   55C    P0              29W /  70W |   2114MiB / 15360MiB |      0%      Default |
     |                                         |                      |                  N/A |
     +-----------------------------------------+----------------------+----------------------+
     |   2  Tesla T4                       On  | 00000000:AF:00.0 Off |                    0 |
     | N/A   49C    P0              28W /  70W |   2114MiB / 15360MiB |      0%      Default |
     |                                         |                      |                  N/A |
     +-----------------------------------------+----------------------+----------------------+
     |   3  Tesla T4                       On  | 00000000:D8:00.0 Off |                    0 |
     | N/A   46C    P0              28W /  70W |   2114MiB / 15360MiB |      0%      Default |
     |                                         |                      |                  N/A |
     +-----------------------------------------+----------------------+----------------------+

     +---------------------------------------------------------------------------------------+
     | Processes:                                                                            |
     |  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
     |        ID   ID                                                             Usage      |
     |=======================================================================================|
     +---------------------------------------------------------------------------------------+

     ***CPU***
     Architecture:                       x86_64
     CPU op-mode(s):                     32-bit, 64-bit
     Address sizes:                      46 bits physical, 48 bits virtual
     Byte Order:                         Little Endian
     CPU(s):                             64
     On-line CPU(s) list:                0-63
     Vendor ID:                          GenuineIntel
     Model name:                         Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz
     CPU family:                         6
     Model:                              85
     Thread(s) per core:                 2
     Core(s) per socket:                 16
     Socket(s):                          2
     Stepping:                           4
     CPU max MHz:                        3700.0000
     CPU min MHz:                        1000.0000
     BogoMIPS:                           4200.00
     Flags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd mba ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts pku ospke md_clear flush_l1d arch_capabilities
     Virtualization:                     VT-x
     L1d cache:                          1 MiB (32 instances)
     L1i cache:                          1 MiB (32 instances)
     L2 cache:                           32 MiB (32 instances)
     L3 cache:                           44 MiB (2 instances)
     NUMA node(s):                       2
     NUMA node0 CPU(s):                  0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62
     NUMA node1 CPU(s):                  1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47,49,51,53,55,57,59,61,63
     Vulnerability Gather data sampling: Mitigation; Microcode
     Vulnerability Itlb multihit:        KVM: Mitigation: VMX disabled
     Vulnerability L1tf:                 Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
     Vulnerability Mds:                  Mitigation; Clear CPU buffers; SMT vulnerable
     Vulnerability Meltdown:             Mitigation; PTI
     Vulnerability Mmio stale data:      Mitigation; Clear CPU buffers; SMT vulnerable
     Vulnerability Retbleed:             Mitigation; IBRS
     Vulnerability Spec rstack overflow: Not affected
     Vulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl and seccomp
     Vulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization
     Vulnerability Spectre v2:           Mitigation; IBRS; IBPB conditional; STIBP conditional; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
     Vulnerability Srbds:                Not affected
     Vulnerability Tsx async abort:      Mitigation; Clear CPU buffers; SMT vulnerable

     ***CMake***
     /nvme/0/pgali/envs/cudfdev/bin/cmake
     cmake version 3.30.5

     CMake suite maintained and supported by Kitware (kitware.com/cmake).

     ***g++***
     /nvme/0/pgali/envs/cudfdev/bin/g++
     g++ (conda-forge gcc 11.4.0-13) 11.4.0
     Copyright (C) 2021 Free Software Foundation, Inc.
     This is free software; see the source for copying conditions.  There is NO
     warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

     ***nvcc***
     /nvme/0/pgali/envs/cudfdev/bin/nvcc
     nvcc: NVIDIA (R) Cuda compiler driver
     Copyright (c) 2005-2024 NVIDIA Corporation
     Built on Thu_Jun__6_02:18:23_PDT_2024
     Cuda compilation tools, release 12.5, V12.5.82
     Build cuda_12.5.r12.5/compiler.34385749_0

     ***Python***
     /nvme/0/pgali/envs/cudfdev/bin/python
     Python 3.12.7

     ***Environment Variables***
     PATH                            : /nvme/0/pgali/envs/cudfdev/bin:/nvme/0/pgali/envs/cudfdev/bin:/nvme/0/pgali/miniforge3/condabin:/nvme/0/pgali/miniforge3/bin:/raid/pgali/miniforge3/bin:/nvme/0/pgali/.vscode-server/cli/servers/Stable-384ff7382de624fb94dbaf6da11977bba1ecd427/server/bin/remote-cli:/raid/pgali/miniforge3/bin:/nvme/0/pgali/.cargo/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/usr/local/cuda/bin
     LD_LIBRARY_PATH                 : /usr/local/cuda/lib64::/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64
     NUMBAPRO_NVVM                   :
     NUMBAPRO_LIBDEVICE              :
     CONDA_PREFIX                    : /nvme/0/pgali/envs/cudfdev
     PYTHON_PATH                     :

     ***conda packages***
     /nvme/0/pgali/miniforge3/condabin/conda
     # packages in environment at /nvme/0/pgali/envs/cudfdev:
     #
     # Name                    Version                   Build  Channel
     _libgcc_mutex             0.1                 conda_forge    conda-forge
     _openmp_mutex             4.5                  2_kmp_llvm    conda-forge
     accessible-pygments       0.0.5              pyhd8ed1ab_0    conda-forge
     aiobotocore               2.15.1             pyhd8ed1ab_0    conda-forge
     aiohappyeyeballs          2.4.3              pyhd8ed1ab_0    conda-forge
     aiohttp                   3.10.10         py312h178313f_0    conda-forge
     aioitertools              0.12.0             pyhd8ed1ab_0    conda-forge
     aiosignal                 1.3.1              pyhd8ed1ab_0    conda-forge
     alabaster                 0.7.16             pyhd8ed1ab_0    conda-forge
     anyio                     4.6.0              pyhd8ed1ab_1    conda-forge
     argon2-cffi               23.1.0             pyhd8ed1ab_0    conda-forge
     argon2-cffi-bindings      21.2.0          py312h66e93f0_5    conda-forge
     arrow                     1.3.0              pyhd8ed1ab_0    conda-forge
     asttokens                 2.4.1              pyhd8ed1ab_0    conda-forge
     async-lru                 2.0.4              pyhd8ed1ab_0    conda-forge
     attrs                     24.2.0             pyh71513ae_0    conda-forge
     aws-c-auth                0.7.31               h57bd9a3_0    conda-forge
     aws-c-cal                 0.7.4                hfd43aa1_1    conda-forge
     aws-c-common              0.9.28               hb9d3cd8_0    conda-forge
     aws-c-compression         0.2.19               h756ea98_1    conda-forge
     aws-c-event-stream        0.4.3                h29ce20c_2    conda-forge
     aws-c-http                0.8.10               h5e77a74_0    conda-forge
     aws-c-io                  0.14.18             h2af50b2_12    conda-forge
     aws-c-mqtt                0.10.7               h02abb05_0    conda-forge
     aws-c-s3                  0.6.6                h834ce55_0    conda-forge
     aws-c-sdkutils            0.1.19               h756ea98_3    conda-forge
     aws-checksums             0.1.20               h756ea98_0    conda-forge
     aws-crt-cpp               0.28.3               h3e6eb3e_6    conda-forge
     aws-sdk-cpp               1.11.407             h9f1560d_0    conda-forge
     aws-xray-sdk              2.14.0             pyhd8ed1ab_0    conda-forge
     azure-core-cpp            1.13.0               h935415a_0    conda-forge
     azure-identity-cpp        1.8.0                hd126650_2    conda-forge
     azure-storage-blobs-cpp   12.12.0              hd2e3451_0    conda-forge
     azure-storage-common-cpp  12.7.0               h10ac4d7_1    conda-forge
     azure-storage-files-datalake-cpp 12.11.0              h325d260_1    conda-forge
     babel                     2.14.0             pyhd8ed1ab_0    conda-forge
     backports.zoneinfo        0.2.1           py312h7900ff3_9    conda-forge
     beautifulsoup4            4.12.3             pyha770c72_0    conda-forge
     binutils                  2.40                 h4852527_7    conda-forge
     binutils_impl_linux-64    2.40                 ha1999f0_7    conda-forge
     binutils_linux-64         2.40                 hb3c18ed_4    conda-forge
     bleach                    6.1.0              pyhd8ed1ab_0    conda-forge
     blinker                   1.8.2              pyhd8ed1ab_0    conda-forge
     bokeh                     3.5.2              pyhd8ed1ab_0    conda-forge
     boto3                     1.35.23            pyhd8ed1ab_0    conda-forge
     botocore                  1.35.23         pyge310_1234567_0    conda-forge
     breathe                   4.35.0             pyhd8ed1ab_2    conda-forge
     brotli-python             1.1.0           py312h2ec8cdc_2    conda-forge
     bzip2                     1.0.8                h4bc722e_7    conda-forge
     c-ares                    1.34.1               heb4867d_0    conda-forge
     c-compiler                1.5.2                h0b41bf4_0    conda-forge
     ca-certificates           2024.8.30            hbcca054_0    conda-forge
     cached-property           1.5.2                hd8ed1ab_1    conda-forge
     cached_property           1.5.2              pyha770c72_1    conda-forge
     cachetools                5.5.0              pyhd8ed1ab_0    conda-forge
     certifi                   2024.8.30          pyhd8ed1ab_0    conda-forge
     cffi                      1.17.1          py312h06ac9bb_0    conda-forge
     cfgv                      3.3.1              pyhd8ed1ab_0    conda-forge
     charset-normalizer        3.4.0              pyhd8ed1ab_0    conda-forge
     clang                     16.0.6          default_h9e3a008_13    conda-forge
     clang-16                  16.0.6          default_hb5137d0_13    conda-forge
     clang-format              16.0.6          default_hb5137d0_13    conda-forge
     clang-format-16           16.0.6          default_hb5137d0_13    conda-forge
     clang-tools               16.0.6          default_hb5137d0_13    conda-forge
     click                     8.1.7           unix_pyh707e725_0    conda-forge
     cloudpickle               3.0.0              pyhd8ed1ab_0    conda-forge
     cmake                     3.30.5               hf9cb763_0    conda-forge
     colorama                  0.4.6              pyhd8ed1ab_0    conda-forge
     comm                      0.2.2              pyhd8ed1ab_0    conda-forge
     commonmark                0.9.1                      py_0    conda-forge
     contourpy                 1.3.0           py312h68727a3_2    conda-forge
     coverage                  7.6.2           py312h66e93f0_0    conda-forge
     cpython                   3.12.7          py312hd8ed1ab_0    conda-forge
     cramjam                   2.8.4rc3        py312h78ba408_2    conda-forge
     cryptography              43.0.1          py312hda17c39_0    conda-forge
     cuda-cccl_linux-64        12.5.39              ha770c72_0    conda-forge
     cuda-crt-dev_linux-64     12.5.82              ha770c72_0    conda-forge
     cuda-crt-tools            12.5.82              ha770c72_0    conda-forge
     cuda-cudart               12.5.82              he02047a_0    conda-forge
     cuda-cudart-dev           12.5.82              he02047a_0    conda-forge
     cuda-cudart-dev_linux-64  12.5.82              h85509e4_0    conda-forge
     cuda-cudart-static        12.5.82              he02047a_0    conda-forge
     cuda-cudart-static_linux-64 12.5.82              h85509e4_0    conda-forge
     cuda-cudart_linux-64      12.5.82              h85509e4_0    conda-forge
     cuda-driver-dev_linux-64  12.5.82              h85509e4_0    conda-forge
     cuda-nvcc                 12.5.82              hcdd1206_0    conda-forge
     cuda-nvcc-dev_linux-64    12.5.82              ha770c72_0    conda-forge
     cuda-nvcc-impl            12.5.82              hd3aeb46_0    conda-forge
     cuda-nvcc-tools           12.5.82              hd3aeb46_0    conda-forge
     cuda-nvcc_linux-64        12.5.82              h8a487aa_0    conda-forge
     cuda-nvrtc                12.5.82              he02047a_0    conda-forge
     cuda-nvrtc-dev            12.5.82              he02047a_0    conda-forge
     cuda-nvtx                 12.5.82              he02047a_0    conda-forge
     cuda-nvtx-dev             12.5.82              ha770c72_0    conda-forge
     cuda-nvvm-dev_linux-64    12.5.82              ha770c72_0    conda-forge
     cuda-nvvm-impl            12.5.82              h59595ed_0    conda-forge
     cuda-nvvm-tools           12.5.82              h59595ed_0    conda-forge
     cuda-python               12.6.0          py312he9d8a76_0    conda-forge
     cuda-sanitizer-api        12.5.81              he02047a_1    conda-forge
     cuda-version              12.5                 hd4f0392_3    conda-forge
     cudf                      24.12.0                  pypi_0    pypi
     cudf-polars               24.12.0                  pypi_0    pypi
     cupy                      13.3.0          py312h7d319b9_1    conda-forge
     cupy-core                 13.3.0          py312h28031eb_1    conda-forge
     cxx-compiler              1.5.2                hf52228f_0    conda-forge
     cyrus-sasl                2.1.27               h54b06d7_7    conda-forge
     cython                    3.0.11          py312h8fd2918_3    conda-forge
     cytoolz                   1.0.0           py312h66e93f0_1    conda-forge
     dask                      2024.9.0           pyhd8ed1ab_0    conda-forge
     dask-core                 2024.9.0           pyhd8ed1ab_0    conda-forge
     dask-cuda                 24.12.00a6      py312_241011_gf775d88_6    rapidsai-nightly
     dask-cudf                 24.12.0                  pypi_0    pypi
     dask-expr                 1.1.14             pyhd8ed1ab_0    conda-forge
     datasets                  2.14.4             pyhd8ed1ab_0    conda-forge
     debugpy                   1.8.7           py312h2ec8cdc_0    conda-forge
     decopatch                 1.4.10             pyhd8ed1ab_0    conda-forge
     decorator                 5.1.1              pyhd8ed1ab_0    conda-forge
     defusedxml                0.7.1              pyhd8ed1ab_0    conda-forge
     dill                      0.3.7              pyhd8ed1ab_0    conda-forge
     distlib                   0.3.9              pyhd8ed1ab_0    conda-forge
     distributed               2024.9.0           pyhd8ed1ab_0    conda-forge
     dlpack                    0.8                  h59595ed_3    conda-forge
     docutils                  0.20.1          py312h7900ff3_3    conda-forge
     doxygen                   1.9.1                hb166930_1    conda-forge
     entrypoints               0.4                pyhd8ed1ab_0    conda-forge
     et_xmlfile                1.1.0              pyhd8ed1ab_0    conda-forge
     exceptiongroup            1.2.2              pyhd8ed1ab_0    conda-forge
     execnet                   2.1.1              pyhd8ed1ab_0    conda-forge
     executing                 2.1.0              pyhd8ed1ab_0    conda-forge
     fastavro                  1.9.7           py312h66e93f0_0    conda-forge
     fastrlock                 0.8.2           py312h30efb56_2    conda-forge
     filelock                  3.16.1             pyhd8ed1ab_0    conda-forge
     flask                     3.0.3              pyhd8ed1ab_0    conda-forge
     flask-cors                5.0.0              pyhd8ed1ab_0    conda-forge
     flatbuffers               24.3.25              h59595ed_0    conda-forge
     fmt                       11.0.2               h434a139_0    conda-forge
     fqdn                      1.5.1              pyhd8ed1ab_0    conda-forge
     freetype                  2.12.1               h267a509_2    conda-forge
     frozenlist                1.4.1           py312h66e93f0_1    conda-forge
     fsspec                    2024.9.0           pyhff2d567_0    conda-forge
     future                    1.0.0              pyhd8ed1ab_0    conda-forge
     gcc                       11.4.0              h602e360_13    conda-forge
     gcc_impl_linux-64         11.4.0              h00c12a0_13    conda-forge
     gcc_linux-64              11.4.0               ha077dfb_4    conda-forge
     gflags                    2.2.2             h5888daf_1005    conda-forge
     glog                      0.7.1                hbabe93e_0    conda-forge
     gmp                       6.3.0                hac33072_2    conda-forge
     gmpy2                     2.1.5           py312h7201bc8_2    conda-forge
     greenlet                  3.1.1           py312h2ec8cdc_0    conda-forge
     gxx                       11.4.0              h602e360_13    conda-forge
     gxx_impl_linux-64         11.4.0              h634f3ee_13    conda-forge
     gxx_linux-64              11.4.0               h35bfe5d_4    conda-forge
     h11                       0.14.0             pyhd8ed1ab_0    conda-forge
     h2                        4.1.0              pyhd8ed1ab_0    conda-forge
     hpack                     4.0.0              pyh9f0ad1d_0    conda-forge
     httpcore                  1.0.6              pyhd8ed1ab_0    conda-forge
     httpx                     0.27.2             pyhd8ed1ab_0    conda-forge
     huggingface_hub           0.25.2             pyh0610db2_0    conda-forge
     hyperframe                6.0.1              pyhd8ed1ab_0    conda-forge
     hypothesis                6.114.1            pyha770c72_0    conda-forge
     icu                       75.1                 he02047a_0    conda-forge
     identify                  2.6.1              pyhd8ed1ab_0    conda-forge
     idna                      3.10               pyhd8ed1ab_0    conda-forge
     imagesize                 1.4.1              pyhd8ed1ab_0    conda-forge
     importlib-metadata        8.5.0              pyha770c72_0    conda-forge
     importlib-resources       6.4.5              pyhd8ed1ab_0    conda-forge
     importlib_metadata        8.5.0                hd8ed1ab_0    conda-forge
     importlib_resources       6.4.5              pyhd8ed1ab_0    conda-forge
     iniconfig                 2.0.0              pyhd8ed1ab_0    conda-forge
     ipykernel                 6.29.5             pyh3099207_0    conda-forge
     ipython                   8.28.0             pyh707e725_0    conda-forge
     isoduration               20.11.0            pyhd8ed1ab_0    conda-forge
     itsdangerous              2.2.0              pyhd8ed1ab_0    conda-forge
     jedi                      0.19.1             pyhd8ed1ab_0    conda-forge
     jinja2                    3.1.4              pyhd8ed1ab_0    conda-forge
     jmespath                  1.0.1              pyhd8ed1ab_0    conda-forge
     joserfc                   1.0.0              pyhd8ed1ab_0    conda-forge
     json5                     0.9.25             pyhd8ed1ab_0    conda-forge
     jsondiff                  2.0.0              pyhd8ed1ab_0    conda-forge
     jsonpointer               3.0.0           py312h7900ff3_1    conda-forge
     jsonschema                4.23.0             pyhd8ed1ab_0    conda-forge
     jsonschema-path           0.3.3              pyhd8ed1ab_0    conda-forge
     jsonschema-specifications 2023.12.1          pyhd8ed1ab_0    conda-forge
     jsonschema-with-format-nongpl 4.23.0               hd8ed1ab_0    conda-forge
     jupyter-cache             1.0.0              pyhd8ed1ab_0    conda-forge
     jupyter-lsp               2.2.5              pyhd8ed1ab_0    conda-forge
     jupyter_client            8.6.3              pyhd8ed1ab_0    conda-forge
     jupyter_core              5.7.2              pyh31011fe_1    conda-forge
     jupyter_events            0.10.0             pyhd8ed1ab_0    conda-forge
     jupyter_server            2.14.2             pyhd8ed1ab_0    conda-forge
     jupyter_server_terminals  0.5.3              pyhd8ed1ab_0    conda-forge
     jupyterlab                4.2.5              pyhd8ed1ab_0    conda-forge
     jupyterlab_pygments       0.3.0              pyhd8ed1ab_1    conda-forge
     jupyterlab_server         2.27.3             pyhd8ed1ab_0    conda-forge
     kernel-headers_linux-64   3.10.0              he073ed8_17    conda-forge
     keyutils                  1.6.1                h166bdaf_0    conda-forge
     krb5                      1.21.3               h659f571_0    conda-forge
     lazy-object-proxy         1.10.0          py312h98912ed_0    conda-forge
     lcms2                     2.16                 hb7c19ff_0    conda-forge
     ld_impl_linux-64          2.40                 hf3520f5_7    conda-forge
     lerc                      4.0.0                h27087fc_0    conda-forge
     libabseil                 20240116.2      cxx17_he02047a_1    conda-forge
     libarrow                  17.0.0          had3b6fe_16_cpu    conda-forge
     libarrow-acero            17.0.0          h5888daf_16_cpu    conda-forge
     libarrow-dataset          17.0.0          h5888daf_16_cpu    conda-forge
     libarrow-substrait        17.0.0          hf54134d_16_cpu    conda-forge
     libblas                   3.9.0           24_linux64_openblas    conda-forge
     libbrotlicommon           1.1.0                hb9d3cd8_2    conda-forge
     libbrotlidec              1.1.0                hb9d3cd8_2    conda-forge
     libbrotlienc              1.1.0                hb9d3cd8_2    conda-forge
     libcblas                  3.9.0           24_linux64_openblas    conda-forge
     libclang-cpp16            16.0.6          default_hb5137d0_13    conda-forge
     libclang13                19.1.1          default_h9c6a7e4_0    conda-forge
     libcrc32c                 1.1.2                h9c3ff4c_0    conda-forge
     libcublas                 12.5.3.2             he02047a_0    conda-forge
     libcufft                  11.2.3.61            he02047a_0    conda-forge
     libcufile                 1.10.1.7             he02047a_0    conda-forge
     libcufile-dev             1.10.1.7             he02047a_0    conda-forge
     libcurand                 10.3.6.82            he02047a_0    conda-forge
     libcurand-dev             10.3.6.82            he02047a_0    conda-forge
     libcurl                   8.10.1               hbbe4b11_0    conda-forge
     libcusolver               11.6.3.83            he02047a_0    conda-forge
     libcusparse               12.5.1.3             he02047a_0    conda-forge
     libdeflate                1.22                 hb9d3cd8_0    conda-forge
     libedit                   3.1.20191231         he28a2e2_2    conda-forge
     libev                     4.33                 hd590300_2    conda-forge
     libevent                  2.1.12               hf998b51_1    conda-forge
     libexpat                  2.6.3                h5888daf_0    conda-forge
     libffi                    3.4.2                h7f98852_5    conda-forge
     libgcc                    14.1.0               h77fa898_1    conda-forge
     libgcc-devel_linux-64     11.4.0             h8f596e0_113    conda-forge
     libgcc-ng                 14.1.0               h69a702a_1    conda-forge
     libgfortran               14.1.0               h69a702a_1    conda-forge
     libgfortran-ng            14.1.0               h69a702a_1    conda-forge
     libgfortran5              14.1.0               hc5f4f2c_1    conda-forge
     libgomp                   14.1.0               h77fa898_1    conda-forge
     libgoogle-cloud           2.29.0               h435de7b_0    conda-forge
     libgoogle-cloud-storage   2.29.0               h0121fbd_0    conda-forge
     libgrpc                   1.62.2               h15f2491_0    conda-forge
     libhwloc                  2.11.1          default_hecaa2ac_1000    conda-forge
     libiconv                  1.17                 hd590300_2    conda-forge
     libjpeg-turbo             3.0.0                hd590300_1    conda-forge
     libkvikio                 24.12.00a       cuda12_241011_g22668fa_25    rapidsai-nightly
     liblapack                 3.9.0           24_linux64_openblas    conda-forge
     libllvm14                 14.0.6               hcd5def8_4    conda-forge
     libllvm16                 16.0.6               hb3ce162_3    conda-forge
     libllvm19                 19.1.1               ha7bfdaf_0    conda-forge
     libnghttp2                1.58.0               h47da74e_1    conda-forge
     libnsl                    2.0.1                hd590300_0    conda-forge
     libntlm                   1.4               h7f98852_1002    conda-forge
     libnvjitlink              12.5.82              he02047a_0    conda-forge
     libopenblas               0.3.27          pthreads_hac2b453_1    conda-forge
     libparquet                17.0.0          h39682fd_16_cpu    conda-forge
     libpng                    1.6.44               hadc24fc_0    conda-forge
     libprotobuf               4.25.3               hd5b35b9_1    conda-forge
     librdkafka                2.5.3                h95ba008_0    conda-forge
     libre2-11                 2023.09.01           h5a48ba9_2    conda-forge
     librmm                    24.12.00a15     cuda12_241011_g90a5631e_15    rapidsai-nightly
     libsanitizer              11.4.0              h5763a12_13    conda-forge
     libsodium                 1.0.20               h4ab18f5_0    conda-forge
     libsqlite                 3.46.1               hadc24fc_0    conda-forge
     libssh2                   1.11.0               h0841786_0    conda-forge
     libstdcxx                 14.1.0               hc0a3c3a_1    conda-forge
     libstdcxx-devel_linux-64  11.4.0             h8f596e0_113    conda-forge
     libstdcxx-ng              14.1.0               h4852527_1    conda-forge
     libthrift                 0.20.0               h0e7cc3e_1    conda-forge
     libtiff                   4.7.0                he137b08_1    conda-forge
     libtorch                  2.4.1           cpu_mkl_he3c781b_100    conda-forge
     libutf8proc               2.8.0                h166bdaf_0    conda-forge
     libuuid                   2.38.1               h0b41bf4_0    conda-forge
     libuv                     1.49.1               hb9d3cd8_0    conda-forge
     libwebp-base              1.4.0                hd590300_0    conda-forge
     libxcb                    1.17.0               h8a09558_0    conda-forge
     libxcrypt                 4.4.36               hd590300_1    conda-forge
     libxml2                   2.12.7               he7c6b58_4    conda-forge
     libzlib                   1.3.1                hb9d3cd8_2    conda-forge
     llvm-openmp               19.1.1               h024ca30_0    conda-forge
     llvmlite                  0.43.0          py312h374181b_1    conda-forge
     locket                    1.0.0              pyhd8ed1ab_0    conda-forge
     lz4                       4.3.3           py312hb3f7f12_1    conda-forge
     lz4-c                     1.9.4                hcb278e6_0    conda-forge
     make                      4.4.1                hb9d3cd8_2    conda-forge
     makefun                   1.15.6             pyhd8ed1ab_0    conda-forge
     markdown                  3.6                pyhd8ed1ab_0    conda-forge
     markdown-it-py            3.0.0              pyhd8ed1ab_0    conda-forge
     markupsafe                3.0.1           py312h178313f_1    conda-forge
     matplotlib-inline         0.1.7              pyhd8ed1ab_0    conda-forge
     mdit-py-plugins           0.4.2              pyhd8ed1ab_0    conda-forge
     mdurl                     0.1.2              pyhd8ed1ab_0    conda-forge
     mistune                   3.0.2              pyhd8ed1ab_0    conda-forge
     mkl                       2023.2.0         h84fe81f_50496    conda-forge
     moto                      5.0.16             pyhd8ed1ab_0    conda-forge
     mpc                       1.3.1                h24ddda3_1    conda-forge
     mpfr                      4.2.1                h90cbb55_3    conda-forge
     mpmath                    1.3.0              pyhd8ed1ab_0    conda-forge
     msgpack-python            1.1.0           py312h68727a3_0    conda-forge
     multidict                 6.1.0           py312h66e93f0_0    conda-forge
     multiprocess              0.70.15         py312h98912ed_1    conda-forge
     myst-nb                   1.1.2              pyhd8ed1ab_0    conda-forge
     myst-parser               4.0.0              pyhd8ed1ab_0    conda-forge
     nbclient                  0.10.0             pyhd8ed1ab_0    conda-forge
     nbconvert                 7.16.4               hd8ed1ab_1    conda-forge
     nbconvert-core            7.16.4             pyhd8ed1ab_1    conda-forge
     nbconvert-pandoc          7.16.4               hd8ed1ab_1    conda-forge
     nbformat                  5.10.4             pyhd8ed1ab_0    conda-forge
     nbsphinx                  0.9.5              pyhd8ed1ab_0    conda-forge
     ncurses                   6.5                  he02047a_1    conda-forge
     nest-asyncio              1.6.0              pyhd8ed1ab_0    conda-forge
     networkx                  3.4                pyhd8ed1ab_1    conda-forge
     ninja                     1.12.1               h297d8ca_0    conda-forge
     nodeenv                   1.9.1              pyhd8ed1ab_0    conda-forge
     notebook                  7.2.2              pyhd8ed1ab_0    conda-forge
     notebook-shim             0.2.4              pyhd8ed1ab_0    conda-forge
     numba                     0.60.0          py312h83e6fd3_0    conda-forge
     numba-cuda                0.0.17             pyh267e887_0    conda-forge
     numpy                     2.0.2           py312h58c1407_0    conda-forge
     numpydoc                  1.8.0              pyhd8ed1ab_0    conda-forge
     nvcomp                    4.0.1                hbc370b7_0    conda-forge
     nvtx                      0.2.10          py312h66e93f0_2    conda-forge
     openapi-schema-validator  0.6.2              pyhd8ed1ab_0    conda-forge
     openapi-spec-validator    0.7.1              pyhd8ed1ab_0    conda-forge
     openjpeg                  2.5.2                h488ebb8_0    conda-forge
     openpyxl                  3.1.5           py312h710cb58_1    conda-forge
     openssl                   3.3.2                hb9d3cd8_0    conda-forge
     orc                       2.0.2                h669347b_0    conda-forge
     overrides                 7.7.0              pyhd8ed1ab_0    conda-forge
     packaging                 24.1               pyhd8ed1ab_0    conda-forge
     pandas                    2.2.3           py312hf9745cd_1    conda-forge
     pandoc                    3.5                  ha770c72_0    conda-forge
     pandocfilters             1.5.0              pyhd8ed1ab_0    conda-forge
     parso                     0.8.4              pyhd8ed1ab_0    conda-forge
     partd                     1.4.2              pyhd8ed1ab_0    conda-forge
     pathable                  0.4.3              pyhd8ed1ab_0    conda-forge
     pathspec                  0.12.1             pyhd8ed1ab_0    conda-forge
     pexpect                   4.9.0              pyhd8ed1ab_0    conda-forge
     pickleshare               0.7.5                   py_1003    conda-forge
     pillow                    10.4.0          py312h56024de_1    conda-forge
     pip                       24.2               pyh8b19718_1    conda-forge
     pkgutil-resolve-name      1.3.10             pyhd8ed1ab_1    conda-forge
     platformdirs              4.3.6              pyhd8ed1ab_0    conda-forge
     pluggy                    1.5.0              pyhd8ed1ab_0    conda-forge
     polars                    1.8.2           py312hfe7c9be_0    conda-forge
     pre-commit                4.0.1              pyha770c72_0    conda-forge
     prometheus_client         0.21.0             pyhd8ed1ab_0    conda-forge
     prompt-toolkit            3.0.48             pyha770c72_0    conda-forge
     propcache                 0.2.0           py312h66e93f0_2    conda-forge
     psutil                    6.0.0           py312h66e93f0_1    conda-forge
     pthread-stubs             0.4               hb9d3cd8_1002    conda-forge
     ptyprocess                0.7.0              pyhd3deb0d_0    conda-forge
     pure_eval                 0.2.3              pyhd8ed1ab_0    conda-forge
     py-cpuinfo                9.0.0              pyhd8ed1ab_0    conda-forge
     pyarrow                   17.0.0          py312h9cebb41_1    conda-forge
     pyarrow-core              17.0.0          py312h9cafe31_1_cpu    conda-forge
     pyarrow-hotfix            0.6                pyhd8ed1ab_0    conda-forge
     pycparser                 2.22               pyhd8ed1ab_0    conda-forge
     pydata-sphinx-theme       0.15.4             pyhd8ed1ab_0    conda-forge
     pygments                  2.18.0             pyhd8ed1ab_0    conda-forge
     pylibcudf                 24.12.0                  pypi_0    pypi
     pynvjitlink               0.3.0           py312hd269673_0    rapidsai
     pynvml                    11.4.1             pyhd8ed1ab_0    conda-forge
     pyparsing                 3.1.4              pyhd8ed1ab_0    conda-forge
     pysocks                   1.7.1              pyha2e5f31_6    conda-forge
     pytest                    7.4.4              pyhd8ed1ab_0    conda-forge
     pytest-benchmark          4.0.0              pyhd8ed1ab_0    conda-forge
     pytest-cases              3.8.6              pyhd8ed1ab_0    conda-forge
     pytest-cov                5.0.0              pyhd8ed1ab_0    conda-forge
     pytest-xdist              3.6.1              pyhd8ed1ab_0    conda-forge
     python                    3.12.7          hc5c86c4_0_cpython    conda-forge
     python-confluent-kafka    2.5.3           py312h66e93f0_0    conda-forge
     python-dateutil           2.9.0              pyhd8ed1ab_0    conda-forge
     python-fastjsonschema     2.20.0             pyhd8ed1ab_0    conda-forge
     python-json-logger        2.0.7              pyhd8ed1ab_0    conda-forge
     python-tzdata             2024.2             pyhd8ed1ab_0    conda-forge
     python-xxhash             3.5.0           py312h66e93f0_1    conda-forge
     python_abi                3.12                    5_cp312    conda-forge
     pytorch                   2.4.1           cpu_mkl_py312hf535c18_100    conda-forge
     pytz                      2024.1             pyhd8ed1ab_0    conda-forge
     pyyaml                    6.0.2           py312h66e93f0_1    conda-forge
     pyzmq                     26.2.0          py312hbf22597_3    conda-forge
     rapids-build-backend      0.3.2                      py_0    rapidsai
     rapids-dask-dependency    24.12.00a6                 py_0    rapidsai-nightly
     rapids-dependency-file-generator 1.15.1                     py_0    rapidsai
     re2                       2023.09.01           h7f4b329_2    conda-forge
     readline                  8.2                  h8228510_1    conda-forge
     recommonmark              0.7.1              pyhd8ed1ab_0    conda-forge
     referencing               0.35.1             pyhd8ed1ab_0    conda-forge
     regex                     2024.9.11       py312h66e93f0_0    conda-forge
     requests                  2.32.3             pyhd8ed1ab_0    conda-forge
     responses                 0.25.3             pyhd8ed1ab_0    conda-forge
     rfc3339-validator         0.1.4              pyhd8ed1ab_0    conda-forge
     rfc3986-validator         0.1.1              pyh9f0ad1d_0    conda-forge
     rhash                     1.4.4                hd590300_0    conda-forge
     rich                      13.9.2             pyhd8ed1ab_0    conda-forge
     rmm                       24.12.00a15     cuda12_py312_241011_g90a5631e_15    rapidsai-nightly
     rpds-py                   0.20.0          py312h12e396e_1    conda-forge
     s2n                       1.5.5                h3931f03_0    conda-forge
     s3fs                      2024.9.0           pyhd8ed1ab_0    conda-forge
     s3transfer                0.10.3             pyhd8ed1ab_0    conda-forge
     safetensors               0.4.5           py312h12e396e_0    conda-forge
     scikit-build-core         0.10.7             pyh4afc917_0    conda-forge
     scipy                     1.14.1          py312h7d485d2_0    conda-forge
     send2trash                1.8.3              pyh0d859eb_0    conda-forge
     setuptools                75.1.0             pyhd8ed1ab_0    conda-forge
     six                       1.16.0             pyh6c4a22f_0    conda-forge
     sleef                     3.7                  h1b44611_0    conda-forge
     snappy                    1.2.1                ha2e4443_0    conda-forge
     sniffio                   1.3.1              pyhd8ed1ab_0    conda-forge
     snowballstemmer           2.2.0              pyhd8ed1ab_0    conda-forge
     sortedcontainers          2.4.0              pyhd8ed1ab_0    conda-forge
     soupsieve                 2.5                pyhd8ed1ab_1    conda-forge
     spdlog                    1.14.1               hed91bc2_1    conda-forge
     sphinx                    7.1.2              pyhd8ed1ab_0    conda-forge
     sphinx-autobuild          2024.10.3          pyhd8ed1ab_0    conda-forge
     sphinx-copybutton         0.5.2              pyhd8ed1ab_0    conda-forge
     sphinx-markdown-tables    0.0.17             pyh6c4a22f_0    conda-forge
     sphinx-remove-toctrees    1.0.0.post1        pyhd8ed1ab_0    conda-forge
     sphinxcontrib-applehelp   2.0.0              pyhd8ed1ab_0    conda-forge
     sphinxcontrib-devhelp     2.0.0              pyhd8ed1ab_0    conda-forge
     sphinxcontrib-htmlhelp    2.1.0              pyhd8ed1ab_0    conda-forge
     sphinxcontrib-jsmath      1.0.1              pyhd8ed1ab_0    conda-forge
     sphinxcontrib-qthelp      2.0.0              pyhd8ed1ab_0    conda-forge
     sphinxcontrib-serializinghtml 1.1.10             pyhd8ed1ab_0    conda-forge
     sphinxcontrib-websupport  1.2.7              pyhd8ed1ab_0    conda-forge
     sqlalchemy                2.0.35          py312h66e93f0_0    conda-forge
     stack_data                0.6.2              pyhd8ed1ab_0    conda-forge
     starlette                 0.39.2             pyhd8ed1ab_0    conda-forge
     streamz                   0.6.4              pyh6c4a22f_0    conda-forge
     sympy                     1.13.3           pyh2585a3b_104    conda-forge
     sysroot_linux-64          2.17                h4a8ded7_17    conda-forge
     tabulate                  0.9.0              pyhd8ed1ab_1    conda-forge
     tbb                       2021.13.0            h84d6215_0    conda-forge
     tblib                     3.0.0              pyhd8ed1ab_0    conda-forge
     terminado                 0.18.1             pyh0d859eb_0    conda-forge
     tinycss2                  1.3.0              pyhd8ed1ab_0    conda-forge
     tk                        8.6.13          noxft_h4845f30_101    conda-forge
     tokenizers                0.15.2          py312hfef1a59_0    conda-forge
     toml                      0.10.2             pyhd8ed1ab_0    conda-forge
     tomli                     2.0.2              pyhd8ed1ab_0    conda-forge
     tomlkit                   0.13.2             pyha770c72_0    conda-forge
     toolz                     1.0.0              pyhd8ed1ab_0    conda-forge
     tornado                   6.4.1           py312h66e93f0_1    conda-forge
     tqdm                      4.66.5             pyhd8ed1ab_0    conda-forge
     traitlets                 5.14.3             pyhd8ed1ab_0    conda-forge
     transformers              4.39.3             pyhd8ed1ab_0    conda-forge
     types-python-dateutil     2.9.0.20241003     pyhff2d567_0    conda-forge
     types-pyyaml              6.0.12.20240917    pyhd8ed1ab_0    conda-forge
     typing-extensions         4.12.2               hd8ed1ab_0    conda-forge
     typing_extensions         4.12.2             pyha770c72_0    conda-forge
     typing_utils              0.1.0              pyhd8ed1ab_0    conda-forge
     tzdata                    2024b                hc8b5060_0    conda-forge
     ukkonen                   1.0.1           py312h68727a3_5    conda-forge
     uri-template              1.3.0              pyhd8ed1ab_0    conda-forge
     urllib3                   2.2.3              pyhd8ed1ab_0    conda-forge
     uvicorn                   0.31.1          py312h7900ff3_0    conda-forge
     virtualenv                20.26.6            pyhd8ed1ab_0    conda-forge
     watchfiles                0.24.0          py312h12e396e_1    conda-forge
     wcwidth                   0.2.13             pyhd8ed1ab_0    conda-forge
     webcolors                 24.8.0             pyhd8ed1ab_0    conda-forge
     webencodings              0.5.1              pyhd8ed1ab_2    conda-forge
     websocket-client          1.8.0              pyhd8ed1ab_0    conda-forge
     websockets                13.1            py312h66e93f0_0    conda-forge
     werkzeug                  3.0.4              pyhd8ed1ab_0    conda-forge
     wheel                     0.44.0             pyhd8ed1ab_0    conda-forge
     wrapt                     1.16.0          py312h66e93f0_1    conda-forge
     xmltodict                 0.14.1             pyhd8ed1ab_0    conda-forge
     xorg-libxau               1.0.11               hb9d3cd8_1    conda-forge
     xorg-libxdmcp             1.1.5                hb9d3cd8_0    conda-forge
     xxhash                    0.8.2                hd590300_0    conda-forge
     xyzservices               2024.9.0           pyhd8ed1ab_0    conda-forge
     xz                        5.2.6                h166bdaf_0    conda-forge
     yaml                      0.2.5                h7f98852_2    conda-forge
     yarl                      1.14.0          py312h66e93f0_0    conda-forge
     zeromq                    4.3.5                h3b0a872_6    conda-forge
     zict                      3.0.0              pyhd8ed1ab_0    conda-forge
     zipp                      3.20.2             pyhd8ed1ab_0    conda-forge
     zlib                      1.3.1                hb9d3cd8_2    conda-forge
     zstandard                 0.23.0          py312hef9b889_1    conda-forge
     zstd                      1.5.6                ha6fb4c9_0    conda-forge


wence- commented 6 days ago

I had a look at this. It boils down to:

x = 9180.52952127610660682
y = 91.8052952127610666366
floordiv = x // y
print(f"{floordiv:.20f}") # => 99.00000000000000000000

#include <cmath>
#include <iostream>
#include <iomanip>
int main() {
  double x = 9180.52952127610660682;
  double y = 91.8052952127610666366;

  double result = std::floor(x / y);

  std::cout << std::fixed << std::setprecision(20);
  std::cout << result << std::endl;
}
// => 100.00000000000000000000 

The libcudf implementation is (matching the C++ I wrote):

$$ q_\text{floor} := \left\lfloor \frac{x}{y} \right\rfloor $$

The pandas/NumPy/Python implementation uses divmod, which finds a float $q_\text{divmod}$ such that, for a pair $x$ and $y$, $q_\text{divmod} y + (x \mod y) \approx x$. Usually this is the same as $q_\text{floor}$, but it is sometimes one less; this bug is one such example.

In particular, note that since $y$ in this example is slightly greater than $x / 100$ (if computed exactly), $x \mod y$ is only slightly less than $y$ (and not close to zero). With $q_\text{floor} = 100$, this means that $q_\text{floor} y + (x \mod y) \approx x + y$ rather than $x$.
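The argument above can be checked with exact rational arithmetic. The following is a minimal sketch using only the Python standard library; the variable names are illustrative, and the values are the two doubles quoted earlier in this comment. It compares the floor of the exact quotient with the floor of the rounded double quotient, and shows how close the remainder is to $y$.

```python
from fractions import Fraction
import math

x = 9180.52952127610660682
y = 91.8052952127610666366

# Fraction(float) converts the stored double exactly, with no rounding.
fx, fy = Fraction(x), Fraction(y)

exact_floor = math.floor(fx / fy)   # floor of the exact quotient
rounded_floor = math.floor(x / y)   # floor after the quotient is rounded to a double
remainder = math.fmod(x, y)         # fmod is computed exactly for doubles

print("exact floor:  ", exact_floor)
print("rounded floor:", rounded_floor)
print("y - (x mod y):", y - remainder)                      # tiny and positive
print("q_floor*y + (x mod y) - x:", rounded_floor * y + remainder - x)  # roughly y
```

If the reasoning holds, the exact floor agrees with Python's `x // y`, while the rounded floor is one larger because `x / y` rounds up to the next integer before `floor` is applied.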

If we don't care about preserving the sign of zero, we could implement this as:

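#include <cmath>

template<typename T>
auto py_floor_div(T x, T y) -> decltype(x / y) {
  // fmod is exact; a non-zero result takes the sign of x
  auto const mod = std::fmod(x, y);
  // x - mod is (up to rounding) an integer multiple of y
  auto div = (x - mod) / y;
  // fixup for mixed signs: round towards negative infinity, not towards zero
  div -= (mod != 0 && (std::signbit(x) ^ std::signbit(y)));
  auto const floordiv = std::floor(div);
  // fixup: div should be integral up to rounding error, so snap to the nearest integer
  return floordiv + (div - floordiv > 0.5);
}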
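
For comparison with the built-in operator, here is a line-by-line Python port of the C++ sketch above. It is only an illustration: the name `py_floor_div` is carried over from the sketch, the sign of zero is not preserved, and zero divisors, infinities, and NaN are not handled (`math.fmod` and `math.floor` raise on some of those inputs).

```python
import math


def py_floor_div(x: float, y: float) -> float:
    """Floor division via fmod, mirroring the C++ sketch (finite, non-zero y only)."""
    mod = math.fmod(x, y)          # exact remainder; sign follows x
    div = (x - mod) / y            # integral up to rounding error
    # Equivalent of std::signbit(x) ^ std::signbit(y)
    mixed_signs = (math.copysign(1.0, x) < 0) != (math.copysign(1.0, y) < 0)
    if mod != 0 and mixed_signs:
        div -= 1.0                 # round towards negative infinity
    floordiv = math.floor(div)     # math.floor returns an int
    if div - floordiv > 0.5:       # snap to the nearest integer
        floordiv += 1
    return float(floordiv)


x = 9180.52952127610660682
y = 91.8052952127610666366
print(py_floor_div(x, y), x // y, float(math.floor(x / y)))
```

On the values from this issue, the port should agree with `x // y` and differ from `floor(x / y)`.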
wence- commented 6 days ago

I haven't checked Spark. cudf.polars wants the libcudf implementation.

davidwendt commented 6 days ago

A previous issue, just for reference: https://github.com/rapidsai/cudf/issues/12120

We try to keep libcudf consistent with std C++ library behavior when possible.

wence- commented 6 days ago

> We try to keep libcudf consistent with std C++ library behavior when possible.

Just to note, we already have carveouts for different modulo behaviour in the binop implementation:

  MOD,          ///< operator %
  PMOD,         ///< positive modulo operator
                ///< If remainder is negative, this returns (remainder + divisor) % divisor
                ///< else, it returns (dividend % divisor)
  PYMOD,        ///< operator % but following Python's sign rules for negatives
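
To make the difference between these three sign conventions concrete, here is a small Python emulation based only on the enum comments above. The helper names `c_mod`, `pmod`, and `py_mod` are hypothetical, and this is not how the libcudf kernels are written; it is only a sketch of the documented behaviour on floats.

```python
import math


def c_mod(a: float, b: float) -> float:
    """MOD: C/C++-style remainder; the result takes the sign of the dividend."""
    return math.fmod(a, b)


def pmod(a: float, b: float) -> float:
    """PMOD: if the remainder is negative, return (remainder + divisor) % divisor."""
    r = math.fmod(a, b)
    return math.fmod(r + b, b) if r < 0 else r


def py_mod(a: float, b: float) -> float:
    """PYMOD: Python's rule; the result takes the sign of the divisor."""
    return a % b


for a, b in [(-7.0, 3.0), (7.0, -3.0)]:
    print(a, b, c_mod(a, b), pmod(a, b), py_mod(a, b))
# -7.0  3.0 -> -1.0, 2.0,  2.0
#  7.0 -3.0 ->  1.0, 1.0, -2.0
```

The three only disagree when the operands have mixed signs, which is the same situation where floor division and truncating division differ.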

If we don't want to incorporate all the additional operations that might be required (for example, I would like an implementation of GREATER/LESS/etc. for floating point types that uses the IEEE total order on floats in the presence of NaN values), this seems like a good place to consider the ongoing jitify/nvrtc improvements (cc @lamarrr). We have binary_operation(..., std::string const& ptx, ...), but that looks to be quite numba-specific right now.
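
As an illustration of what a total-order comparison on floats means, here is a minimal Python sketch that maps each double to an integer key whose ordinary integer ordering matches the IEEE 754 total order (negative NaN first, then -inf, negative numbers, -0.0, +0.0, positive numbers, +inf, positive NaN). The function name `total_order_key` is hypothetical, and this is a standard bit trick, not anything taken from libcudf.

```python
import math
import struct

MASK64 = 0xFFFFFFFFFFFFFFFF
SIGN_BIT = 1 << 63


def total_order_key(value: float) -> int:
    """Map a double to an unsigned integer that sorts in IEEE 754 total order."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", value))
    if bits & SIGN_BIT:
        # Negative values: flip every bit so larger magnitudes sort earlier.
        return bits ^ MASK64
    # Non-negative values: set the sign bit so they sort after all negatives.
    return bits | SIGN_BIT


pos_nan = math.copysign(math.nan, 1.0)
neg_nan = math.copysign(math.nan, -1.0)
values = [pos_nan, 1.0, -math.inf, 0.0, -0.0, math.inf, -1.0, neg_nan]

ordered = sorted(values, key=total_order_key)
print(ordered)
```

With such a key, GREATER/LESS become plain integer comparisons, so NaN values get a well-defined position instead of comparing false against everything.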