rapidsai / cudf

cuDF - GPU DataFrame Library
https://docs.rapids.ai/api/cudf/stable/
Apache License 2.0

[BUG] dask_cudf.read_csv: byte ranges cannot be combined with row limits #13552

Status: Closed (closed by stmio 1 year ago)

stmio commented 1 year ago

Describe the bug

When reading CSV files with dask_cudf, passing the skiprows or skipfooter parameter to read_csv raises the following error:

ValueError: cannot manually limit rows to be read when using the byte range parameter
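
The wording of the message suggests that each partition is read from a byte range of the file. The following is a toy, pure-Python illustration of why a row limit is ambiguous in that setting. It is not dask_cudf's implementation: `split_on_line_boundaries`, the shortened sample data, and the 12-byte chunk size are all made up for the example.

```python
# Toy illustration only; this does not use or reproduce dask_cudf internals.
data = b"x\nx\nx\nA,B\n1,2\n2,3\n4,5\n"


def split_on_line_boundaries(buf, approx_size):
    """Cut buf into chunks of roughly approx_size bytes, each ending at a newline."""
    chunks, start = [], 0
    while start < len(buf):
        newline = buf.find(b"\n", start + approx_size - 1)
        end = len(buf) if newline == -1 else newline + 1
        chunks.append(buf[start:end])
        start = end
    return chunks


chunks = split_on_line_boundaries(data, 12)

# Skipping 3 rows of the whole file keeps the header and every data row.
whole_file = data.splitlines()[3:]

# Skipping 3 rows inside each chunk independently also drops real data rows,
# because a chunk has no way of knowing how many rows came before it.
per_chunk = [row for chunk in chunks for row in chunk.splitlines()[3:]]

print(whole_file)  # [b'A,B', b'1,2', b'2,3', b'4,5']
print(per_chunk)   # [b'A,B', b'1,2']
```

A reader that only sees a byte range cannot tell which of its rows are "the first three rows of the file", which would explain why the two options are rejected together.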

Steps/Code to reproduce bug

main.py

import dask_cudf

data = dask_cudf.read_csv("./data.csv", skiprows=3).set_index("A")

data.csv

x
x
x
A, B, C, D
1, 2, 3, 4
2, 3, 5, 1
4, 5, 2, 5

Running the same code with cudf instead of dask_cudf works as expected.
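
Until skiprows is supported, one possible workaround is to strip the junk lines before the file reaches dask_cudf. A minimal sketch using only the standard library; `drop_leading_lines` and the `data_clean.csv` name are hypothetical, and the dask_cudf call is left commented out because it needs a GPU environment:

```python
import itertools
import tempfile
from pathlib import Path


def drop_leading_lines(src, dst, n):
    """Copy src to dst without its first n lines (hypothetical helper)."""
    with open(src, "r", newline="") as fin, open(dst, "w", newline="") as fout:
        fout.writelines(itertools.islice(fin, n, None))
    return Path(dst)


# Recreate the data.csv from this report in a temporary directory.
workdir = Path(tempfile.mkdtemp())
raw = workdir / "data.csv"
raw.write_text("x\nx\nx\nA, B, C, D\n1, 2, 3, 4\n2, 3, 5, 1\n4, 5, 2, 5\n")

cleaned = drop_leading_lines(raw, workdir / "data_clean.csv", 3)
print(cleaned.read_text())

# The cleaned file no longer needs skiprows, so this call should avoid the error:
# import dask_cudf
# data = dask_cudf.read_csv(str(cleaned)).set_index("A")
```

For files that fit on a single GPU, another option is to read with cudf.read_csv(..., skiprows=3), which works as noted above, and convert the result afterwards.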

Full traceback: traceback.txt

Expected behavior

The CSV file is read into a Dask DataFrame, skipping the first three rows, which do not contain valid data.

Environment overview

Environment details


     ***git***
print_env.sh: 11: [: unexpected operator
     Not inside a git repository

     ***OS Information***
     DISTRIB_ID=Ubuntu
     DISTRIB_RELEASE=20.04
     DISTRIB_CODENAME=focal
     DISTRIB_DESCRIPTION="Ubuntu 20.04.6 LTS"
     NAME="Ubuntu"
     VERSION="20.04.6 LTS (Focal Fossa)"
     ID=ubuntu
     ID_LIKE=debian
     PRETTY_NAME="Ubuntu 20.04.6 LTS"
     VERSION_ID="20.04"
     HOME_URL="https://www.ubuntu.com/"
     SUPPORT_URL="https://help.ubuntu.com/"
     BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
     PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
     VERSION_CODENAME=focal
     UBUNTU_CODENAME=focal
     Linux DESKTOP 5.15.90.1-microsoft-standard-WSL2 #1 SMP Fri Jan 27 02:56:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

     ***GPU Information***
     Mon Jun 12 16:29:02 2023
     +---------------------------------------------------------------------------------------+
     | NVIDIA-SMI 530.30.02              Driver Version: 531.14       CUDA Version: 12.1     |
     |-----------------------------------------+----------------------+----------------------+
     | GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
     | Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
     |                                         |                      |               MIG M. |
     |=========================================+======================+======================|
     |   0  NVIDIA GeForce RTX 3060 Ti      On | 00000000:26:00.0  On |                  N/A |
     |  0%   41C    P8               13W / 200W|    712MiB /  8192MiB |      0%      Default |
     |                                         |                      |                  N/A |
     +-----------------------------------------+----------------------+----------------------+

     +---------------------------------------------------------------------------------------+
     | Processes:                                                                            |
     |  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
     |        ID   ID                                                             Usage      |
     |=======================================================================================|
     |    0   N/A  N/A       262      G   /Xwayland                                 N/A      |
     +---------------------------------------------------------------------------------------+

     ***CPU***
     Architecture:                    x86_64
     CPU op-mode(s):                  32-bit, 64-bit
     Byte Order:                      Little Endian
     Address sizes:                   48 bits physical, 48 bits virtual
     CPU(s):                          12
     On-line CPU(s) list:             0-11
     Thread(s) per core:              2
     Core(s) per socket:              6
     Socket(s):                       1
     Vendor ID:                       AuthenticAMD
     CPU family:                      23
     Model:                           113
     Model name:                      AMD Ryzen 5 3600 6-Core Processor
     Stepping:                        0
     CPU MHz:                         3600.020
     BogoMIPS:                        7200.04
     Hypervisor vendor:               Microsoft
     Virtualization type:             full
     L1d cache:                       192 KiB
     L1i cache:                       192 KiB
     L2 cache:                        3 MiB
     L3 cache:                        16 MiB
     Vulnerability Itlb multihit:     Not affected
     Vulnerability L1tf:              Not affected
     Vulnerability Mds:               Not affected
     Vulnerability Meltdown:          Not affected
     Vulnerability Mmio stale data:   Not affected
     Vulnerability Retbleed:          Mitigation; untrained return thunk; SMT enabled with STIBP protection
     Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
     Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
     Vulnerability Spectre v2:        Mitigation; Retpolines, IBPB conditional, STIBP always-on, RSB filling, PBRSB-eIBRS Not affected
     Vulnerability Srbds:             Not affected
     Vulnerability Tsx async abort:   Not affected
     Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl tsc_reliable nonstop_tsc cpuid extd_apicid pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext ssbd ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves clzero xsaveerptr virt_ssbd arat umip rdpid

     ***CMake***

     ***g++***
     /usr/bin/g++
     g++ (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
     Copyright (C) 2019 Free Software Foundation, Inc.
     This is free software; see the source for copying conditions.  There is NO
     warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

     ***nvcc***

     ***Python***
     /home/sam/miniconda3/envs/rapids-23.04/bin/python
     Python 3.10.11

     ***Environment Variables***
     PATH                            : /home/sam/.local/bin:/home/sam/miniconda3/envs/rapids-23.04/bin:/home/sam/miniconda3/condabin:/home/sam/.nvm/versions/node/v16.15.1/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/usr/lib/wsl/lib:/mnt/c/Program Files/WindowsApps/Microsoft.PowerShell_7.3.4.0_x64__8wekyb3d8bbwe:/mnt/c/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.1/bin:/mnt/c/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.1/libnvvp:/mnt/c/Windows/system32:/mnt/c/Windows:/mnt/c/Windows/System32/Wbem:/mnt/c/Windows/System32/WindowsPowerShell/v1.0:/mnt/c/Windows/System32/OpenSSH:/mnt/c/Program Files/NVIDIA Corporation/NVIDIA NvDLISR:/mnt/c/Program Files/dotnet:/mnt/c/Program Files (x86)/NVIDIA Corporation/PhysX/Common:/mnt/c/Program Files (x86)/GnuPG/bin:/mnt/c/Program Files/Git/cmd:/mnt/c/Program Files/NVIDIA Corporation/Nsight Compute 2023.1.1:/mnt/c/Users/Sam/scoop/apps/miniconda3/current/scripts:/mnt/c/Users/Sam/scoop/apps/miniconda3/current/Library/bin:/mnt/c/Users/Sam/AppData/Local/Programs/Python/Python310/Scripts:/mnt/c/Users/Sam/AppData/Local/Programs/Python/Python310:/mnt/c/Users/Sam/scoop/apps/gcc/current/bin:/mnt/c/Users/Sam/scoop/apps/nvm/current/nodejs/nodejs:/mnt/c/Users/Sam/scoop/shims:/mnt/c/Users/Sam/AppData/Local/Microsoft/WindowsApps:/mnt/c/Users/Sam/.dotnet/tools:/mnt/c/Users/Sam/AppData/Local/GitHubDesktop/bin:/mnt/c/Users/Sam/.dotnet/tools:/mnt/c/Users/Sam/Godot:/mnt/c/Users/Sam/AppData/Local/lua-language-server-3.5.3/bin:/mnt/c/Users/Sam/AppData/Local/gitkraken/bin:/mnt/c/Users/Sam/AppData/Local/JetBrains/Toolbox/scripts:/mnt/c/Users/Sam/AppData/Local/Programs/Microsoft VS Code/bin:/mnt/c/Users/Sam/AppData/Local/Pandoc:/mnt/c/Users/Sam/AppData/Local/Programs/MiKTeX/miktex/bin/x64:/snap/bin:
     LD_LIBRARY_PATH                 :
     NUMBAPRO_NVVM                   :
     NUMBAPRO_LIBDEVICE              :
     CONDA_PREFIX                    : /home/sam/miniconda3/envs/rapids-23.04
     PYTHON_PATH                     :

     ***conda packages***
     conda is /home/sam/miniconda3/condabin/conda
     /home/sam/miniconda3/condabin/conda
     # packages in environment at /home/sam/miniconda3/envs/rapids-23.04:
     #
     # Name                    Version                   Build  Channel
     _libgcc_mutex             0.1                 conda_forge    conda-forge
     _openmp_mutex             4.5                       2_gnu    conda-forge
     aiohttp                   3.8.4           py310h2372a71_1    conda-forge
     aiosignal                 1.3.1              pyhd8ed1ab_0    conda-forge
     anyio                     3.7.0              pyhd8ed1ab_1    conda-forge
     aom                       3.5.0                h27087fc_0    conda-forge
     appdirs                   1.4.4              pyh9f0ad1d_0    conda-forge
     argon2-cffi               21.3.0             pyhd8ed1ab_0    conda-forge
     argon2-cffi-bindings      21.2.0          py310h5764c6d_3    conda-forge
     asttokens                 2.2.1              pyhd8ed1ab_0    conda-forge
     async-timeout             4.0.2              pyhd8ed1ab_0    conda-forge
     attrs                     23.1.0             pyh71513ae_1    conda-forge
     aws-c-auth                0.6.27               he072965_1    conda-forge
     aws-c-cal                 0.5.26               hf677bf3_1    conda-forge
     aws-c-common              0.8.19               hd590300_0    conda-forge
     aws-c-compression         0.2.16               hbad4bc6_7    conda-forge
     aws-c-event-stream        0.2.20               hb4b372c_7    conda-forge
     aws-c-http                0.7.7                h2632f9a_4    conda-forge
     aws-c-io                  0.13.21              h9fef7b8_5    conda-forge
     aws-c-mqtt                0.8.11               h2282364_1    conda-forge
     aws-c-s3                  0.3.0                hcb5a9b2_2    conda-forge
     aws-c-sdkutils            0.1.9                hbad4bc6_2    conda-forge
     aws-checksums             0.1.14               hbad4bc6_7    conda-forge
     aws-crt-cpp               0.20.2               he0fdcb3_0    conda-forge
     aws-sdk-cpp               1.10.57             h059227d_13    conda-forge
     backcall                  0.2.0              pyh9f0ad1d_0    conda-forge
     backports                 1.0                pyhd8ed1ab_3    conda-forge
     backports.functools_lru_cache 1.6.4              pyhd8ed1ab_0    conda-forge
     beautifulsoup4            4.12.2             pyha770c72_0    conda-forge
     bleach                    6.0.0              pyhd8ed1ab_0    conda-forge
     blosc                     1.21.4               h0f2a231_0    conda-forge
     bokeh                     2.4.3              pyhd8ed1ab_3    conda-forge
     boost-cpp                 1.78.0               h5adbc97_2    conda-forge
     branca                    0.6.0              pyhd8ed1ab_0    conda-forge
     brotli                    1.0.9                h166bdaf_8    conda-forge
     brotli-bin                1.0.9                h166bdaf_8    conda-forge
     brotlipy                  0.7.0           py310h5764c6d_1005    conda-forge
     brunsli                   0.1                  h9c3ff4c_0    conda-forge
     bzip2                     1.0.8                h7f98852_4    conda-forge
     c-ares                    1.19.1               hd590300_0    conda-forge
     c-blosc2                  2.9.2                hb4ffafa_0    conda-forge
     ca-certificates           2023.5.7             hbcca054_0    conda-forge
     cachetools                5.3.0              pyhd8ed1ab_0    conda-forge
     cairo                     1.16.0            ha61ee94_1014    conda-forge
     certifi                   2023.5.7           pyhd8ed1ab_0    conda-forge
     cffi                      1.15.1          py310h255011f_3    conda-forge
     cfitsio                   4.2.0                hd9d235c_0    conda-forge
     charls                    2.4.2                h59595ed_0    conda-forge
     charset-normalizer        3.1.0              pyhd8ed1ab_0    conda-forge
     click                     8.1.3           unix_pyhd8ed1ab_2    conda-forge
     click-plugins             1.1.1                      py_0    conda-forge
     cligj                     0.7.2              pyhd8ed1ab_1    conda-forge
     cloudpickle               2.2.1              pyhd8ed1ab_0    conda-forge
     colorama                  0.4.6              pyhd8ed1ab_0    conda-forge
     colorcet                  3.0.1              pyhd8ed1ab_0    conda-forge
     comm                      0.1.3              pyhd8ed1ab_0    conda-forge
     contourpy                 1.0.7           py310hdf3cbec_0    conda-forge
     cryptography              41.0.1          py310h75e40e8_0    conda-forge
     cubinlinker               0.3.0           py310hfdf336d_0    rapidsai
     cucim                     23.04.01        cuda_11_py310_230413_g4e2346c_0    rapidsai
     cuda-profiler-api         11.8.86                       0    nvidia
     cuda-python               11.8.2          py310h01a121a_0    conda-forge
     cuda-version              11.8                 h70ddcb2_2    conda-forge
     cudatoolkit               11.8.0              h37601d7_11    conda-forge
     cudf                      23.04.01        cuda_11_py310_230421_g7e070fce16_0    rapidsai
     cudf_kafka                23.04.01        py310_230421_g7e070fce16_0    rapidsai
     cugraph                   23.04.01        cuda11_py310_230421_g73f4327a_0    rapidsai
     cuml                      23.04.01        cuda11_py310_230421_g958186d07_0    rapidsai
     cupy                      11.6.0          py310h9216885_0    conda-forge
     curl                      8.1.2                h409715c_0    conda-forge
     cusignal                  23.04.00        py38_230412_g7dd2c99_0    rapidsai
     cuspatial                 23.04.00        py310_230412_g6958dcb2_0    rapidsai
     custreamz                 23.04.01        py310_230421_g7e070fce16_0    rapidsai
     cuxfilter                 23.04.00        py310_230412_gfd7f336_0    rapidsai
     cycler                    0.11.0             pyhd8ed1ab_0    conda-forge
     cyrus-sasl                2.1.27               h9033bb2_6    conda-forge
     cytoolz                   0.12.0          py310h5764c6d_1    conda-forge
     dask                      2023.3.2           pyhd8ed1ab_0    conda-forge
     dask-core                 2023.3.2           pyhd8ed1ab_0    conda-forge
     dask-cuda                 23.04.00        py310_230412_gd4d6a02_0    rapidsai
     dask-cudf                 23.04.01        cuda_11_py310_230421_g7e070fce16_0    rapidsai
     datashader                0.14.4             pyh1a96a4e_0    conda-forge
     datashape                 0.5.4                      py_1    conda-forge
     dav1d                     1.2.1                hd590300_0    conda-forge
     debugpy                   1.6.7           py310heca2aa9_0    conda-forge
     decorator                 5.1.1              pyhd8ed1ab_0    conda-forge
     defusedxml                0.7.1              pyhd8ed1ab_0    conda-forge
     distributed               2023.3.2.1         pyhd8ed1ab_0    conda-forge
     dlpack                    0.5                  h9c3ff4c_0    conda-forge
     entrypoints               0.4                pyhd8ed1ab_0    conda-forge
     exceptiongroup            1.1.1              pyhd8ed1ab_0    conda-forge
     executing                 1.2.0              pyhd8ed1ab_0    conda-forge
     expat                     2.5.0                hcb278e6_1    conda-forge
     fastavro                  1.7.4           py310h2372a71_0    conda-forge
     fastrlock                 0.8             py310hd8f1fbe_3    conda-forge
     fiona                     1.9.1           py310ha325b7b_0    conda-forge
     flit-core                 3.9.0              pyhd8ed1ab_0    conda-forge
     fmt                       9.1.0                h924138e_0    conda-forge
     folium                    0.14.0             pyhd8ed1ab_0    conda-forge
     font-ttf-dejavu-sans-mono 2.37                 hab24e00_0    conda-forge
     font-ttf-inconsolata      3.000                h77eed37_0    conda-forge
     font-ttf-source-code-pro  2.038                h77eed37_0    conda-forge
     font-ttf-ubuntu           0.83                 hab24e00_0    conda-forge
     fontconfig                2.14.2               h14ed4e7_0    conda-forge
     fonts-conda-ecosystem     1                             0    conda-forge
     fonts-conda-forge         1                             0    conda-forge
     fonttools                 4.39.4          py310h2372a71_0    conda-forge
     freetype                  2.12.1               hca18f0e_1    conda-forge
     freexl                    1.0.6                h166bdaf_1    conda-forge
     frozenlist                1.3.3           py310h5764c6d_0    conda-forge
     fsspec                    2023.6.0           pyh1a96a4e_0    conda-forge
     gdal                      3.6.2           py310hc1b7723_3    conda-forge
     geopandas                 0.13.2             pyhd8ed1ab_1    conda-forge
     geopandas-base            0.13.2             pyha770c72_1    conda-forge
     geos                      3.11.1               h27087fc_0    conda-forge
     geotiff                   1.7.1                h7157cca_5    conda-forge
     gettext                   0.21.1               h27087fc_0    conda-forge
     gflags                    2.2.2             he1b5a44_1004    conda-forge
     giflib                    5.2.1                h0b41bf4_3    conda-forge
     glog                      0.6.0                h6f12383_0    conda-forge
     gmock                     1.10.0               h4bd325d_7    conda-forge
     gtest                     1.10.0               h4bd325d_7    conda-forge
     hdf4                      4.2.15               h9772cbc_5    conda-forge
     hdf5                      1.12.2          nompi_h4df4325_101    conda-forge
     holoviews                 1.15.4             pyhd8ed1ab_0    conda-forge
     icu                       70.1                 h27087fc_0    conda-forge
     idna                      3.4                pyhd8ed1ab_0    conda-forge
     imagecodecs               2023.1.23       py310ha3ed6a1_0    conda-forge
     imageio                   2.28.1             pyh24c5eb1_0    conda-forge
     importlib-metadata        6.6.0              pyha770c72_0    conda-forge
     importlib_metadata        6.6.0                hd8ed1ab_0    conda-forge
     importlib_resources       5.12.0             pyhd8ed1ab_0    conda-forge
     ipykernel                 6.23.1             pyh210e3f2_0    conda-forge
     ipython                   8.14.0             pyh41d4057_0    conda-forge
     ipywidgets                8.0.6              pyhd8ed1ab_0    conda-forge
     jbig                      2.1               h7f98852_2003    conda-forge
     jedi                      0.18.2             pyhd8ed1ab_0    conda-forge
     jinja2                    3.1.2              pyhd8ed1ab_1    conda-forge
     joblib                    1.2.0              pyhd8ed1ab_0    conda-forge
     jpeg                      9e                   h0b41bf4_3    conda-forge
     json-c                    0.16                 hc379101_0    conda-forge
     jsonschema                4.17.3             pyhd8ed1ab_0    conda-forge
     jupyter-server-proxy      4.0.0              pyhd8ed1ab_0    conda-forge
     jupyter_client            8.2.0              pyhd8ed1ab_0    conda-forge
     jupyter_core              5.3.0           py310hff52083_0    conda-forge
     jupyter_events            0.6.3              pyhd8ed1ab_0    conda-forge
     jupyter_server            2.6.0              pyhd8ed1ab_0    conda-forge
     jupyter_server_terminals  0.4.4              pyhd8ed1ab_1    conda-forge
     jupyterlab_pygments       0.2.2              pyhd8ed1ab_0    conda-forge
     jupyterlab_widgets        3.0.7              pyhd8ed1ab_1    conda-forge
     jxrlib                    1.1                  h7f98852_2    conda-forge
     kealib                    1.5.0                ha7026e8_0    conda-forge
     keyutils                  1.6.1                h166bdaf_0    conda-forge
     kiwisolver                1.4.4           py310hbf28c38_1    conda-forge
     krb5                      1.20.1               h81ceb04_0    conda-forge
     lazy_loader               0.2                pyhd8ed1ab_0    conda-forge
     lcms2                     2.15                 hfd0df8a_0    conda-forge
     ld_impl_linux-64          2.40                 h41732ed_0    conda-forge
     lerc                      4.0.0                h27087fc_0    conda-forge
     libabseil                 20230125.2      cxx17_h59595ed_2    conda-forge
     libaec                    1.0.6                hcb278e6_1    conda-forge
     libarrow                  10.0.1          h17fb9fa_28_cpu    conda-forge
     libavif                   0.11.1               h8182462_2    conda-forge
     libblas                   3.9.0           17_linux64_openblas    conda-forge
     libbrotlicommon           1.0.9                h166bdaf_8    conda-forge
     libbrotlidec              1.0.9                h166bdaf_8    conda-forge
     libbrotlienc              1.0.9                h166bdaf_8    conda-forge
     libcblas                  3.9.0           17_linux64_openblas    conda-forge
     libcrc32c                 1.1.2                h9c3ff4c_0    conda-forge
     libcublas                 11.11.3.6                     0    nvidia
     libcublas-dev             11.11.3.6                     0    nvidia
     libcucim                  23.04.01        cuda11_230413_g4e2346c_0    rapidsai
     libcudf                   23.04.01        cuda11_230421_g7e070fce16_0    rapidsai
     libcudf_kafka             23.04.01        230421_g7e070fce16_0    rapidsai
     libcufft                  10.9.0.58                     0    nvidia
     libcugraph                23.04.01        cuda11_230421_g73f4327a_0    rapidsai
     libcugraph_etl            23.04.01        cuda11_230421_g73f4327a_0    rapidsai
     libcugraphops             23.04.00        cuda11_230412_ga76892e3_0    nvidia
     libcuml                   23.04.01        cuda11_230421_g958186d07_0    rapidsai
     libcumlprims              23.04.00        cuda11_230412_g7502d8e_0    nvidia
     libcurand                 10.3.0.86                     0    nvidia
     libcurand-dev             10.3.0.86                     0    nvidia
     libcurl                   8.1.2                h409715c_0    conda-forge
     libcusolver               11.4.1.48                     0    nvidia
     libcusolver-dev           11.4.1.48                     0    nvidia
     libcusparse               11.7.5.86                     0    nvidia
     libcusparse-dev           11.7.5.86                     0    nvidia
     libcuspatial              23.04.00        cuda11_230412_g6958dcb2_0    rapidsai
     libdeflate                1.17                 h0b41bf4_0    conda-forge
     libedit                   3.1.20191231         he28a2e2_2    conda-forge
     libev                     4.33                 h516909a_1    conda-forge
     libevent                  2.1.12               hf998b51_1    conda-forge
     libexpat                  2.5.0                hcb278e6_1    conda-forge
     libffi                    3.4.2                h7f98852_5    conda-forge
     libgcc-ng                 13.1.0               he5830b7_0    conda-forge
     libgcrypt                 1.10.1               h166bdaf_0    conda-forge
     libgdal                   3.6.2                h10cbb15_3    conda-forge
     libgfortran-ng            13.1.0               h69a702a_0    conda-forge
     libgfortran5              13.1.0               h15d22d2_0    conda-forge
     libglib                   2.76.3               hebfc3b9_0    conda-forge
     libgomp                   13.1.0               he5830b7_0    conda-forge
     libgoogle-cloud           2.11.0               hac9eb74_1    conda-forge
     libgpg-error              1.46                 h620e276_0    conda-forge
     libgrpc                   1.54.2               hb20ce57_2    conda-forge
     libgsasl                  1.8.0                         2    conda-forge
     libiconv                  1.17                 h166bdaf_0    conda-forge
     libkml                    1.3.0             h37653c0_1015    conda-forge
     liblapack                 3.9.0           17_linux64_openblas    conda-forge
     libllvm11                 11.1.0               he0ac6c6_5    conda-forge
     libnetcdf                 4.8.1           nompi_h261ec11_106    conda-forge
     libnghttp2                1.52.0               h61bc06f_0    conda-forge
     libnsl                    2.0.0                h7f98852_0    conda-forge
     libntlm                   1.4               h7f98852_1002    conda-forge
     libnuma                   2.0.16               h0b41bf4_1    conda-forge
     libopenblas               0.3.23          pthreads_h80387f5_0    conda-forge
     libpng                    1.6.39               h753d276_0    conda-forge
     libpq                     15.2                 hb675445_0    conda-forge
     libprotobuf               3.21.12              h3eb15da_0    conda-forge
     libraft                   23.04.01        cuda11_230421_gdc800d6f_0    rapidsai
     libraft-headers           23.04.01        cuda11_230421_gdc800d6f_0    rapidsai
     librdkafka                1.7.0                hb1989a6_1    conda-forge
     librmm                    23.04.01        cuda11_230421_geab50f46_0    rapidsai
     librttopo                 1.1.0               ha49c73b_12    conda-forge
     libsodium                 1.0.18               h36c2ea0_1    conda-forge
     libspatialindex           1.9.3                h9c3ff4c_4    conda-forge
     libspatialite             5.0.1               h7c8129e_22    conda-forge
     libsqlite                 3.42.0               h2797004_0    conda-forge
     libssh2                   1.11.0               h0841786_0    conda-forge
     libstdcxx-ng              13.1.0               hfd8a6a1_0    conda-forge
     libthrift                 0.18.1               h8fd135c_2    conda-forge
     libtiff                   4.5.0                h6adf6a1_2    conda-forge
     libutf8proc               2.8.0                h166bdaf_0    conda-forge
     libuuid                   2.38.1               h0b41bf4_0    conda-forge
     libuv                     1.44.2               h166bdaf_0    conda-forge
     libwebp                   1.2.4                h1daa5a0_1    conda-forge
     libwebp-base              1.2.4                h166bdaf_0    conda-forge
     libxcb                    1.13              h7f98852_1004    conda-forge
     libxgboost                1.7.5dev.rapidsai23.04       cuda_11_0    rapidsai
     libxml2                   2.10.3               hca2bb57_4    conda-forge
     libzip                    1.9.2                hc929e4a_1    conda-forge
     libzlib                   1.2.13               h166bdaf_4    conda-forge
     libzopfli                 1.0.3                h9c3ff4c_0    conda-forge
     llvmlite                  0.39.1          py310h58363a5_1    conda-forge
     locket                    1.0.0              pyhd8ed1ab_0    conda-forge
     lz4                       4.3.2           py310h0cfdcf0_0    conda-forge
     lz4-c                     1.9.3                h9c3ff4c_1    conda-forge
     mapclassify               2.5.0              pyhd8ed1ab_1    conda-forge
     markdown                  3.4.3              pyhd8ed1ab_0    conda-forge
     markupsafe                2.1.3           py310h2372a71_0    conda-forge
     matplotlib-base           3.7.1           py310he60537e_0    conda-forge
     matplotlib-inline         0.1.6              pyhd8ed1ab_0    conda-forge
     mistune                   2.0.5              pyhd8ed1ab_0    conda-forge
     msgpack-python            1.0.5           py310hdf3cbec_0    conda-forge
     multidict                 6.0.4           py310h1fa729e_0    conda-forge
     multipledispatch          0.6.0                      py_0    conda-forge
     munch                     3.0.0              pyhd8ed1ab_0    conda-forge
     munkres                   1.1.4              pyh9f0ad1d_0    conda-forge
     nbclient                  0.8.0              pyhd8ed1ab_0    conda-forge
     nbconvert-core            7.4.0              pyhd8ed1ab_0    conda-forge
     nbformat                  5.9.0              pyhd8ed1ab_0    conda-forge
     nccl                      2.17.1.1             h17a0586_1    conda-forge
     ncurses                   6.4                  hcb278e6_0    conda-forge
     nest-asyncio              1.5.6              pyhd8ed1ab_0    conda-forge
     networkx                  3.1                pyhd8ed1ab_0    conda-forge
     nodejs                    18.15.0              h8d033a5_0    conda-forge
     nspr                      4.35                 h27087fc_0    conda-forge
     nss                       3.89                 he45b914_0    conda-forge
     numba                     0.56.4          py310h0e39c9b_1    conda-forge
     numpy                     1.23.5          py310h53a5b5f_0    conda-forge
     nvtx                      0.2.5           py310h1fa729e_0    conda-forge
     openjpeg                  2.5.0                hfec8fc6_2    conda-forge
     openssl                   3.1.1                hd590300_1    conda-forge
     orc                       1.8.3                h2f23424_1    conda-forge
     overrides                 7.3.1              pyhd8ed1ab_0    conda-forge
     packaging                 23.1               pyhd8ed1ab_0    conda-forge
     pandas                    1.5.3           py310h9b08913_1    conda-forge
     pandocfilters             1.5.0              pyhd8ed1ab_0    conda-forge
     panel                     0.14.1             pyhd8ed1ab_0    conda-forge
     param                     1.13.0             pyh1a96a4e_0    conda-forge
     parso                     0.8.3              pyhd8ed1ab_0    conda-forge
     partd                     1.4.0              pyhd8ed1ab_0    conda-forge
     patsy                     0.5.3              pyhd8ed1ab_0    conda-forge
     pcre2                     10.40                hc3806b6_0    conda-forge
     pexpect                   4.8.0              pyh1a96a4e_2    conda-forge
     pickleshare               0.7.5                   py_1003    conda-forge
     pillow                    9.4.0           py310h023d228_1    conda-forge
     pip                       23.1.2             pyhd8ed1ab_0    conda-forge
     pixman                    0.40.0               h36c2ea0_0    conda-forge
     pkgutil-resolve-name      1.3.10             pyhd8ed1ab_0    conda-forge
     platformdirs              3.5.1              pyhd8ed1ab_0    conda-forge
     pooch                     1.7.0              pyha770c72_3    conda-forge
     poppler                   22.12.0              h091648b_1    conda-forge
     poppler-data              0.4.12               hd8ed1ab_0    conda-forge
     postgresql                15.2                 h3248436_0    conda-forge
     proj                      9.1.0                h8ffa02c_1    conda-forge
     prometheus_client         0.17.0             pyhd8ed1ab_0    conda-forge
     prompt-toolkit            3.0.38             pyha770c72_0    conda-forge
     prompt_toolkit            3.0.38               hd8ed1ab_0    conda-forge
     protobuf                  4.21.12         py310heca2aa9_0    conda-forge
     psutil                    5.9.5           py310h1fa729e_0    conda-forge
     pthread-stubs             0.4               h36c2ea0_1001    conda-forge
     ptxcompiler               0.8.1           py310h01a121a_0    conda-forge
     ptyprocess                0.7.0              pyhd3deb0d_0    conda-forge
     pure_eval                 0.2.2              pyhd8ed1ab_0    conda-forge
     py-xgboost                1.7.5dev.rapidsai23.04 cuda_11_py310_0    rapidsai
     pyarrow                   10.0.1          py310he6bfd7f_28_cpu    conda-forge
     pycparser                 2.21               pyhd8ed1ab_0    conda-forge
     pyct                      0.4.6                      py_0    conda-forge
     pyct-core                 0.4.6                      py_0    conda-forge
     pydeck                    0.5.0              pyh9f0ad1d_0    conda-forge
     pyee                      8.1.0              pyhd8ed1ab_0    conda-forge
     pygments                  2.15.1             pyhd8ed1ab_0    conda-forge
     pylibcugraph              23.04.01        cuda11_py310_230421_g73f4327a_0    rapidsai
     pylibraft                 23.04.01        cuda11_py310_230421_gdc800d6f_0    rapidsai
     pynvml                    11.4.1             pyhd8ed1ab_0    conda-forge
     pyopenssl                 23.2.0             pyhd8ed1ab_1    conda-forge
     pyparsing                 3.0.9              pyhd8ed1ab_0    conda-forge
     pyppeteer                 1.0.2              pyhd8ed1ab_0    conda-forge
     pyproj                    3.4.0           py310hb1338dc_2    conda-forge
     pyrsistent                0.19.3          py310h1fa729e_0    conda-forge
     pysocks                   1.7.1              pyha2e5f31_6    conda-forge
     python                    3.10.11         he550d4f_0_cpython    conda-forge
     python-confluent-kafka    1.7.0           py310h6acc77f_2    conda-forge
     python-dateutil           2.8.2              pyhd8ed1ab_0    conda-forge
     python-fastjsonschema     2.17.1             pyhd8ed1ab_0    conda-forge
     python-json-logger        2.0.7              pyhd8ed1ab_0    conda-forge
     python_abi                3.10                    3_cp310    conda-forge
     pytz                      2023.3             pyhd8ed1ab_0    conda-forge
     pyviz_comms               2.3.1              pyhd8ed1ab_0    conda-forge
     pywavelets                1.4.1           py310h0a54255_0    conda-forge
     pyyaml                    6.0             py310h5764c6d_5    conda-forge
     pyzmq                     25.1.0          py310h5bbb5d0_0    conda-forge
     raft-dask                 23.04.01        cuda11_py310_230421_gdc800d6f_0    rapidsai
     rapids                    23.04.01        cuda11_py310_230420_g5663f20_129    rapidsai
     rapids-xgboost            23.04.01        cuda11_py310_230420_g5663f20_129    rapidsai
     rdma-core                 28.9                 h59595ed_1    conda-forge
     re2                       2023.03.02           h8c504da_0    conda-forge
     readline                  8.2                  h8228510_1    conda-forge
     requests                  2.31.0             pyhd8ed1ab_0    conda-forge
     rfc3339-validator         0.1.4              pyhd8ed1ab_0    conda-forge
     rfc3986-validator         0.1.1              pyh9f0ad1d_0    conda-forge
     rmm                       23.04.01        cuda11_py310_230421_geab50f46_0    rapidsai
     rtree                     1.0.1           py310hbdcdc62_1    conda-forge
     s2n                       1.3.44               h06160fa_0    conda-forge
     scikit-image              0.20.0          py310h9b08913_1    conda-forge
     scikit-learn              1.2.2           py310hf7d194e_2    conda-forge
     scipy                     1.10.1          py310ha4c1d20_3    conda-forge
     seaborn                   0.12.2               hd8ed1ab_0    conda-forge
     seaborn-base              0.12.2             pyhd8ed1ab_0    conda-forge
     send2trash                1.8.2              pyh41d4057_0    conda-forge
     setuptools                67.7.2             pyhd8ed1ab_0    conda-forge
     shapely                   2.0.1           py310h8b84c32_0    conda-forge
     simpervisor               1.0.0              pyhd8ed1ab_0    conda-forge
     six                       1.16.0             pyh6c4a22f_0    conda-forge
     snappy                    1.1.10               h9fff704_0    conda-forge
     sniffio                   1.3.0              pyhd8ed1ab_0    conda-forge
     sortedcontainers          2.4.0              pyhd8ed1ab_0    conda-forge
     soupsieve                 2.3.2.post1        pyhd8ed1ab_0    conda-forge
     spdlog                    1.11.0               h9b3ece8_1    conda-forge
     sqlite                    3.42.0               h2c6b66d_0    conda-forge
     stack_data                0.6.2              pyhd8ed1ab_0    conda-forge
     statsmodels               0.14.0          py310h278f3c1_1    conda-forge
     streamz                   0.6.4              pyh6c4a22f_0    conda-forge
     tblib                     1.7.0              pyhd8ed1ab_0    conda-forge
     terminado                 0.17.1             pyh41d4057_0    conda-forge
     threadpoolctl             3.1.0              pyh8a188c0_0    conda-forge
     tifffile                  2023.4.12          pyhd8ed1ab_0    conda-forge
     tiledb                    2.13.2               hd532e3d_0    conda-forge
     tinycss2                  1.2.1              pyhd8ed1ab_0    conda-forge
     tk                        8.6.12               h27826a3_0    conda-forge
     toolz                     0.12.0             pyhd8ed1ab_0    conda-forge
     tornado                   6.3.2           py310h2372a71_0    conda-forge
     tqdm                      4.65.0             pyhd8ed1ab_1    conda-forge
     traitlets                 5.9.0              pyhd8ed1ab_0    conda-forge
     treelite                  3.2.0           py310h1be96d9_0    conda-forge
     treelite-runtime          3.2.0                    pypi_0    pypi
     typing-extensions         4.6.3                hd8ed1ab_0    conda-forge
     typing_extensions         4.6.3              pyha770c72_0    conda-forge
     typing_utils              0.1.0              pyhd8ed1ab_0    conda-forge
     tzcode                    2023c                h0b41bf4_0    conda-forge
     tzdata                    2023c                h71feb2d_0    conda-forge
     ucx                       1.14.1               h4a2ce2d_2    conda-forge
     ucx-proc                  1.0.0                       gpu    rapidsai
     ucx-py                    0.31.01         py310_230421_g1e2fec4_0    rapidsai
     unicodedata2              15.0.0          py310h5764c6d_0    conda-forge
     urllib3                   1.26.15            pyhd8ed1ab_0    conda-forge
     wcwidth                   0.2.6              pyhd8ed1ab_0    conda-forge
     webencodings              0.5.1                      py_1    conda-forge
     websocket-client          1.5.3              pyhd8ed1ab_0    conda-forge
     websockets                10.4            py310h5764c6d_1    conda-forge
     wheel                     0.40.0             pyhd8ed1ab_0    conda-forge
     widgetsnbextension        4.0.7              pyhd8ed1ab_0    conda-forge
     xarray                    2023.5.0           pyhd8ed1ab_0    conda-forge
     xerces-c                  3.2.4                h55805fa_1    conda-forge
     xgboost                   1.7.5dev.rapidsai23.04 cuda_11_py310_0    rapidsai
     xorg-kbproto              1.0.7             h7f98852_1002    conda-forge
     xorg-libice               1.1.1                hd590300_0    conda-forge
     xorg-libsm                1.2.4                h7391055_0    conda-forge
     xorg-libx11               1.8.4                h0b41bf4_0    conda-forge
     xorg-libxau               1.0.11               hd590300_0    conda-forge
     xorg-libxdmcp             1.1.3                h7f98852_0    conda-forge
     xorg-libxext              1.3.4                h0b41bf4_2    conda-forge
     xorg-libxrender           0.9.10            h7f98852_1003    conda-forge
     xorg-renderproto          0.11.1            h7f98852_1002    conda-forge
     xorg-xextproto            7.3.0             h0b41bf4_1003    conda-forge
     xorg-xproto               7.0.31            h7f98852_1007    conda-forge
     xyzservices               2023.5.0           pyhd8ed1ab_1    conda-forge
     xz                        5.2.6                h166bdaf_0    conda-forge
     yaml                      0.2.5                h7f98852_2    conda-forge
     yarl                      1.9.2           py310h2372a71_0    conda-forge
     zeromq                    4.3.4                h9c3ff4c_1    conda-forge
     zfp                       1.0.0                h27087fc_3    conda-forge
     zict                      3.0.0              pyhd8ed1ab_0    conda-forge
     zipp                      3.15.0             pyhd8ed1ab_0    conda-forge
     zlib                      1.2.13               h166bdaf_4    conda-forge
     zlib-ng                   2.0.7                h0b41bf4_0    conda-forge
     zstd                      1.5.2                h3eb15da_6    conda-forge

bdice commented 1 year ago

@vuule If you have a moment, can you look into this? This error is being raised at the Python level: https://github.com/rapidsai/cudf/blob/deec3f8f981fd89f1aa46c6aea3714fd7c7355b9/python/cudf/cudf/_lib/csv.pyx#L347-L348

I think we need to verify whether the restriction this error describes, that byte ranges cannot be combined with skipping rows, actually exists at the C++ level. I took a brief glance over https://github.com/rapidsai/cudf/blob/branch-23.08/cpp/src/io/csv/csv_gpu.cu but didn't see any obvious limitations in the docs or verification in the implementation. I would prefer to raise errors like this one in C++ if it is indeed a restriction of the API.

bdice commented 1 year ago

Ah, never mind. I found there is an error raised here: https://github.com/rapidsai/cudf/blob/deec3f8f981fd89f1aa46c6aea3714fd7c7355b9/cpp/include/cudf/io/csv.hpp#L576

Perhaps we can avoid the duplicate error check between Python and C++. @vuule I'd defer to your expertise here on whether checking this at both layers is necessary.

vuule commented 1 year ago

I don't think the duplicate check is necessary, since there's no code path that does not hit the C++ level check. But I don't know whether the Python layer prefers to check as early as possible instead of delegating to C++ (CC @galipremsagar).
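To make the redundancy concrete, here is a minimal pure-Python model of the two layers. The names `core_read_csv` and `wrapper_read_csv` are hypothetical stand-ins, not cudf functions, and the conflict rule is an assumption read off the error text; only the error message itself is taken from the report above.

```python
ERROR = "cannot manually limit rows to be read when using the byte range parameter"


def _has_conflict(byte_range=None, skiprows=0, skipfooter=0, nrows=None):
    """Assumed rule: a byte range may not be combined with any row limit."""
    uses_byte_range = byte_range is not None
    limits_rows = skiprows != 0 or skipfooter != 0 or nrows is not None
    return uses_byte_range and limits_rows


def core_read_csv(**options):
    """Hypothetical stand-in for the lower layer, which always validates."""
    if _has_conflict(**options):
        raise ValueError(ERROR)
    return "ok"


def wrapper_read_csv(check_early, **options):
    """Hypothetical stand-in for the upper layer that forwards to the core."""
    if check_early and _has_conflict(**options):
        raise ValueError(ERROR)  # the duplicate check under discussion
    return core_read_csv(**options)


def outcome(check_early, **options):
    """Return "ok" or the error message, so both variants can be compared."""
    try:
        return wrapper_read_csv(check_early, **options)
    except ValueError as exc:
        return str(exc)


with_check = outcome(True, byte_range=(0, 100), skiprows=3)
without_check = outcome(False, byte_range=(0, 100), skiprows=3)
```

In this model both variants reject the same inputs with the same message, so the early check only changes where the error is raised, not whether it is raised.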

@stmio Can you use the header parameter to skip the invalid rows?
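The idea behind this suggestion, sketched with the standard library rather than cudf: `header=3` names the 0-based line that holds the column names, so the lines before it are never parsed as data. The helper `read_with_header_row` is hypothetical and only models that behaviour on the sample file from the report.

```python
import csv
import io

# Contents of data.csv from the report.
DATA = """x
x
x
A, B, C, D
1, 2, 3, 4
2, 3, 5, 1
4, 5, 2, 5
"""


def read_with_header_row(text, header):
    """Use line `header` (0-based) as column names and drop the lines before it."""
    rows = list(csv.reader(io.StringIO(text), skipinitialspace=True))
    names = rows[header]
    return [dict(zip(names, row)) for row in rows[header + 1:]]


records = read_with_header_row(DATA, header=3)
```

Here `records` holds the three data rows keyed by `A`, `B`, `C` and `D`; the three `x` lines never reach the result.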

stmio commented 1 year ago

Hi @vuule, just tested it with the header parameter and it works with cudf, but not with dask_cudf:

Traceback (most recent call last):
  File "/home/sam/dask-test/main.py", line 3, in <module>
    data = dask_cudf.read_csv("./data.csv", header=3).set_index("A")
  File "/home/sam/miniconda3/envs/rapids-23.04/lib/python3.10/site-packages/dask_cudf/io/csv.py", line 90, in read_csv
    return _internal_read_csv(path=path, blocksize=blocksize, **kwargs)
  File "/home/sam/miniconda3/envs/rapids-23.04/lib/python3.10/site-packages/dask_cudf/io/csv.py", line 139, in _internal_read_csv
    meta = dask_reader(filenames[0], **kwargs1)._meta
  File "/home/sam/miniconda3/envs/rapids-23.04/lib/python3.10/site-packages/dask/dataframe/io/csv.py", line 755, in read
    return read_pandas(
  File "/home/sam/miniconda3/envs/rapids-23.04/lib/python3.10/site-packages/dask/dataframe/io/csv.py", line 618, in read_pandas
    header = b"" if header is None else parts[firstrow] + b_lineterminator
IndexError: list index out of range
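
Until either parameter works with dask_cudf, one possible stopgap is to strip the junk lines before the file reaches the reader. This is a minimal sketch using only the standard library; `strip_leading_lines` and the `data_clean.csv` name are hypothetical, and feeding the result to `dask_cudf.read_csv` has not been tested in this thread.

```python
import os
import tempfile
from itertools import islice


def strip_leading_lines(src_path, dst_path, n):
    """Copy src_path to dst_path without its first n lines."""
    # newline="" keeps the original line endings untouched.
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        dst.writelines(islice(src, n, None))
    return dst_path


# Small demonstration on a shortened copy of data.csv from the report.
with tempfile.TemporaryDirectory() as tmp:
    raw = os.path.join(tmp, "data.csv")
    clean = os.path.join(tmp, "data_clean.csv")
    with open(raw, "w", newline="") as f:
        f.write("x\nx\nx\nA, B, C, D\n1, 2, 3, 4\n")
    strip_leading_lines(raw, clean, 3)
    with open(clean, newline="") as f:
        cleaned = f.read()

# Assumed follow-up, not verified here:
# data = dask_cudf.read_csv("./data_clean.csv").set_index("A")
```

The cleaned file starts at the real header, so neither `skiprows` nor `header` is needed. The cost is an extra pass over the file and a second copy on disk, which may matter for large inputs.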