rapidsai / cudf

cuDF - GPU DataFrame Library
https://docs.rapids.ai/api/cudf/stable/
Apache License 2.0

[BUG] Out of memory with jit_unspill=True #8640

Closed vidosits closed 3 years ago

vidosits commented 3 years ago

Describe the bug

With jit_unspill=True and a conservative device_memory_limit, I would not expect to be able to hit an out-of-memory (OOM) error, but I do.

Steps/Code to reproduce bug

The input is ~8 GB as CSV, or 843 MB as Parquet.

import dask_cudf
import cudf
import rmm
from dask_cuda import LocalCUDACluster
from dask.distributed import Client, wait
from dask.utils import parse_bytes
from xgboost import dask as dxgb

cluster = LocalCUDACluster(n_workers=2, device_memory_limit="5GB", jit_unspill=True)
client = Client(cluster)
# client.run(cudf.set_allocator, allocator="managed")

df = dask_cudf.read_parquet('modeling-data-subset-40-percent.parquet', npartitions=50)
X, y = df.drop('target', axis=1), df['target']

param = {'objective': 'mae',
         'eval_metric': ['rmse'],
         'tree_method': 'gpu_hist',
         'verbosity': 2}

num_boosting_rounds = 2

dtrain = dxgb.DaskDeviceQuantileDMatrix(client, X, y)
output = dxgb.train(client,
                    param,
                    dtrain,
                    num_boost_round=num_boosting_rounds)

Output with device_memory_limit="5GB":

XGBoostError: scan failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered

Or, with device_memory_limit="9GB":

XGBoostError: [21:43:38] /opt/conda/envs/rapids/conda-bld/xgboost_1619020864980/work/src/c_api/../data/../common/device_helpers.cuh:414: Memory allocation error on worker 0: std::bad_alloc: CUDA error at: /home/jovyan/.conda/include/rmm/mr/device/cuda_memory_resource.hpp:69: cudaErrorMemoryAllocation out of memory
- Free memory: 3557097472
- Requested memory: 11814988752
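As a rough sanity check on the 9GB failure, the byte counts in the error above can be compared with the total memory that nvidia-smi reports in the environment details below (10015 MiB and 10018 MiB for the two cards). The sketch is plain arithmetic; the variable names are chosen here for illustration, and it assumes the allocation was attempted on a single RTX 3080.

```python
# Byte counts copied from the XGBoostError message above.
requested_bytes = 11814988752
free_bytes = 3557097472

# Total memory of the larger of the two cards, as reported by nvidia-smi (MiB).
# Assumption: the failing allocation targeted one GPU only.
total_mib = 10018

GIB = 2**30
MIB = 2**20

requested_gib = requested_bytes / GIB
free_gib = free_bytes / GIB
total_gib = total_mib * MIB / GIB

print(f"requested: {requested_gib:.2f} GiB")
print(f"free:      {free_gib:.2f} GiB")
print(f"total:     {total_gib:.2f} GiB")

# True when the single request is larger than the whole device.
exceeds_device = requested_bytes > total_mib * MIB
print("request exceeds total device memory:", exceeds_device)
```

Under that assumption, the single request is larger than the whole device, which suggests that spilling other buffers to host memory could not have satisfied it on its own.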

Expected behavior

Data should spill to host RAM when device_memory_limit is specified and moderately low.

Environment details

     ***git***
     Not inside a git repository

     ***OS Information***
     DISTRIB_ID=Ubuntu
     DISTRIB_RELEASE=20.04
     DISTRIB_CODENAME=focal
     DISTRIB_DESCRIPTION="Ubuntu 20.04.2 LTS"
     NAME="Ubuntu"
     VERSION="20.04.2 LTS (Focal Fossa)"
     ID=ubuntu
     ID_LIKE=debian
     PRETTY_NAME="Ubuntu 20.04.2 LTS"
     VERSION_ID="20.04"
     HOME_URL="https://www.ubuntu.com/"
     SUPPORT_URL="https://help.ubuntu.com/"
     BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
     PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
     VERSION_CODENAME=focal
     UBUNTU_CODENAME=focal
     Linux jupyter 5.4.0-77-generic #86-Ubuntu SMP Thu Jun 17 02:35:03 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

     ***GPU Information***
     Thu Jul  1 21:46:15 2021
     +-----------------------------------------------------------------------------+
     | NVIDIA-SMI 460.73.01    Driver Version: 460.73.01    CUDA Version: 11.2     |
     |-------------------------------+----------------------+----------------------+
     | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
     | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
     |                               |                      |               MIG M. |
     |===============================+======================+======================|
     |   0  GeForce RTX 3080    Off  | 00000000:01:00.0 Off |                  N/A |
     |  0%   53C    P8    29W / 320W |   1046MiB / 10015MiB |      0%      Default |
     |                               |                      |                  N/A |
     +-------------------------------+----------------------+----------------------+
     |   1  GeForce RTX 3080    Off  | 00000000:21:00.0 Off |                  N/A |
     |  0%   50C    P8    22W / 320W |   6625MiB / 10018MiB |      0%      Default |
     |                               |                      |                  N/A |
     +-------------------------------+----------------------+----------------------+

     +-----------------------------------------------------------------------------+
     | Processes:                                                                  |
     |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
     |        ID   ID                                                   Usage      |
     |=============================================================================|
     +-----------------------------------------------------------------------------+

     ***CPU***
     Architecture:                    x86_64
     CPU op-mode(s):                  32-bit, 64-bit
     Byte Order:                      Little Endian
     Address sizes:                   43 bits physical, 48 bits virtual
     CPU(s):                          48
     On-line CPU(s) list:             0-47
     Thread(s) per core:              2
     Core(s) per socket:              24
     Socket(s):                       1
     NUMA node(s):                    1
     Vendor ID:                       AuthenticAMD
     CPU family:                      23
     Model:                           49
     Model name:                      AMD Ryzen Threadripper 3960X 24-Core Processor
     Stepping:                        0
     Frequency boost:                 enabled
     CPU MHz:                         2133.877
     CPU max MHz:                     3800.0000
     CPU min MHz:                     2200.0000
     BogoMIPS:                        7600.21
     Virtualization:                  AMD-V
     L1d cache:                       768 KiB
     L1i cache:                       768 KiB
     L2 cache:                        12 MiB
     L3 cache:                        128 MiB
     NUMA node0 CPU(s):               0-47
     Vulnerability Itlb multihit:     Not affected
     Vulnerability L1tf:              Not affected
     Vulnerability Mds:               Not affected
     Vulnerability Meltdown:          Not affected
     Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
     Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
     Vulnerability Spectre v2:        Mitigation; Full AMD retpoline, IBPB conditional, STIBP conditional, RSB filling
     Vulnerability Srbds:             Not affected
     Vulnerability Tsx async abort:   Not affected
     Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate sme ssbd mba sev ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif umip rdpid overflow_recov succor smca

     ***CMake***
     /usr/bin/cmake
     cmake version 3.16.3

     CMake suite maintained and supported by Kitware (kitware.com/cmake).

     ***g++***
     /usr/bin/g++
     g++ (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
     Copyright (C) 2019 Free Software Foundation, Inc.
     This is free software; see the source for copying conditions.  There is NO
     warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

     ***nvcc***
     /usr/local/cuda/bin/nvcc
     nvcc: NVIDIA (R) Cuda compiler driver
     Copyright (c) 2005-2021 NVIDIA Corporation
     Built on Sun_Feb_14_21:12:58_PST_2021
     Cuda compilation tools, release 11.2, V11.2.152
     Build cuda_11.2.r11.2/compiler.29618528_0

     ***Python***
     /home/jovyan/.conda/bin/python
     Python 3.7.9

     ***Environment Variables***
     PATH                            : /home/jovyan/.conda/bin:/home/jovyan/.conda/condabin:/home/jovyan/.conda/bin:/home/jovyan/.local/lightgbm/LightGBM:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
     LD_LIBRARY_PATH                 : /usr/local/nvidia/lib:/usr/local/nvidia/lib64
     NUMBAPRO_NVVM                   :
     NUMBAPRO_LIBDEVICE              :
     CONDA_PREFIX                    : /home/jovyan/.conda
     PYTHON_PATH                     :

     ***conda packages***
     /home/jovyan/.conda/bin/conda
     # packages in environment at /home/jovyan/.conda:
     #
     # Name                    Version                   Build  Channel
     _libgcc_mutex             0.1                 conda_forge    conda-forge
     _openmp_mutex             4.5                       1_gnu    conda-forge
     abseil-cpp                20210324.1           h9c3ff4c_0    conda-forge
     aiohttp                   3.7.4            py37h5e8e339_0    conda-forge
     albumentations            1.0.0              pyhd8ed1ab_0    conda-forge
     alembic                   1.4.1                      py_0    conda-forge
     alsa-lib                  1.2.3                h516909a_0    conda-forge
     altair                    4.1.0                      py_1    conda-forge
     annoy                     1.17.0                   pypi_0    pypi
     anyio                     3.1.0            py37h89c1867_0    conda-forge
     appdirs                   1.4.4              pyh9f0ad1d_0    conda-forge
     argon2-cffi               20.1.0           py37h8f50634_2    conda-forge
     arrow-cpp                 1.0.1           py37haa335b2_40_cuda    conda-forge
     arrow-cpp-proc            3.0.0                      cuda    conda-forge
     asn1crypto                1.4.0              pyh9f0ad1d_0    conda-forge
     async-timeout             3.0.1                   py_1000    conda-forge
     async_generator           1.10                       py_0    conda-forge
     atk-1.0                   2.36.0               h3371d22_4    conda-forge
     atpublic                  1.0                        py_0    conda-forge
     attrs                     21.2.0             pyhd8ed1ab_0    conda-forge
     autopep8                  1.5.7                    pypi_0    pypi
     aws-c-cal                 0.5.9                h3622835_0    conda-forge
     aws-c-common              0.5.11               h7f98852_0    conda-forge
     aws-c-event-stream        0.2.7                h8fbaa10_8    conda-forge
     aws-c-io                  0.10.1               h8007ed0_0    conda-forge
     aws-checksums             0.1.11               hc0e0e8b_6    conda-forge
     aws-sdk-cpp               1.8.186              h9ad65fb_2    conda-forge
     babel                     2.9.1              pyh44b312d_0    conda-forge
     backcall                  0.2.0              pyh9f0ad1d_0    conda-forge
     backoff                   1.10.0                     py_0    conda-forge
     backports                 1.0                        py_2    conda-forge
     backports.functools_lru_cache 1.6.4              pyhd8ed1ab_0    conda-forge
     bleach                    3.3.0              pyh44b312d_0    conda-forge
     blinker                   1.4                        py_1    conda-forge
     blosc                     1.21.0               h9c3ff4c_0    conda-forge
     bokeh                     2.2.3            py37hc8dfbb8_0    conda-forge
     boost                     1.72.0           py37h48f8a5e_1    conda-forge
     boost-cpp                 1.72.0               h9d3c048_4    conda-forge
     boto3                     1.17.93            pyhd8ed1ab_0    conda-forge
     botocore                  1.20.93            pyhd8ed1ab_0    conda-forge
     bottleneck                1.3.2            py37h902c9e0_3    conda-forge
     brotli                    1.0.9                h9c3ff4c_4    conda-forge
     brotlipy                  0.7.0           py37h27cfd23_1003
     brunsli                   0.1                  he1b5a44_0    conda-forge
     bzip2                     1.0.8                h7f98852_4    conda-forge
     c-ares                    1.17.1               h7f98852_1    conda-forge
     ca-certificates           2021.5.30            ha878542_0    conda-forge
     cached-property           1.5.2                hd8ed1ab_1    conda-forge
     cached_property           1.5.2              pyha770c72_1    conda-forge
     cachetools                4.2.2              pyhd8ed1ab_0    conda-forge
     cairo                     1.16.0            h6cf1ce9_1008    conda-forge
     catboost                  0.26             py37h89c1867_0    conda-forge
     category_encoders         2.2.2                      py_0    conda-forge
     certifi                   2021.5.30        py37h89c1867_0    conda-forge
     certipy                   0.1.3                      py_0    conda-forge
     cffi                      1.14.3           py37h261ae71_2
     cfitsio                   3.470                hb418390_7    conda-forge
     chardet                   3.0.4           py37h06a4308_1003
     charls                    2.2.0                h9c3ff4c_0    conda-forge
     click                     7.1.2              pyh9f0ad1d_0    conda-forge
     click-plugins             1.1.1                      py_0    conda-forge
     cliff                     3.8.0              pyhd8ed1ab_0    conda-forge
     cligj                     0.7.2              pyhd8ed1ab_0    conda-forge
     cloudpickle               1.6.0                      py_0    conda-forge
     cmaes                     0.8.2              pyh44b312d_0    conda-forge
     cmd2                      2.0.1            py37h89c1867_0    conda-forge
     colorama                  0.4.4              pyh9f0ad1d_0    conda-forge
     colorcet                  2.0.6              pyhd8ed1ab_0    conda-forge
     colorlog                  5.0.1            py37h89c1867_0    conda-forge
     commonmark                0.9.1                      py_0    conda-forge
     conda                     4.10.1           py37h89c1867_0    conda-forge
     conda-package-handling    1.7.2            py37h03888b9_0
     configobj                 5.0.6                      py_0    conda-forge
     configparser              5.0.2              pyhd8ed1ab_0    conda-forge
     configurable-http-proxy   4.4.0           node14_hfc12e6c_0    conda-forge
     confuse                   1.4.0              pyhd3deb0d_0    conda-forge
     cryptography              3.2.1            py37h3c74f83_1
     cudatoolkit               11.2.72              h2bc3f7f_0    nvidia
     cudf                      0.19.2          cuda_11.2_py37_gab3b3f653a_0    rapidsai
     cudf_kafka                0.19.2          py37_gab3b3f653a_0    rapidsai
     cudnn                     8.1.0.77             h90431f1_0    conda-forge
     cugraph                   0.19.0          py37_gd72b90b0_0    rapidsai
     cuml                      0.19.0          cuda11.2_py37_g4cb78ff1a_0    rapidsai
     cupy                      8.6.0            py37h3c5eebb_0    conda-forge
     curl                      7.77.0               hea6ffbf_0    conda-forge
     cusignal                  0.19.0          py38_g347b95a_0    rapidsai
     cuspatial                 0.19.0          py37_gdf1d93c_0    rapidsai
     custreamz                 0.19.2          py37_gab3b3f653a_0    rapidsai
     cutensor                  1.2.2.5              h96e36e3_4    conda-forge
     cuxfilter                 0.19.1          py37_g82dc80c_0    rapidsai
     cycler                    0.10.0                     py_2    conda-forge
     cyrus-sasl                2.1.27               h230043b_2    conda-forge
     cython                    0.29.23                  pypi_0    pypi
     cytoolz                   0.11.0           py37h5e8e339_3    conda-forge
     dask                      2021.4.0           pyhd8ed1ab_0    conda-forge
     dask-core                 2021.4.0           pyhd8ed1ab_0    conda-forge
     dask-cuda                 0.19.0                   py37_0    rapidsai
     dask-cudf                 0.19.2          py37_gab3b3f653a_0    rapidsai
     dask-glm                  0.2.0                    pypi_0    pypi
     dask-labextension         5.0.2                    pypi_0    pypi
     dask-ml                   1.9.0                    pypi_0    pypi
     databricks-cli            0.9.1                      py_0    conda-forge
     dataclasses               0.8                pyhc8e2a94_1    conda-forge
     datashader                0.11.1             pyh9f0ad1d_0    conda-forge
     datashape                 0.5.4                      py_1    conda-forge
     dbus                      1.13.18              hb2f20db_0
     decorator                 4.4.2                      py_0    conda-forge
     defusedxml                0.7.1              pyhd8ed1ab_0    conda-forge
     dictdiffer                0.8.1              pyhd8ed1ab_0    conda-forge
     diskcache                 5.2.1              pyh44b312d_0    conda-forge
     distributed               2021.4.0         py37h89c1867_0    conda-forge
     distro                    1.5.0              pyh9f0ad1d_0    conda-forge
     dlpack                    0.3                  he1b5a44_1    conda-forge
     docker-py                 5.0.0            py37h89c1867_0    conda-forge
     docker-pycreds            0.4.0                      py_0    conda-forge
     dpath                     2.0.1            py37h89c1867_0    conda-forge
     dulwich                   0.20.23          py37h5e8e339_0    conda-forge
     dvc                       2.1.0            py37h89c1867_0    conda-forge
     dvc-s3                    2.1.0            py37h89c1867_0    conda-forge
     entrypoints               0.3             py37hc8dfbb8_1002    conda-forge
     expat                     2.4.1                h9c3ff4c_0    conda-forge
     faiss-proc                1.0.0                      cuda    rapidsai
     fastavro                  1.4.1            py37h5e8e339_0    conda-forge
     fastrlock                 0.6              py37hcd2ae1e_0    conda-forge
     feather-format            0.4.1              pyh9f0ad1d_0    conda-forge
     ffmpeg                    4.3.1                hca11adc_2    conda-forge
     fiona                     1.8.20           py37ha0cc35a_0    conda-forge
     flask                     2.0.1              pyhd8ed1ab_0    conda-forge
     flatten-dict              0.3.0              pyh9f0ad1d_0    conda-forge
     flufl.lock                3.2                        py_0    conda-forge
     font-ttf-dejavu-sans-mono 2.37                 hab24e00_0    conda-forge
     font-ttf-inconsolata      3.000                h77eed37_0    conda-forge
     font-ttf-source-code-pro  2.038                h77eed37_0    conda-forge
     font-ttf-ubuntu           0.83                 hab24e00_0    conda-forge
     fontconfig                2.13.1            hba837de_1005    conda-forge
     fonts-conda-ecosystem     1                             0    conda-forge
     fonts-conda-forge         1                             0    conda-forge
     freetype                  2.10.4               h0708190_1    conda-forge
     freexl                    1.0.6                h7f98852_0    conda-forge
     fribidi                   1.0.10               h516909a_0    conda-forge
     fsspec                    0.9.0              pyhd8ed1ab_2    conda-forge
     ftfy                      6.0.3              pyhd8ed1ab_0    conda-forge
     funcy                     1.16               pyhd8ed1ab_0    conda-forge
     future                    0.18.2           py37h89c1867_3    conda-forge
     gdal                      3.2.2            py37hb0e9ad2_5    conda-forge
     gdk-pixbuf                2.42.6               h04a7f16_0    conda-forge
     geographiclib             1.50                     pypi_0    pypi
     geopandas                 0.8.1                      py_0    conda-forge
     geopy                     2.1.0                    pypi_0    pypi
     geos                      3.9.1                h9c3ff4c_2    conda-forge
     geotiff                   1.6.0                hcf90da6_5    conda-forge
     gettext                   0.21.0               hf68c758_0
     gflags                    2.2.2             he1b5a44_1004    conda-forge
     giflib                    5.2.1                h516909a_2    conda-forge
     gitdb                     4.0.7              pyhd8ed1ab_0    conda-forge
     gitpython                 3.1.17             pyhd8ed1ab_0    conda-forge
     glib                      2.68.2               h9c3ff4c_2    conda-forge
     glib-tools                2.68.2               h9c3ff4c_2    conda-forge
     glog                      0.5.0                h48cff8f_0    conda-forge
     gmp                       6.2.1                h58526e2_0    conda-forge
     gnutls                    3.6.15               he1e5248_0
     grandalf                  0.6                        py_0    conda-forge
     graphite2                 1.3.14               h23475e2_0
     graphviz                  2.47.2               h85b4f2f_0    conda-forge
     greenlet                  1.1.0            py37hcd2ae1e_0    conda-forge
     grpc-cpp                  1.38.0               h2519f57_0    conda-forge
     gst-plugins-base          1.18.4               hf529b03_2    conda-forge
     gstreamer                 1.18.4               h76c114f_2    conda-forge
     gtk2                      2.24.33              h539f30e_1    conda-forge
     gts                       0.7.6                h64030ff_2    conda-forge
     gunicorn                  20.1.0           py37h89c1867_0    conda-forge
     harfbuzz                  2.8.1                h83ec7ef_0    conda-forge
     hdbscan                   0.8.27           py37h902c9e0_0    conda-forge
     hdf4                      4.2.15               h10796ff_3    conda-forge
     hdf5                      1.10.6          nompi_h6a2412b_1114    conda-forge
     heapdict                  1.0.1                      py_0    conda-forge
     htmlmin                   0.1.12                     py_1    conda-forge
     hyperopt                  0.2.5              pyh9f0ad1d_0    conda-forge
     icu                       68.1                 h58526e2_0    conda-forge
     idna                      2.10                       py_0
     imagecodecs               2021.3.31        py37haf4b6ec_0    conda-forge
     imagehash                 4.2.0              pyhd8ed1ab_0    conda-forge
     imageio                   2.9.0                      py_0    conda-forge
     imgaug                    0.4.0                      py_1    conda-forge
     importlib-metadata        4.5.0            py37h89c1867_0    conda-forge
     importlib_metadata        4.5.0                hd8ed1ab_0    conda-forge
     ipykernel                 5.5.5            py37h085eea5_0    conda-forge
     ipython                   7.24.1           py37h085eea5_0    conda-forge
     ipython_genutils          0.2.0                      py_1    conda-forge
     ipywidgets                7.6.3              pyhd3deb0d_0    conda-forge
     itsdangerous              2.0.1              pyhd8ed1ab_0    conda-forge
     jasper                    1.900.1           h07fcdf6_1006    conda-forge
     jedi                      0.18.0           py37h89c1867_2    conda-forge
     jinja2                    3.0.1              pyhd8ed1ab_0    conda-forge
     jmespath                  0.10.0             pyh9f0ad1d_0    conda-forge
     joblib                    1.0.1              pyhd8ed1ab_0    conda-forge
     jpeg                      9d                   h516909a_0    conda-forge
     json-c                    0.15                 h98cffda_0    conda-forge
     json5                     0.9.5              pyh9f0ad1d_0    conda-forge
     jsonpath-ng               1.5.2              pyh9f0ad1d_0    conda-forge
     jsonschema                3.2.0            py37hc8dfbb8_1    conda-forge
     jupyter-server-mathjax    0.2.3                    pypi_0    pypi
     jupyter-server-proxy      3.0.2              pyhd8ed1ab_0    conda-forge
     jupyter_client            6.1.12             pyhd8ed1ab_0    conda-forge
     jupyter_core              4.7.1            py37h89c1867_0    conda-forge
     jupyter_server            1.8.0              pyhd8ed1ab_0    conda-forge
     jupyter_telemetry         0.1.0              pyhd8ed1ab_1    conda-forge
     jupyterhub                1.4.1            py37h89c1867_0    conda-forge
     jupyterhub-base           1.4.1            py37h89c1867_0    conda-forge
     jupyterlab                3.0.16             pyhd8ed1ab_0    conda-forge
     jupyterlab-geojson        3.1.2                    pypi_0    pypi
     jupyterlab-git            0.30.1                   pypi_0    pypi
     jupyterlab-notifications  0.2.0                    pypi_0    pypi
     jupyterlab-nvdashboard    0.6.0                    pypi_0    pypi
     jupyterlab_code_formatter 1.4.10             pyhd8ed1ab_1    conda-forge
     jupyterlab_pygments       0.1.2              pyh9f0ad1d_0    conda-forge
     jupyterlab_server         2.6.0              pyhd8ed1ab_0    conda-forge
     jupyterlab_widgets        1.0.0              pyhd8ed1ab_1    conda-forge
     jxrlib                    1.1                  h516909a_2    conda-forge
     kealib                    1.4.14               hcc255d8_2    conda-forge
     kiwisolver                1.3.1            py37h2527ec5_1    conda-forge
     krb5                      1.19.1               hcc1bbae_0    conda-forge
     lame                      3.100             h14c3975_1001    conda-forge
     lcms2                     2.12                 hddcbb42_0    conda-forge
     ld_impl_linux-64          2.33.1               h53a641e_7
     lerc                      2.2.1                h9c3ff4c_0    conda-forge
     libaec                    1.0.4                he1b5a44_1    conda-forge
     libarchive                3.5.1                h3f442fb_1    conda-forge
     libblas                   3.9.0                9_openblas    conda-forge
     libcblas                  3.9.0                9_openblas    conda-forge
     libclang                  11.1.0          default_ha53f305_1    conda-forge
     libcudf                   0.19.2          cuda11.2_gab3b3f653a_0    rapidsai
     libcudf_kafka             0.19.2            gab3b3f653a_0    rapidsai
     libcugraph                0.19.0          cuda11.2_gd72b90b0_0    rapidsai
     libcuml                   0.19.0          cuda11.2_g4cb78ff1a_0    rapidsai
     libcumlprims              0.19.0          cuda11.2_ga2abf9f_0    nvidia
     libcurl                   7.77.0               h2574ce0_0    conda-forge
     libcuspatial              0.19.0          cuda11.2_gdf1d93c_0    rapidsai
     libdap4                   3.20.6               hd7c4107_2    conda-forge
     libdeflate                1.7                  h7f98852_5    conda-forge
     libedit                   3.1.20191231         h14c3975_1
     libev                     4.33                 h516909a_1    conda-forge
     libevent                  2.1.10               hcdb4288_3    conda-forge
     libfaiss                  1.7.0           cuda112h5bea7ad_8_cuda    conda-forge
     libffi                    3.3                  he6710b0_2
     libgcc-ng                 9.3.0               h2828fa1_19    conda-forge
     libgcrypt                 1.9.3                h7f98852_0    conda-forge
     libgd                     2.3.2                h78a0170_0    conda-forge
     libgdal                   3.2.2                h679344c_5    conda-forge
     libgfortran-ng            9.3.0               hff62375_19    conda-forge
     libgfortran5              9.3.0               hff62375_19    conda-forge
     libgit2                   1.1.0                h3974521_1    conda-forge
     libglib                   2.68.2               h3e27bee_2    conda-forge
     libgomp                   9.3.0               h2828fa1_19    conda-forge
     libgpg-error              1.42                 h9c3ff4c_0    conda-forge
     libgsasl                  1.8.0                         2    conda-forge
     libhwloc                  2.3.0                h5e5b7d1_1    conda-forge
     libiconv                  1.16                 h516909a_0    conda-forge
     libidn2                   2.3.1                h7f98852_0    conda-forge
     libkml                    1.3.0             hd79254b_1012    conda-forge
     liblapack                 3.9.0                9_openblas    conda-forge
     liblapacke                3.9.0                9_openblas    conda-forge
     libllvm10                 10.0.1               he513fc3_3    conda-forge
     libllvm11                 11.1.0               hf817b99_2    conda-forge
     libnetcdf                 4.8.0           nompi_hcd642e3_103    conda-forge
     libnghttp2                1.43.0               h812cca2_0    conda-forge
     libntlm                   1.5                  h7b6447c_0
     libogg                    1.3.5                h27cfd23_1
     libopenblas               0.3.15          pthreads_h8fe5266_1    conda-forge
     libopencv                 4.5.1            py37h90094e2_0    conda-forge
     libopus                   1.3.1                h7f98852_1    conda-forge
     libpng                    1.6.37               hed695b0_2    conda-forge
     libpq                     13.3                 hd57d9b9_0    conda-forge
     libprotobuf               3.16.0               h780b84a_0    conda-forge
     librdkafka                1.5.3                hc49e61c_1    conda-forge
     librmm                    0.19.0          cuda11.2_g7065af3_0    rapidsai
     librsvg                   2.50.7               hc3c00ef_0    conda-forge
     librttopo                 1.1.0                h1185371_6    conda-forge
     libsodium                 1.0.18               h516909a_1    conda-forge
     libsolv                   0.7.18               h780b84a_0    conda-forge
     libspatialindex           1.9.3                he1b5a44_3    conda-forge
     libspatialite             5.0.1                h20cb978_4    conda-forge
     libssh2                   1.9.0                ha56f1ee_6    conda-forge
     libstdcxx-ng              9.3.0               h6de172a_19    conda-forge
     libtasn1                  4.16.0               h27cfd23_0
     libthrift                 0.14.1               he6d91bd_1    conda-forge
     libtiff                   4.2.0                hbd63e13_2    conda-forge
     libtool                   2.4.6             h58526e2_1007    conda-forge
     libunistring              0.9.10               h14c3975_0    conda-forge
     libutf8proc               2.6.1                h7f98852_0    conda-forge
     libuuid                   2.32.1            h14c3975_1000    conda-forge
     libuv                     1.41.0               h7f98852_0    conda-forge
     libvorbis                 1.3.7                he1b5a44_0    conda-forge
     libwebp                   1.2.0                h3452ae3_0    conda-forge
     libwebp-base              1.2.0                h7f98852_2    conda-forge
     libxcb                    1.14                 h7b6447c_0
     libxgboost                1.4.0dev.rapidsai0.19      cuda11.2_0    rapidsai
     libxkbcommon              1.0.3                he3ba5ed_0    conda-forge
     libxml2                   2.9.12               h72842e0_0    conda-forge
     libzip                    1.7.3                he9f05b3_0    conda-forge
     libzopfli                 1.0.3                he1b5a44_0    conda-forge
     lightgbm                  3.2.1.99                  dev_0    
     llvmlite                  0.36.0           py37h9d7f4d0_0    conda-forge
     locket                    0.2.1            py37h06a4308_1
     lz4-c                     1.9.3                h9c3ff4c_0    conda-forge
     lzo                       2.10              h516909a_1000    conda-forge
     mailchecker               4.0.8              pyhd8ed1ab_0    conda-forge
     mako                      1.1.4              pyh44b312d_0    conda-forge
     mamba                     0.13.0           py37h7f483ca_0    conda-forge
     mamba_gator               5.1.1              pyhd8ed1ab_0    conda-forge
     markdown                  3.3.4              pyhd8ed1ab_0    conda-forge
     markupsafe                2.0.1            py37h5e8e339_0    conda-forge
     matplotlib-base           3.4.2            py37hdd32ed1_0    conda-forge
     matplotlib-inline         0.1.2              pyhd8ed1ab_2    conda-forge
     missingno                 0.4.2                      py_1    conda-forge
     mistune                   0.8.4           py37h5e8e339_1003    conda-forge
     mlflow                    1.17.0           py37h02d9ccd_1    conda-forge
     modin                     0.10.0                   pypi_0    pypi
     msgpack-python            1.0.2            py37h2527ec5_1    conda-forge
     multidict                 5.1.0            py37h5e8e339_1    conda-forge
     multimethod               1.4                        py_0    conda-forge
     multipledispatch          0.6.0                      py_0    conda-forge
     munch                     2.5.0                      py_0    conda-forge
     mysql-common              8.0.25               ha770c72_0    conda-forge
     mysql-libs                8.0.25               h935591d_0    conda-forge
     nanotime                  0.5.2                      py_0    conda-forge
     nbclassic                 0.3.1              pyhd8ed1ab_1    conda-forge
     nbclient                  0.5.3              pyhd8ed1ab_0    conda-forge
     nbconvert                 6.0.7            py37h89c1867_3    conda-forge
     nbdime                    3.1.0                    pypi_0    pypi
     nbformat                  5.1.3              pyhd8ed1ab_0    conda-forge
     nccl                      2.9.9.1              hdc17891_0    conda-forge
     ncurses                   6.2                  he6710b0_1
     nest-asyncio              1.5.1              pyhd8ed1ab_0    conda-forge
     nettle                    3.7.3                hbbd107a_1
     networkx                  2.5.1              pyhd8ed1ab_0    conda-forge
     nodejs                    14.15.4              h92b4a50_1    conda-forge
     notebook                  6.4.0              pyha770c72_0    conda-forge
     nspr                      4.30                 h9c3ff4c_0    conda-forge
     nss                       3.67                 hb5efdd6_0    conda-forge
     numba                     0.53.1           py37h134767a_0    conda-forge
     numpy                     1.20.3           py37h038b26d_1    conda-forge
     nvtabular                 0.5.3                         0    nvidia
     nvtx                      0.2.3            py37h5e8e339_0    conda-forge
     oauthenticator            14.0.0             pyhd8ed1ab_0    conda-forge
     oauthlib                  3.1.1              pyhd8ed1ab_0    conda-forge
     olefile                   0.46               pyh9f0ad1d_1    conda-forge
     opencv                    4.5.1            py37h89c1867_0    conda-forge
     opencv-python             4.5.2.54                 pypi_0    pypi
     openh264                  2.1.1                h780b84a_0    conda-forge
     openjpeg                  2.4.0                hb52868f_1    conda-forge
     openssl                   1.1.1k               h7f98852_0    conda-forge
     optuna                    2.8.0              pyhd8ed1ab_0    conda-forge
     orc                       1.6.7                h89a63ab_2    conda-forge
     packaging                 20.9               pyh44b312d_0    conda-forge
     pacmap                    0.4                      pypi_0    pypi
     pamela                    1.0.0                      py_0    conda-forge
     pandas                    1.2.4            py37h219a48f_0    conda-forge
     pandas-profiling          3.0.0              pyhd8ed1ab_0    conda-forge
     pandoc                    2.14.0.1             h7f98852_0    conda-forge
     pandocfilters             1.4.3            py37h06a4308_1
     panel                     0.10.3             pyhd8ed1ab_0    conda-forge
     pango                     1.48.5               hb8ff022_0    conda-forge
     param                     1.10.1             pyhd3deb0d_0    conda-forge
     parquet-cpp               1.5.1                         1    conda-forge
     parso                     0.8.2              pyhd8ed1ab_0    conda-forge
     partd                     1.2.0              pyhd8ed1ab_0    conda-forge
     pathlib2                  2.3.5            py37h89c1867_3    conda-forge
     pathspec                  0.8.1              pyhd3deb0d_0    conda-forge
     patsy                     0.5.1                      py_0    conda-forge
     pbr                       5.6.0              pyhd8ed1ab_0    conda-forge
     pcre                      8.44                 he1b5a44_0    conda-forge
     pcre2                     10.36                h032f7d1_1    conda-forge
     pexpect                   4.8.0            py37hc8dfbb8_1    conda-forge
     phik                      0.11.2             pyhd8ed1ab_0    conda-forge
     phonenumbers              8.12.24            pyhd8ed1ab_1    conda-forge
     pickle5                   0.0.11           py37h8f50634_0    conda-forge
     pickleshare               0.7.5           py37hc8dfbb8_1002    conda-forge
     pillow                    8.2.0            py37h4600e1f_1    conda-forge
     pip                       20.2.4           py37h06a4308_0
     pixman                    0.40.0               h36c2ea0_0    conda-forge
     ply                       3.11                       py_1    conda-forge
     pooch                     1.4.0              pyhd8ed1ab_0    conda-forge
     poppler                   21.03.0              h93df280_0    conda-forge
     poppler-data              0.4.10                        0    conda-forge
     postgresql                13.3                 h2510834_0    conda-forge
     prettytable               2.1.0              pyhd8ed1ab_0    conda-forge
     proj                      8.0.0                h277dcde_0    conda-forge
     prometheus_client         0.11.0             pyhd8ed1ab_0    conda-forge
     prometheus_flask_exporter 0.18.2             pyhd8ed1ab_0    conda-forge
     prompt-toolkit            3.0.18             pyha770c72_0    conda-forge
     protobuf                  3.16.0           py37hcd2ae1e_0    conda-forge
     psutil                    5.8.0            py37h5e8e339_1    conda-forge
     ptyprocess                0.7.0              pyhd3deb0d_0    conda-forge
     py-opencv                 4.5.1            py37h888b3d9_0    conda-forge
     py-xgboost                1.4.0dev.rapidsai0.19  cuda11.2py37_0    rapidsai
     pyarrow                   1.0.1           py37hb63ea2f_40_cuda    conda-forge
     pyasn1                    0.4.8                      py_0    conda-forge
     pycodestyle               2.7.0                    pypi_0    pypi
     pycosat                   0.6.3            py37h27cfd23_0
     pycparser                 2.20                       py_2
     pyct                      0.4.8                    py37_0
     pycurl                    7.43.0.6         py37h88a64d2_1    conda-forge
     pydantic                  1.8.2            py37h5e8e339_0    conda-forge
     pydeck                    0.5.0              pyh9f0ad1d_0    conda-forge
     pydicom                   2.1.2              pyhd3deb0d_0    conda-forge
     pydot                     1.4.2            py37h89c1867_0    conda-forge
     pyee                      7.0.4              pyh9f0ad1d_0    conda-forge
     pygit2                    1.6.0            py37h5e8e339_0    conda-forge
     pygments                  2.9.0              pyhd8ed1ab_0    conda-forge
     pygtrie                   2.3.2              pyh8c360ce_0    conda-forge
     pyjwt                     2.0.1              pyhd8ed1ab_0    conda-forge
     pymongo                   3.11.4           py37hcd2ae1e_0    conda-forge
     pynndescent               0.5.2              pyh44b312d_0    conda-forge
     pynvml                    11.0.0             pyhd8ed1ab_0    conda-forge
     pyodbc                    4.0.30           py37hcd2ae1e_1    conda-forge
     pyopenssl                 19.1.0             pyhd3eb1b0_1
     pyparsing                 2.4.7              pyh9f0ad1d_0    conda-forge
     pyperclip                 1.8.2              pyhd8ed1ab_2    conda-forge
     pyppeteer                 0.2.2                      py_1    conda-forge
     pyproj                    3.0.1            py37h2bb2a07_1    conda-forge
     pyrsistent                0.17.3           py37h5e8e339_2    conda-forge
     pysocks                   1.7.1                    py37_1
     python                    3.7.9                h7579374_0
     python-benedict           0.24.0             pyhd8ed1ab_0    conda-forge
     python-confluent-kafka    1.5.0            py37h8f50634_0    conda-forge
     python-dateutil           2.8.1                      py_0    conda-forge
     python-editor             1.0.4                      py_0    conda-forge
     python-fsutil             0.5.0              pyhd8ed1ab_0    conda-forge
     python-json-logger        2.0.1              pyh9f0ad1d_0    conda-forge
     python-slugify            5.0.2              pyhd8ed1ab_0    conda-forge
     python_abi                3.7                     1_cp37m    conda-forge
     pytz                      2021.1             pyhd8ed1ab_0    conda-forge
     pyviz_comms               2.0.1              pyhd3deb0d_0    conda-forge
     pywavelets                1.1.1            py37h161383b_3    conda-forge
     pyyaml                    5.4.1            py37h5e8e339_0    conda-forge
     pyzmq                     22.1.0           py37h336d617_0    conda-forge
     qgrid                     1.3.1            py37h89c1867_2    conda-forge
     qt                        5.12.9               hda022c4_4    conda-forge
     querystring_parser        1.2.4                      py_0    conda-forge
     rapids                    0.19.0          cuda11.2_py37_ga03647d_299    rapidsai
     rapids-xgboost            0.19.0          cuda11.2_py37_ga03647d_299    rapidsai
     ratelimit                 2.2.1              pyhd8ed1ab_0    conda-forge
     re2                       2021.04.01           h9c3ff4c_0    conda-forge
     readline                  8.1                  h46c0cb4_0    conda-forge
     reproc                    14.2.1               h36c2ea0_0    conda-forge
     reproc-cpp                14.2.1               h58526e2_0    conda-forge
     requests                  2.24.0                     py_0
     rich                      10.3.0           py37h89c1867_0    conda-forge
     rmm                       0.19.0          cuda_11.2_py37_g7065af3_0    rapidsai
     rtree                     0.9.7            py37h0b55af0_1    conda-forge
     ruamel.yaml               0.17.7           py37h5e8e339_0    conda-forge
     ruamel.yaml.clib          0.2.2            py37h5e8e339_2    conda-forge
     ruamel_yaml               0.15.87          py37h7b6447c_1
     s2n                       1.0.9                h9b69904_0    conda-forge
     s3transfer                0.4.2              pyhd8ed1ab_0    conda-forge
     scikit-image              0.18.1           py37hdc94413_0    conda-forge
     scikit-learn              0.24.2           py37h18a542f_0    conda-forge
     scikit-plot               0.3.7                    pypi_0    pypi
     scipy                     1.6.3            py37h29e03ee_0    conda-forge
     seaborn                   0.11.1               ha770c72_0    conda-forge
     seaborn-base              0.11.1             pyhd8ed1ab_1    conda-forge
     send2trash                1.5.0                      py_0    conda-forge
     setuptools                49.6.0           py37h89c1867_3    conda-forge
     shapely                   1.7.1            py37h2d1e849_5    conda-forge
     shortuuid                 1.0.1            py37h89c1867_4    conda-forge
     shtab                     1.3.6              pyhd8ed1ab_0    conda-forge
     simpervisor               0.4                pyhd8ed1ab_0    conda-forge
     six                       1.15.0           py37h06a4308_0
     smmap                     4.0.0                    pypi_0    pypi
     snappy                    1.1.8                he1b5a44_3    conda-forge
     sniffio                   1.2.0            py37h89c1867_1    conda-forge
     sortedcontainers          2.4.0              pyhd8ed1ab_0    conda-forge
     spdlog                    1.7.0                hc9558a2_2    conda-forge
     sqlalchemy                1.4.17           py37h5e8e339_0    conda-forge
     sqlite                    3.35.5               h74cdb3f_0    conda-forge
     sqlparse                  0.4.1              pyh9f0ad1d_0    conda-forge
     statsmodels               0.12.2           py37h902c9e0_0    conda-forge
     stevedore                 3.3.0            py37h89c1867_1    conda-forge
     streamz                   0.6.2              pyh44b312d_0    conda-forge
     tabulate                  0.8.9              pyhd8ed1ab_0    conda-forge
     tangled-up-in-unicode     0.1.0              pyhd8ed1ab_0    conda-forge
     tbb                       2020.3               hfd86e86_0
     tblib                     1.7.0              pyhd8ed1ab_0    conda-forge
     terminado                 0.10.0           py37h89c1867_0    conda-forge
     testpath                  0.5.0              pyhd8ed1ab_0    conda-forge
     text-unidecode            1.3                        py_0    conda-forge
     theme-darcula             3.0.0                    pypi_0    pypi
     threadpoolctl             2.1.0              pyh5ca1d4c_0    conda-forge
     tifffile                  2021.4.8           pyhd8ed1ab_0    conda-forge
     tiledb                    2.2.9                h91fcb0e_0    conda-forge
     tk                        8.6.10               hbc83047_0
     toml                      0.10.2             pyhd8ed1ab_0    conda-forge
     toolz                     0.11.1                     py_0    conda-forge
     tornado                   6.1              py37h5e8e339_1    conda-forge
     tqdm                      4.51.0             pyhd3eb1b0_0
     traitlets                 5.0.5                      py_0    conda-forge
     treelite                  1.1.0            py37hfdac9b6_0    conda-forge
     treelite-runtime          1.1.0                    pypi_0    pypi
     trimap                    1.0.15                   pypi_0    pypi
     typing-extensions         3.10.0.0             hd8ed1ab_0    conda-forge
     typing_extensions         3.10.0.0           pyha770c72_0    conda-forge
     tzcode                    2021a                h7f98852_1    conda-forge
     tzdata                    2021a                he74cb21_0    conda-forge
     ucx                       1.9.0+gcd9efd3       cuda11.2_0    rapidsai
     ucx-proc                  1.0.0                       gpu    rapidsai
     ucx-py                    0.19.0          py37_gcd9efd3_0    rapidsai
     umap-learn                0.5.1            py37h89c1867_1    conda-forge
     unidecode                 1.2.0              pyhd8ed1ab_0    conda-forge
     unixodbc                  2.3.9                hb166930_0    conda-forge
     urllib3                   1.25.11                    py_0
     visions                   0.7.1              pyhd8ed1ab_0    conda-forge
     voluptuous                0.12.1             pyhd3deb0d_0    conda-forge
     wcwidth                   0.2.5              pyh9f0ad1d_2    conda-forge
     webencodings              0.5.1                      py_1    conda-forge
     websocket-client          0.58.0           py37h06a4308_4
     websockets                8.1              py37h5e8e339_3    conda-forge
     werkzeug                  2.0.1              pyhd8ed1ab_0    conda-forge
     wheel                     0.35.1             pyhd3eb1b0_0
     widgetsnbextension        3.5.1            py37hc8dfbb8_4    conda-forge
     x264                      1!161.3030           h7f98852_0    conda-forge
     xarray                    0.18.2             pyhd8ed1ab_0    conda-forge
     xerces-c                  3.2.3                h9d8b166_2    conda-forge
     xgboost                   1.4.0dev.rapidsai0.19  cuda11.2py37_0    rapidsai
     xmltodict                 0.12.0                     py_0    conda-forge
     xorg-kbproto              1.0.7             h14c3975_1002    conda-forge
     xorg-libice               1.0.10               h516909a_0    conda-forge
     xorg-libsm                1.2.3             hd9c2040_1000    conda-forge
     xorg-libx11               1.7.2                h7f98852_0    conda-forge
     xorg-libxext              1.3.4                h7f98852_1    conda-forge
     xorg-libxrender           0.9.10            h7f98852_1003    conda-forge
     xorg-renderproto          0.11.1            h14c3975_1002    conda-forge
     xorg-xextproto            7.3.0             h14c3975_1002    conda-forge
     xorg-xproto               7.0.31            h14c3975_1007    conda-forge
     xz                        5.2.5                h7b6447c_0
     yaml                      0.2.5                h7b6447c_0
     yarl                      1.6.3            py37h5e8e339_1    conda-forge
     zc.lockfile               2.0                        py_0    conda-forge
     zeromq                    4.3.4                h9c3ff4c_0    conda-forge
     zfp                       0.5.5                h9c3ff4c_5    conda-forge
     zict                      2.0.0                      py_0    conda-forge
     zipp                      3.4.1              pyhd8ed1ab_0    conda-forge
     zlib                      1.2.11               h7b6447c_3
     zstd                      1.4.9                ha95c52a_0    conda-forge

beckernick commented 3 years ago

If the algorithms have fundamental in-memory requirements for certain operations, you may not be able to succeed with spilling alone. It looks like you have a total of 20 GB of GPU memory across two GPUs. Perhaps @trivialfis has some insight into whether loading an 8 GB+ file pushes the limit with Dask XGBoost on this setup.
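
To make the point about in-memory requirements concrete, here is a rough sketch in plain Python. It is a toy model, not how dask-cuda or RMM actually account for memory. The byte counts are copied from the error message in the report; the per-GPU capacity of about 10 GB is an assumption derived from "20 GB across two GPUs", and `can_allocate` is a hypothetical helper.

```python
# Toy model: spilling frees device memory by moving other objects to host,
# but a single allocation still has to fit on the device in one piece.

GPU_CAPACITY = 10 * 10**9        # assumption: ~20 GB total split over 2 GPUs
REQUESTED = 11_814_988_752       # "Requested memory" from the reported error
FREE_AT_FAILURE = 3_557_097_472  # "Free memory" from the reported error


def can_allocate(requested: int, free: int, spillable: int) -> bool:
    """Hypothetical check: does the request fit once all spillable data is evicted?"""
    return requested <= free + spillable


# Best case for spilling: everything currently on the device can be evicted.
spillable = GPU_CAPACITY - FREE_AT_FAILURE
fits_after_spill = can_allocate(REQUESTED, FREE_AT_FAILURE, spillable)

print(f"requested: {REQUESTED / 1e9:.2f} GB, capacity: {GPU_CAPACITY / 1e9:.2f} GB")
print("fits after spilling everything:", fits_after_spill)
```

Under that assumption the single request is larger than the whole device, so no amount of spilling could satisfy it.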

vidosits commented 3 years ago

Wouldn't jit_unspill=True solve that, though? Wouldn't the excess just spill to host RAM? Or do you mean that it can't, because some operation fundamental to XGBoost needs more than 20 GB of GPU memory?

vidosits commented 3 years ago

The real problem was #5012. Since my data had only one partition, there was no way to distribute it across devices. Converting to CSV first and setting npartitions solved the problem.
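
For anyone hitting the same thing, a minimal sketch of why the partition count matters. It uses plain Python and a round-robin placement, which is a simplification and not the actual Dask scheduler; `assign_partitions` is a hypothetical helper. The worker count and `npartitions=50` come from the reproduction script.

```python
from typing import Dict, List


def assign_partitions(n_partitions: int, n_workers: int) -> Dict[int, List[int]]:
    """Place partitions on workers round-robin (simplified, hypothetical scheduling)."""
    placement: Dict[int, List[int]] = {w: [] for w in range(n_workers)}
    for p in range(n_partitions):
        placement[p % n_workers].append(p)
    return placement


# One partition: the whole dataset lands on a single GPU, the other stays idle.
single = assign_partitions(n_partitions=1, n_workers=2)

# Fifty partitions: the data can be split evenly over both GPUs.
many = assign_partitions(n_partitions=50, n_workers=2)

print({w: len(parts) for w, parts in single.items()})
print({w: len(parts) for w, parts in many.items()})
```

With a single partition there is nothing to split, so one worker has to hold everything regardless of device_memory_limit.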

trivialfis commented 3 years ago

Happy to see it resolved!