[BUG] dask_cudf - aggregate - to_csv memory error

Describe the bug

I am loading a large dataframe (~60M rows x 300 columns) from CSV with dask_cudf, running a groupby and sum, and writing the result back to CSV. The job fails with an out-of-memory (OOM) error. I am using an A100-80GB GPU along with 200 GB of RAM.
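For scale, here is a rough size estimate of the table. It assumes every value is stored as an 8-byte number (float64 or int64); the actual dtypes are not stated above, so the real footprint may be smaller or larger.

```python
# Back-of-envelope size of the in-memory table.
# Assumption: every value is held as an 8-byte number (float64/int64).
N_ROWS = 60_000_000       # ~60M rows, from the description
N_COLS = 300              # ~300 columns, from the description
BYTES_PER_VALUE = 8       # assumed dtype width

table_bytes = N_ROWS * N_COLS * BYTES_PER_VALUE
gpu_bytes = 80 * 10**9    # A100-80GB, taken as 80 GB for this comparison
host_bytes = 200 * 10**9  # 200 GB of RAM

print(f"table: {table_bytes / 1e9:.0f} GB")
print(f"fits on the GPU at once: {table_bytes <= gpu_bytes}")
print(f"fits in host RAM at once: {table_bytes <= host_bytes}")
```

Under that assumption the full table is larger than GPU memory but smaller than host RAM, so the workflow depends on partitions being processed and spilled incrementally rather than held on the device all at once.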

All columns are numeric except the groupby column, which is left as the index, so the error should be reproducible with a random dataframe. I noticed a similar issue, #10426, but its error message is different, so I am not sure it is the same problem. I also repeatedly get a warning about high CPU time spent in garbage collection; I assume that is due to the size of the dataframe and the many reads/writes, but correct me if that is not the case.

Steps/Code to reproduce bug
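A scaled-down stand-in for the input file could be generated along these lines. This is only a sketch: the column names `Contig` and `ref90_cluster` come from the script in this report, while the `sample_*` column names, the string cluster labels, the default sizes, and the helper name `write_random_tsv` are assumptions made for illustration.

```python
import csv
import random


def write_random_tsv(path, n_rows=1_000, n_value_cols=5, n_groups=50, seed=0):
    """Write a tab-separated file shaped like the real input, at reduced size."""
    rng = random.Random(seed)
    # Hypothetical names for the numeric columns.
    value_cols = [f"sample_{i}" for i in range(n_value_cols)]
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t")
        writer.writerow(["Contig", "ref90_cluster", *value_cols])
        for i in range(n_rows):
            # One identifier, one group label, then random numeric values.
            label = f"cluster_{rng.randrange(n_groups)}"
            values = [round(rng.random(), 4) for _ in value_cols]
            writer.writerow([f"contig_{i}", label, *values])
    return path
```

Calling `write_random_tsv("random_input.tsv", n_rows=1_000)` produces a small file that can be passed to `dask_cudf.read_csv(..., sep='\t')` in place of the real path; the row and column counts would need to be raised substantially to approach the size at which the error appears.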

import numpy as np
import pandas as pd
import cudf
import cupy
from dask_cuda import LocalCUDACluster
from dask.distributed import Client
from dask.utils import parse_bytes
import dask_cudf

# Single-worker cluster with an RMM memory pool and JIT unspilling enabled
cluster = LocalCUDACluster(jit_unspill=True,
                           rmm_pool_size=parse_bytes("64 GB"),
                           n_workers=1,
                           device_memory_limit=parse_bytes("160 GB"),
                           local_directory='local_temp',
                           threads_per_worker=32)
client = Client(cluster)

# Read the tab-separated input, drop the Contig column, then group, sum, and write out
df = dask_cudf.read_csv('../02_all_study/02_tad_80_cluster_ref.tsv', sep='\t')
df2 = df.drop('Contig', axis=1)
res = df2.groupby('ref90_cluster').sum()
res.to_csv('04_cluster_groups_csv')
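One variation that may be worth testing is a cluster whose limits sit below the GPU's physical capacity. In dask-cuda, `device_memory_limit` is the threshold at which a worker starts spilling device memory to host, so a value above 80 GB may never be reached before the GPU itself is full. The sketch below only builds the keyword arguments; the specific numbers and the helper name `conservative_cluster_kwargs` are illustrative assumptions, not tuned recommendations or a confirmed fix.

```python
def conservative_cluster_kwargs(gpu_memory_gb=80, rmm_pool_gb=64,
                                device_memory_limit_gb=48):
    """Build LocalCUDACluster keyword arguments that stay below GPU capacity.

    All sizes are assumptions chosen for illustration.
    """
    if not device_memory_limit_gb < rmm_pool_gb <= gpu_memory_gb:
        raise ValueError("expected spill threshold < pool size <= GPU memory")
    return {
        "n_workers": 1,
        "threads_per_worker": 1,  # fewer partitions resident on the GPU at once
        "jit_unspill": True,
        "rmm_pool_size": f"{rmm_pool_gb}GB",
        "device_memory_limit": f"{device_memory_limit_gb}GB",
        "local_directory": "local_temp",
    }


kwargs = conservative_cluster_kwargs()
print(kwargs)

# On a machine with dask-cuda installed, the arguments would be used as:
#   from dask_cuda import LocalCUDACluster
#   from dask.distributed import Client
#   client = Client(LocalCUDACluster(**kwargs))
```

Whether this changes the outcome is untested here; it is meant only as a second configuration to compare against the one above.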

Output: I think the error message repeats after each nanny restart, but I have included the entire message for thoroughness (attached as a file because of its size): dask_to_csv_error.txt

Environment overview

Environment location: RHEL server
Method of cuDF install: conda (mamba)

Environment details


     ***git***
     Not inside a git repository

     ***OS Information***
     NAME="Red Hat Enterprise Linux Server"
     VERSION="7.9 (Maipo)"
     ID="rhel"
     ID_LIKE="fedora"
     VARIANT="Server"
     VARIANT_ID="server"
     VERSION_ID="7.9"
     PRETTY_NAME="Red Hat Enterprise Linux Server 7.9 (Maipo)"
     ANSI_COLOR="0;31"
     CPE_NAME="cpe:/o:redhat:enterprise_linux:7.9:GA:server"
     HOME_URL="https://www.redhat.com/"
     BUG_REPORT_URL="https://bugzilla.redhat.com/"

     REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 7"
     REDHAT_BUGZILLA_PRODUCT_VERSION=7.9
     REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
     REDHAT_SUPPORT_PRODUCT_VERSION="7.9"
     Red Hat Enterprise Linux Server release 7.9 (Maipo)
     Red Hat Enterprise Linux Server release 7.9 (Maipo)
     Linux atl1-1-01-006-7-0.pace.gatech.edu 3.10.0-1160.49.1.el7.x86_64 #1 SMP Tue Nov 9 16:09:48 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

     ***GPU Information***
     Tue Apr 25 18:48:17 2023
     +-----------------------------------------------------------------------------+
     | NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
     |-------------------------------+----------------------+----------------------+
     | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
     | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
     |                               |                      |               MIG M. |
     |===============================+======================+======================|
     |   0  NVIDIA A100 80G...  On   | 00000000:25:00.0 Off |                    0 |
     | N/A   33C    P0    61W / 300W |  72218MiB / 81920MiB |      0%      Default |
     |                               |                      |             Disabled |
     +-------------------------------+----------------------+----------------------+

     +-----------------------------------------------------------------------------+
     | Processes:                                                                  |
     |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
     |        ID   ID                                                   Usage      |
     |=============================================================================|
     |    0   N/A  N/A     26580      C   ...s/rapids-23.04/bin/python    10315MiB |
     |    0   N/A  N/A     27149      C   ...s/rapids-23.04/bin/python    61901MiB |
     +-----------------------------------------------------------------------------+

     ***CPU***
     Architecture:          x86_64
     CPU op-mode(s):        32-bit, 64-bit
     Byte Order:            Little Endian
     CPU(s):                64
     On-line CPU(s) list:   0-63
     Thread(s) per core:    1
     Core(s) per socket:    32
     Socket(s):             2
     NUMA node(s):          8
     Vendor ID:             AuthenticAMD
     CPU family:            25
     Model:                 1
     Model name:            AMD EPYC 7513 32-Core Processor
     Stepping:              1
     CPU MHz:               2600.000
     CPU max MHz:           2600.0000
     CPU min MHz:           1500.0000
     BogoMIPS:              5200.16
     Virtualization:        AMD-V
     L1d cache:             32K
     L1i cache:             32K
     L2 cache:              512K
     L3 cache:              32768K
     NUMA node0 CPU(s):     0-7
     NUMA node1 CPU(s):     8-15
     NUMA node2 CPU(s):     16-23
     NUMA node3 CPU(s):     24-31
     NUMA node4 CPU(s):     32-39
     NUMA node5 CPU(s):     40-47
     NUMA node6 CPU(s):     48-55
     NUMA node7 CPU(s):     56-63
     Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc art rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 cpb cat_l3 cdp_l3 invpcid_single hw_pstate sme retpoline_amd ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold v_vmsave_vmload vgif umip pku ospke vaes vpclmulqdq overflow_recov succor smca

     ***CMake***
     /bin/cmake
     cmake version 2.8.12.2

     ***g++***
     /usr/lib64/ccache/g++
     g++ (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44)
     Copyright (C) 2015 Free Software Foundation, Inc.
     This is free software; see the source for copying conditions.  There is NO
     warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

     ***nvcc***
     /usr/local/pace-apps/manual/packages/nvhpc/Linux_x86_64/22.11/compilers/bin/nvcc
     nvcc: NVIDIA (R) Cuda compiler driver
     Copyright (c) 2005-2022 NVIDIA Corporation
     Built on Tue_May__3_18:49:52_PDT_2022
     Cuda compilation tools, release 11.7, V11.7.64
     Build cuda_11.7.r11.7/compiler.31294372_0

     ***Python***
     /storage/home/hcoda1/6/rridley3/data/dir/anaconda3/envs/rapids-23.04/bin/python
     Python 3.10.10

     ***Environment Variables***
     PATH                            : /usr/local/pace-apps/manual/packages/nvhpc/Linux_x86_64/22.11/compilers/extras/qd/bin:/usr/local/pace-apps/manual/packages/nvhpc/Linux_x86_64/22.11/comm_libs/mpi/bin:/usr/local/pace-apps/manual/packages/nvhpc/Linux_x86_64/22.11/compilers/bin:/usr/local/pace-apps/manual/packages/nvhpc/Linux_x86_64/22.11/cuda/bin:/usr/local/pace-apps/spack/packages/linux-rhel7-x86_64/gcc-4.8.5/cuda-11.7.0-7sdye3id7ahz34mzhyzzqbxowjxgxkhu/bin:/storage/home/hcoda1/6/rridley3/.cargo/bin:/storage/home/hcoda1/6/rridley3/data/dir/anaconda3/envs/rapids-23.04/bin:/storage/home/hcoda1/6/rridley3/data/dir/apps:/storage/home/hcoda1/6/rridley3/.aspera/connect/bin:/opt/pace-common/bin:/opt/slurm/current/bin:/opt/pace-system/bin:/usr/lpp/mmfs/bin:/usr/lib64/ccache:/sbin:/bin:/usr/sbin:/usr/bin:/opt/iozone/bin:/storage/home/hcoda1/6/rridley3/edirect
     LD_LIBRARY_PATH                 : /usr/local/pace-apps/manual/packages/nvhpc/Linux_x86_64/22.11/comm_libs/nvshmem/lib:/usr/local/pace-apps/manual/packages/nvhpc/Linux_x86_64/22.11/comm_libs/nccl/lib:/usr/local/pace-apps/manual/packages/nvhpc/Linux_x86_64/22.11/comm_libs/mpi/lib:/usr/local/pace-apps/manual/packages/nvhpc/Linux_x86_64/22.11/math_libs/lib64:/usr/local/pace-apps/manual/packages/nvhpc/Linux_x86_64/22.11/compilers/lib:/usr/local/pace-apps/manual/packages/nvhpc/Linux_x86_64/22.11/compilers/extras/qd/lib:/usr/local/pace-apps/manual/packages/nvhpc/Linux_x86_64/22.11/cuda/extras/CUPTI/lib64:/usr/local/pace-apps/manual/packages/nvhpc/Linux_x86_64/22.11/cuda/lib64:/usr/local/pace-apps/spack/packages/linux-rhel7-x86_64/gcc-4.8.5/cuda-11.7.0-7sdye3id7ahz34mzhyzzqbxowjxgxkhu/lib64:/opt/slurm/current/lib::
     NUMBAPRO_NVVM                   :
     NUMBAPRO_LIBDEVICE              :
     CONDA_PREFIX                    : /storage/home/hcoda1/6/rridley3/data/dir/anaconda3/envs/rapids-23.04
     PYTHON_PATH                     :

     conda not found
     ***pip packages***
     /storage/home/hcoda1/6/rridley3/data/dir/anaconda3/envs/rapids-23.04/bin/pip
     Package                       Version
     ----------------------------- -----------
     aiofiles                      22.1.0
     aiohttp                       3.8.4
     aiosignal                     1.3.1
     aiosqlite                     0.18.0
     anyio                         3.6.2
     aplus                         0.11.0
     appdirs                       1.4.4
     argon2-cffi                   21.3.0
     argon2-cffi-bindings          21.2.0
     arrow                         1.2.3
     asciitree                     0.3.3
     astropy                       5.2.2
     asttokens                     2.2.1
     async-timeout                 4.0.2
     attrs                         22.2.0
     Babel                         2.12.1
     backcall                      0.2.0
     backports.functools-lru-cache 1.6.4
     beautifulsoup4                4.12.2
     blake3                        0.2.1
     bleach                        6.0.0
     bokeh                         2.4.3
     bqplot                        0.12.39
     branca                        0.6.0
     brotlipy                      0.7.0
     cached-property               1.5.2
     cachetools                    5.3.0
     certifi                       2022.12.7
     cffi                          1.15.1
     charset-normalizer            2.1.1
     click                         8.1.3
     click-plugins                 1.1.1
     cligj                         0.7.2
     cloudpickle                   2.2.1
     colorama                      0.4.6
     colorcet                      3.0.1
     comm                          0.1.3
     confluent-kafka               1.7.0
     contourpy                     1.0.7
     cryptography                  40.0.2
     cubinlinker                   0.2.2
     cucim                         23.4.1
     cuda-python                   11.8.1
     cudf                          23.4.0
     cudf-kafka                    23.4.0
     cugraph                       23.4.0
     cuml                          23.4.0
     cupy                          11.6.0
     cusignal                      23.4.0
     cuspatial                     23.4.0
     custreamz                     23.4.0
     cuxfilter                     23.4.0
     cycler                        0.11.0
     cytoolz                       0.12.0
     dask                          2023.3.2
     dask-cuda                     23.4.0
     dask-cudf                     23.4.0
     dask-labextension             6.1.0
     datashader                    0.14.4
     datashape                     0.5.4
     debugpy                       1.6.7
     decorator                     5.1.1
     defusedxml                    0.7.1
     distributed                   2023.3.2.1
     entrypoints                   0.4
     executing                     1.2.0
     fastapi                       0.95.1
     fastavro                      1.7.3
     fasteners                     0.18
     fastjsonschema                2.16.3
     fastrlock                     0.8
     filelock                      3.12.0
     Fiona                         1.9.1
     flit_core                     3.8.0
     folium                        0.14.0
     fonttools                     4.39.3
     fqdn                          1.5.1
     frozendict                    2.3.7
     frozenlist                    1.3.3
     fsspec                        2023.4.0
     future                        0.18.3
     GDAL                          3.6.2
     geopandas                     0.12.2
     graphviz                      0.20.1
     h5py                          3.8.0
     holoviews                     1.15.4
     idna                          3.4
     imagecodecs                   2023.1.23
     imageio                       2.27.0
     importlib-metadata            6.5.0
     importlib-resources           5.12.0
     ipycytoscape                  1.3.3
     ipydatawidgets                4.3.2
     ipykernel                     6.22.0
     ipyleaflet                    0.17.2
     ipympl                        0.9.3
     ipython                       8.12.0
     ipython-genutils              0.2.0
     ipyvolume                     0.6.1
     ipyvue                        1.8.0
     ipyvuetify                    1.8.4
     ipywebrtc                     0.6.0
     ipywidgets                    8.0.6
     isoduration                   20.11.0
     jedi                          0.18.2
     Jinja2                        3.1.2
     joblib                        1.2.0
     json5                         0.9.5
     jsonpointer                   2.3
     jsonschema                    4.17.3
     jupyter_client                8.2.0
     jupyter_core                  5.3.0
     jupyter-events                0.6.3
     jupyter_server                2.5.0
     jupyter_server_fileid         0.9.0
     jupyter-server-proxy          3.2.2
     jupyter_server_terminals      0.4.4
     jupyter_server_ydoc           0.8.0
     jupyter-ydoc                  0.2.3
     jupyterlab                    3.6.3
     jupyterlab-pygments           0.2.2
     jupyterlab_server             2.22.1
     jupyterlab-widgets            3.0.7
     kiwisolver                    1.4.4
     lazy_loader                   0.2
     llvmlite                      0.39.1
     locket                        1.0.0
     lz4                           4.3.2
     mapclassify                   2.5.0
     Markdown                      3.4.3
     markdown-it-py                2.2.0
     MarkupSafe                    2.1.2
     matplotlib                    3.7.1
     matplotlib-inline             0.1.6
     mdurl                         0.1.0
     mistune                       2.0.5
     msgpack                       1.0.5
     multidict                     6.0.4
     multipledispatch              0.6.0
     munch                         2.5.0
     munkres                       1.1.4
     nbclassic                     0.5.5
     nbclient                      0.7.3
     nbconvert                     7.3.1
     nbformat                      5.8.0
     nest-asyncio                  1.5.6
     networkx                      3.1
     notebook                      6.5.4
     notebook_shim                 0.2.3
     numba                         0.56.4
     numcodecs                     0.11.0
     numexpr                       2.8.4
     numpy                         1.23.5
     nvtx                          0.2.5
     packaging                     23.1
     pandas                        1.5.3
     pandocfilters                 1.5.0
     panel                         0.14.1
     param                         1.13.0
     parso                         0.8.3
     partd                         1.4.0
     patsy                         0.5.3
     pexpect                       4.8.0
     pickleshare                   0.7.5
     Pillow                        9.4.0
     pip                           23.1
     pkgutil_resolve_name          1.3.10
     platformdirs                  3.2.0
     pooch                         1.7.0
     progressbar2                  4.2.0
     prometheus-client             0.16.0
     prompt-toolkit                3.0.38
     protobuf                      4.21.12
     psutil                        5.9.5
     ptxcompiler                   0.7.0
     ptyprocess                    0.7.0
     pure-eval                     0.2.2
     pyarrow                       10.0.1
     pycparser                     2.21
     pyct                          0.4.6
     pydantic                      1.10.7
     pydeck                        0.5.0
     pyee                          8.1.0
     pyerfa                        2.0.0.3
     Pygments                      2.15.1
     pylibcugraph                  23.4.0
     pylibraft                     23.4.0
     pynvml                        11.4.1
     pyOpenSSL                     23.1.1
     pyparsing                     3.0.9
     pyppeteer                     1.0.2
     pyproj                        3.4.0
     pyrsistent                    0.19.3
     PySocks                       1.7.1
     python-dateutil               2.8.2
     python-json-logger            2.0.7
     python-utils                  3.5.2
     pythreejs                     2.4.2
     pytz                          2023.3
     pyviz-comms                   2.2.1
     PyWavelets                    1.4.1
     PyYAML                        6.0
     pyzmq                         25.0.2
     raft-dask                     23.4.0
     requests                      2.28.2
     rfc3339-validator             0.1.4
     rfc3986-validator             0.1.1
     rich                          13.3.4
     rmm                           23.4.0
     Rtree                         1.0.1
     scikit-image                  0.20.0
     scikit-learn                  1.2.2
     scipy                         1.10.1
     seaborn                       0.12.2
     Send2Trash                    1.8.0
     setuptools                    67.6.1
     shapely                       2.0.1
     simpervisor                   0.4
     six                           1.16.0
     sniffio                       1.3.0
     sortedcontainers              2.4.0
     soupsieve                     2.3.2.post1
     spectate                      1.0.1
     stack-data                    0.6.2
     starlette                     0.26.1
     statsmodels                   0.13.5
     streamz                       0.6.4
     tables                        3.7.0
     tabulate                      0.9.0
     tblib                         1.7.0
     terminado                     0.17.1
     threadpoolctl                 3.1.0
     tifffile                      2023.4.12
     tiledb                        0.21.2
     tinycss2                      1.2.1
     tomli                         2.0.1
     toolz                         0.12.0
     tornado                       6.3
     tqdm                          4.65.0
     traitlets                     5.9.0
     traittypes                    0.2.1
     treelite                      3.2.0
     treelite-runtime              3.2.0
     typing_extensions             4.5.0
     ucx-py                        0.31.0
     unicodedata2                  15.0.0
     uri-template                  1.2.0
     urllib3                       1.26.15
     vaex-astro                    0.9.3
     vaex-core                     4.16.1
     vaex-hdf5                     0.14.1
     vaex-jupyter                  0.8.1
     vaex-ml                       0.18.1
     vaex-server                   0.8.1
     vaex-viz                      0.5.4
     wcwidth                       0.2.6
     webcolors                     1.13
     webencodings                  0.5.1
     websocket-client              1.5.1
     websockets                    10.4
     wheel                         0.40.0
     widgetsnbextension            4.0.7
     xarray                        2023.4.1
     xgboost                       1.7.5
     xyzservices                   2023.2.0
     y-py                          0.5.9
     yarl                          1.8.2
     ypy-websocket                 0.8.2
     zarr                          2.14.2
     zict                          3.0.0
     zipp                          3.15.0

rapidsai/cudf issue #13220