rapidsai / cudf

cuDF - GPU DataFrame Library
https://docs.rapids.ai/api/cudf/stable/
Apache License 2.0

[BUG] terminate called after throwing an instance of 'rmm::bad_alloc' what(): std::bad_alloc: CUDA error at: ../include/rmm/mr/device/cuda_memory_resource.hpp:68: cudaErrorMemoryAllocation out of memory #6772

Closed stromal closed 3 years ago

stromal commented 3 years ago

I am using Jupyter Lab on a pre-configured EC2 g4dn.4xlarge instance (64 GB RAM, 16 cores, NVIDIA T4 GPU).

I am just loading a 3 GB CSV with the following dimensions: 3 million rows, 500 columns.

import cudf

dataset = cudf.read_csv('data.csv')
dataset.head()

It prints out the head correctly, but when I run the next cell the whole kernel restarts: the cell execution count goes back to 1, so the dataset I loaded in the previous cell is gone because the whole environment restarted. There are no error messages in Jupyter Lab at all; I only see the following error message in the terminal:

[I 10:45:11.373 LabApp] Starting buffering for VERY_LONG_ID_HERE
[I 10:45:11.514 LabApp] Restoring connection for VERY_LONG_ID_HERE
terminate called after throwing an instance of 'rmm::bad_alloc'
  what():  std::bad_alloc: CUDA error at: ../include/rmm/mr/device/cuda_memory_resource.hpp:68: cudaErrorMemoryAllocation out of memory
[I 10:45:53.221 LabApp] KernelRestarter: restarting kernel (1/5), keep random ports
stromal commented 3 years ago

I have solved it by strictly defining the data types:

col_types = [
    'uint8',
    # ... one entry per column ...
    'float64',
    'float64']

The smaller the type you can get away with, the better. Then load it in like this:

dataset = cudf.read_csv('data.csv', dtype=col_types)
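
For reference, a minimal sketch of how such a dtype list could be built programmatically rather than typed out by hand: read a small sample, inspect the inferred types, and map them to narrower ones. The nrows sample size and the target types ('float32', 'int32', 'str') are assumptions here, not part of the workaround above.

import cudf

# Read a small sample to discover column names and inferred types.
sample = cudf.read_csv('data.csv', nrows=10_000)

col_types = {}
for name, dtype in sample.dtypes.items():
    if dtype.kind == 'f':
        col_types[name] = 'float32'  # half the memory of float64
    elif dtype.kind == 'i':
        col_types[name] = 'int32'    # or 'uint8'/'int16' if the value range allows it
    else:
        col_types[name] = 'str'      # assumption: treat non-numeric columns as plain strings

# Load the full file with the narrower types (read_csv also accepts a dict of column -> dtype).
dataset = cudf.read_csv('data.csv', dtype=col_types)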

beckernick commented 3 years ago

3 million rows x 500 columns of float64 values would be 12 GB in memory (3e6 * 500 * 8 / 1e9). Glad you've got things working, but it's possible your data is larger in memory than it seems on disk.
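
For concreteness, the same back-of-the-envelope estimate written out in code (a small sketch using the figures quoted in this thread; the T4's 16 GB of GPU memory is the only number not already mentioned above):

rows = 3_000_000
cols = 500
bytes_per_float64 = 8

gpu_bytes = rows * cols * bytes_per_float64
print(f"~{gpu_bytes / 1e9:.0f} GB needed as float64")  # ~12 GB, tight on a 16 GB T4 once parsing overhead is added
print(f"~{gpu_bytes / 2 / 1e9:.0f} GB as float32")     # ~6 GB after downcasting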

stromal commented 3 years ago

> 3 million rows x 500 columns of float64 values would be 12 GB in memory (3e6 * 500 * 8 / 1e9). Glad you've got things working, but it's possible your data is larger in memory than it seems on disk.

Is there a library that measures/calculates data sizes?
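
One rough way to do this (a sketch, not an answer from the thread): cuDF, like pandas, exposes DataFrame.memory_usage(), so you can load a sample and extrapolate to the full row count. The sample size below is an assumption; the 3 million row count comes from the original post.

import cudf

# Load a sample and measure its in-memory footprint.
sample = cudf.read_csv('data.csv', nrows=100_000)
sample_bytes = sample.memory_usage(deep=True).sum()

# Extrapolate to the full dataset.
total_rows = 3_000_000
estimated_gb = sample_bytes / len(sample) * total_rows / 1e9
print(f"Estimated full in-memory size: {estimated_gb:.1f} GB")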