I have solved it by strictly defining the datatypes:

```python
col_types = ['uint8',
             'float64',
             'float64']  # one dtype per column; the full 500-entry list is omitted here
```

The smaller the dtype you can get away with, the better. Then load it in like this:

```python
dataset = cudf.read_csv('data.csv', dtype=col_types)
```
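If keeping a 500-entry positional list in sync with the file is awkward, a dict mapping column names to dtypes works too. A minimal sketch, assuming `cudf.read_csv` accepts a dict for `dtype` (as pandas does); the column names and file name below are made up:

```python
import cudf

# Hypothetical column names: map each column to the smallest dtype that fits its values.
col_types = {
    'label':  'uint8',    # 1 byte per value instead of 8 for float64
    'feat_1': 'float32',  # 4 bytes per value, half the footprint of float64
    'feat_2': 'float32',
}

dataset = cudf.read_csv('data.csv', dtype=col_types)
print(dataset.dtypes)
```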
3 million rows x 500 columns of float64 values would be 12 GB in memory (3e6 * 500 * 8 / 1e9). Glad you've got things working, but it's possible your data is larger than it seems on disk.
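As a quick back-of-the-envelope check, the same arithmetic for a few candidate dtypes (a small sketch using the row and column counts above):

```python
import numpy as np

n_rows, n_cols = 3_000_000, 500

# Bytes per value for each dtype, times the number of cells, converted to GB.
for dt in ('float64', 'float32', 'uint8'):
    gb = n_rows * n_cols * np.dtype(dt).itemsize / 1e9
    print(f"{dt:>8}: {gb:4.1f} GB")
# float64: 12.0 GB, float32: 6.0 GB, uint8: 1.5 GB
```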
Is there a library that measures/calculates data sizes?
I am using JupyterLab on a preconfigured EC2 g4dn.4xlarge instance (64 GB RAM, 16 cores, NVIDIA T4 GPU).
I am just loading a 3 GB CSV with the following dimensions: 3 million rows, 500 columns.
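A minimal sketch of the cells involved (the file name here is a placeholder):

```python
import cudf

# Cell 1: load the CSV on the GPU and peek at it.
dataset = cudf.read_csv('data.csv')
print(dataset.head())

# Cell 2: any further use of `dataset` -- this is where the kernel restarts.
print(dataset.shape)
```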
It prints out the head correctly, but when I run the next cell the whole environment restarts: the cell's execution count goes back to 1, so it no longer sees the dataset I loaded in the previous cell. There are no error messages in JupyterLab at all; I only see the following error message from the TERMINAL: