I have 180 GB of memory and want to process a 400 GB dataset.
When I use `from_csv_arrow` with `lazy=True`, `chunk_size="10GiB"`, and `newline_readahead="640MiB"`, vaex only uses around 2 GB of memory, which makes the processing really slow.
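For reference, here is a minimal sketch of the call I am making; the file path and the follow-up aggregation are just placeholders for illustration:

```python
import vaex

# Lazy CSV read via Arrow; the three keyword arguments below are
# the ones in question. "data.csv" is a placeholder path.
df = vaex.from_csv_arrow(
    "data.csv",                  # ~400 GB file
    lazy=True,                   # stream chunks instead of materializing
    chunk_size="10GiB",
    newline_readahead="640MiB",
)

# Any full-dataset computation (e.g. a column sum, placeholder column
# name) then processes the file chunk by chunk.
print(df["some_column"].sum())
```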
If I read all the data into memory, the computation is extremely fast. I have tried this with a 40 GB dataset and it works fine, but I cannot read 400 GB into memory, and vaex does not seem to take advantage of the memory that is available.
Is there something wrong with my configuration?
What should I do? I am stuck on this problem and really need your help.