akeliduo closed 3 years ago
If the file is a CSV, vaex uses pandas under the hood to load it, so the memory limit is whatever pandas dictates. Vaex shines with file types like Parquet, HDF5, etc. that can be read and transformed in small chunks, as opposed to requiring the entire file to be read into memory first.
I suggest converting the CSV file (in chunks) into an HDF5, Parquet, etc. file, then loading it with vaex.
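A rough sketch of that conversion, in case it helps. The function name `csv_to_hdf5_parts`, the `part_*.hdf5` naming and the default chunk size are made up for illustration; the calls themselves (`pd.read_csv(..., chunksize=...)`, `vaex.from_pandas`, `DataFrame.export_hdf5`, and `vaex.open` with a wildcard) are the ones I would expect to work in vaex 4.x, but check them against your installed version.

```python
from pathlib import Path


def csv_to_hdf5_parts(csv_path, out_dir, rows_per_chunk=1_000_000):
    """Convert a large CSV into several HDF5 files, then open them as one DataFrame.

    Hypothetical helper: names and defaults are illustrative only.
    """
    # Imported here so the heavy dependencies load only when a conversion runs.
    import pandas as pd
    import vaex

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # pandas yields one DataFrame per chunk, so only one chunk is in memory at a time.
    for i, chunk in enumerate(pd.read_csv(csv_path, chunksize=rows_per_chunk)):
        part = vaex.from_pandas(chunk, copy_index=False)
        part.export_hdf5(str(out_dir / f"part_{i:05d}.hdf5"))

    # A wildcard path makes vaex open all parts as a single memory-mapped DataFrame.
    return vaex.open(str(out_dir / "part_*.hdf5"))
```

One caveat: pandas infers dtypes per chunk, so a column that looks like integers in one chunk and floats in another may need an explicit `dtype=` passed to `read_csv`.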
Keep in mind that chunk size in vaex works differently than in pandas. In pandas it gives you a generator, so you can loop over portions of the data (as DataFrames). In vaex it is used as a sample size for loading intermediate data and converting it to HDF5 or Arrow, so that you can then work with the whole dataset easily.
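To make the difference concrete, here is a small side-by-side sketch. The two function names are hypothetical; `vaex.from_csv(..., convert=True, chunk_size=...)` is how I understand the vaex 4.x API, where the converted HDF5 file is written alongside the CSV by default, so treat that detail as an assumption and check the docs for your version.

```python
def count_rows_pandas(csv_path, chunksize=100_000):
    """pandas: chunksize gives an iterator, and you process each piece yourself."""
    import pandas as pd

    total = 0
    for chunk in pd.read_csv(csv_path, chunksize=chunksize):
        total += len(chunk)  # each chunk is an ordinary in-memory DataFrame
    return total


def open_csv_vaex(csv_path, chunk_size=5_000_000):
    """vaex: with convert=True, chunk_size only bounds memory during conversion."""
    import vaex

    # Returns ONE DataFrame covering the whole file, backed by the converted HDF5.
    return vaex.from_csv(csv_path, convert=True, chunk_size=chunk_size)
```

So with pandas you never hold the whole dataset at once, while with vaex you end up with a single DataFrame over all rows and the chunk size matters only while the conversion is running. The conversion needs extra disk space for the HDF5 copy.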
The rest is as @kmcentush said.
Thank you all.
When using vaex.open(filename), I get a Memory Error. My question is: what is the size of the largest file that vaex can open? And what should I do if I don't want to use chunks to open it? Is it possible not to use chunks? Thanks.