vaexio / vaex

Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀
https://vaex.io
MIT License

[BUG-REPORT] converting massive CSV (50GB) stalls #2390

Open mfouesneau opened 10 months ago

mfouesneau commented 10 months ago

Description

I have multiple massive CSV files (~50GB) that I would like to convert to a more efficient format.

Following the documentation, I tried:

vaex.open('file.csv', convert='file.hdf5', progress=True)

After many hours with no visible progress, the HDF5 file is still only a few bytes in size.

I also tried the older approach:

vaex.open('file.csv').export_hdf5('file.hdf5', progress=True)

This quickly creates a 7 KB file, but nothing happens after that either.

In both cases, the file contains /table/columns but no column definition.
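
For anyone who wants to check the same thing on their own output, the file layout can be listed with h5py. This is only a minimal sketch and not part of the original report: it assumes h5py is installed, and `walk` and `summarize_hdf5` are hypothetical helper names chosen for illustration.

```python
def walk(group, prefix=""):
    """Yield (path, kind, detail) for every entry below `group`.

    Relies only on `.items()`, so it accepts an h5py File/Group and,
    for quick checks without h5py, a plain nested dict.
    """
    for name, obj in group.items():
        path = f"{prefix}/{name}"
        if hasattr(obj, "items"):
            # Groups (and nested dicts) expose .items(); recurse into them.
            yield path, "group", None
            yield from walk(obj, path)
        else:
            # Anything else is treated as a dataset; report shape and dtype
            # when the object provides them.
            detail = (getattr(obj, "shape", None), getattr(obj, "dtype", None))
            yield path, "dataset", detail


def summarize_hdf5(path):
    """Print every group and dataset found in the HDF5 file at `path`."""
    import h5py  # third-party; imported here so `walk` has no dependencies

    with h5py.File(path, "r") as f:
        for entry_path, kind, detail in walk(f):
            print(entry_path, kind, detail if detail is not None else "")
```

Calling `summarize_hdf5('file.hdf5')` on a file in the state described above should list the groups but no datasets under them, which would confirm that no column data was written.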

Software information