vaexio / vaex

Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀
https://vaex.io
MIT License
8.28k stars 590 forks

[BUG-REPORT] Vaex from HDF5 memory mapped issue when pickling and unpickling #2179

Open ukanchan opened 2 years ago

ukanchan commented 2 years ago

I am using the Vaex library to construct a dataframe from an HDF5 file dataset and storing and retrieving it with pickle.

The following error message appears when I attempt to retrieve the dataframe from pickle (unpickling):

Error during unpickling object (Possibly unsupported): 'Hdf5MemoryMapped' object has no attribute 'tls_map'

Could you guys please provide us some advice on how to deal with this?

Does Vaex support pickling and unpickling a dataframe created from HDF5 files?

Note: Using latest Vaex version - 4.11

JovanVeljanoski commented 2 years ago

If you have an HDF5 file, then the data is already serialized to disk, so why do you want to pickle it? Pickling data is unsafe - we do not recommend it or officially support it. Besides, if you are reading data from HDF5, the data is not in memory but on disk, and you are streaming it on demand - so pickling would not simply work anyway. Maybe you should go over the tutorial to learn how Vaex works.

And please do not spam the message boards with the same question: it will not make anyone want to answer faster, but the opposite actually.
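To illustrate the point about memory-mapped data, here is a minimal sketch using only the Python standard library. It is not Vaex code and says nothing about how `Hdf5MemoryMapped` is implemented; the helper `open_mapped`, the function `demo`, and the file name `data.bin` are hypothetical names made up for this example. It shows that a memory-mapped object cannot be pickled directly, while pickling the file path and remapping the file after loading works. With Vaex, the analogous step would presumably be to store the path of the HDF5 file and reopen it with `vaex.open`.

```python
import mmap
import os
import pickle
import tempfile


def open_mapped(path):
    """Hypothetical helper: map a file read-only; the bytes stay on disk until accessed."""
    f = open(path, "rb")
    return f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.bin")
        with open(path, "wb") as out:
            out.write(b"0123456789")

        f, mapped = open_mapped(path)

        # Pickling the mapping itself fails: it is a view onto the file,
        # not a self-contained in-memory object.
        try:
            pickle.dumps(mapped)
            raw_picklable = True
        except TypeError:
            raw_picklable = False
        mapped.close()
        f.close()

        # Pickle only the path, then remap the file after unpickling.
        restored_path = pickle.loads(pickle.dumps(path))
        f2, remapped = open_mapped(restored_path)
        result = remapped[2:5]
        remapped.close()
        f2.close()

        return raw_picklable, result
```

The design choice here is to treat the file on disk as the single source of truth: what gets serialized is a reference to the data (its path), never the mapped bytes themselves.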