
[BUG-REPORT] HDF5 file remains locked after close #2410


intelligibledata commented 5 months ago


Description

I need to change the data in a column of an existing HDF5 file and write it back to the file. The problem is that as soon as I use an existing column from the df to change the data, the HDF5 file gets locked and I cannot write to it or delete/replace it without closing the application (which I do not want to do, since it is a dashboard). I reduced the problem to the following script:

```python
import vaex as vx

original_file = ""
tempfile = ""

df = vx.open(original_file)
dftemp = df.copy()
dftemp["new column1"] = vx.vconstant("test", dftemp.shape[0])

# The two lines below are the ones referred to in the description:
dftemp["new column2"] = dftemp["lockprofile"]
dftemp.drop(dftemp["index"])

dftemp.export_hdf5(tempfile, progress=True)

dftemp.close()
df.close()
```

With those two lines commented out, the script works correctly and both files are unlocked after running it in Jupyter. Uncommenting either of the two lines gives the following error: `could not close memmap ... dataset_mmap.py:94`

In the example above I write to a different file, since writing back to the same file is also not possible due to the lock on the file.
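For illustration, this is roughly what the same-file write looks like. This is a minimal sketch, not the original code: the path `"data.hdf5"` stands in for the empty placeholder above, and the exact exception type is an assumption (on platforms that lock memory-mapped files, such as Windows, the export cannot overwrite the file that is still mapped):

```python
import vaex as vx

original_file = "data.hdf5"  # hypothetical placeholder path

df = vx.open(original_file)
dftemp = df.copy()
dftemp["new column2"] = dftemp["lockprofile"]  # references an existing, memory-mapped column

# Attempting to export back to the path that is still memory-mapped:
# instead of overwriting the file, this fails (e.g. with an OSError /
# PermissionError) because the original HDF5 file is still locked.
dftemp.export_hdf5(original_file, progress=True)
```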

Also, even when you do not export the data to another file, the lock is placed on the HDF5 file as soon as you use an existing column to add a new column, and it is never released until the application is closed.
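To make the "never released" part concrete, a simple check is to try removing the original file after both DataFrames have been closed. The snippet below is a sketch of that check and not part of the original script; the path name and the `os.remove` call are additions for illustration:

```python
import os
import vaex as vx

original_file = "data.hdf5"  # hypothetical placeholder path

df = vx.open(original_file)
dftemp = df.copy()
dftemp["new column1"] = vx.vconstant("test", dftemp.shape[0])  # this line alone does not lock the file
dftemp["new column2"] = dftemp["lockprofile"]                  # referencing an existing column does

dftemp.close()
df.close()

# Expected: the memmap is released and the file can be removed or replaced.
# Observed (per this report): on platforms that lock mapped files, this
# fails until the whole application / Jupyter kernel is shut down.
os.remove(original_file)
```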

Software information

Additional information