Open Brow71189 opened 4 months ago
Note: HDF5Handler keeps files open, NDataHandler does not.
Somewhat related: is there a reason Swift needs to read-lock all HDF5 files stored in a library? Would it be a possible modification to apply only write-locking upon write?
Maybe, I'll look into it while I'm working on this.
That would be brilliant, thank you!
@TomaSusi Can you file a separate bug (or point to an existing one) with your requirements about what you'd like to be able to do with the h5py files when a Swift project is open? Generally, we consider everything in the project folder to be private to Swift while the project is open in Swift - so also try to justify the requirements with the use case or user story.
Basically, this is about the old and still unaddressed issue #807: using Swift to navigate your data and accessing large datasets without duplicating the data on disk. It's annoying to have to close Swift to load a dataset.
We actually also run into this when a user forgets to close Swift on the User PC and our nightly backup cannot run because of read locks.
A good solution from our point of view while waiting (and waiting...) for Swift to come up with a solution you like would be to enable Single Writer Multiple Reader by default – I don't see a downside (maybe there is one?) or any visible change for users: https://docs.h5py.org/en/stable/swmr.html?highlight=swmr
I plan on fixing this issue (too many files open) and then addressing the other issues separately (starting with any new issues and leading up to #807 and #539). I'm hoping to be able to distinguish between read and write accesses, but that's going to require some architecture changes.
Some notes about SWMR. The HDF5 SWMR User's Guide explicitly says it doesn't work on Windows: "The HDF5 SWMR implementation is currently only supported on Unix-like systems. The implementation is not being tested on Windows systems at this time." I don't see anything newer indicating it does work on Windows, but I didn't do an exhaustive search.
In addition, also see https://github.com/h5py/h5py/issues/2022 which is reporting a crash on Windows.
Ah, my apologies, I did not realize that about Windows, so obviously that’s not a solution. In any case, we consider this problem (now open for two years) to be an even higher priority now that we have the new detector, so we would be very keen to test out any solution you are happy with.
data_item.xdata is problematic because it returns a reference to data that might be unloaded after the call. Either the xdata returned has to manage the reference count or users must be expected to hold a data reference. An obvious spot where the data reference must be held is for DisplayValues, so #1007 is a dependency for now.
Generally, to be able to close the file, we must ensure that anyone holding the data array via data_item.xdata has the file open. There is no current mechanism for this in the code at this point. This may take a few releases to eliminate these use cases if they exist.
Anyone watching this issue should investigate their own code to see if they access data using data_item.xdata and if so, try to switch it to data_item.data_ref (which isn't 100% the same since it only returns the ndarray, but the metadata can be accessed via the data_item using other methods like dimensional_calibrations).
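To illustrate the reference-counting idea discussed above, here is a minimal sketch of a handle that opens its file when the first holder acquires it and closes it when the last holder releases it. This is not Swift's implementation: the class name RefCountedFile and its is_open property are hypothetical, and a plain binary file stands in for an HDF5 file so the example needs only the standard library.

```python
import os
import tempfile
import threading


class RefCountedFile:
    """Hypothetical helper: open on first acquire, close on last release."""

    def __init__(self, path):
        self._path = path
        self._lock = threading.Lock()
        self._count = 0      # number of holders currently inside a 'with' block
        self._file = None    # underlying file object, or None when closed

    @property
    def is_open(self):
        return self._file is not None

    def __enter__(self):
        with self._lock:
            if self._count == 0:
                # First holder: actually open the file.
                self._file = open(self._path, "rb")
            self._count += 1
            return self._file

    def __exit__(self, exc_type, exc, tb):
        with self._lock:
            self._count -= 1
            if self._count == 0:
                # Last holder released: the file handle can be given back.
                self._file.close()
                self._file = None
        return False  # do not swallow exceptions


# Usage: create a small stand-in data file, then hold it from two places.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"pixels")

handle = RefCountedFile(tmp.name)
with handle as f:
    with handle:                     # a second, nested holder
        first = f.read(3)
    still_open = handle.is_open      # outer holder keeps the file open
    rest = f.read()
closed_after = not handle.is_open    # no holders left, so the file is closed
os.remove(tmp.name)
```

The point of the sketch is that the file stays open exactly as long as someone holds a reference, which is the guarantee a bare returned array cannot give on its own.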
I got an error report with the following traceback from a user:
Turns out, Linux has a maximum number of files that can be open in parallel per process, see here: https://www.howtogeek.com/805629/too-many-open-files-linux/
You can increase this limit temporarily or permanently, which helped as a workaround in this case. A better solution would be to improve the project loading process so that Swift does not keep handles to all files in the project.
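As a rough sketch of the temporary workaround, the per-process limit can also be inspected and raised from inside Python using the standard library resource module (Unix only). The function name raise_open_file_limit and the target value 4096 are illustrative assumptions, not anything Swift does. The soft limit can only be raised up to the hard limit without extra privileges, and the change lasts only for the current process and its children, much like running ulimit -n in a shell.

```python
import resource


def raise_open_file_limit(target=None):
    """Raise the soft limit on open files, never above the hard limit.

    Returns (old_soft, new_soft). The limit is never lowered.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

    # Already unlimited: nothing to raise.
    if soft == resource.RLIM_INFINITY:
        return soft, soft

    if hard == resource.RLIM_INFINITY:
        # No finite ceiling; only change the limit if a target was requested.
        new_soft = soft if target is None else target
    else:
        # Default to the hard limit, or cap the requested target at it.
        new_soft = hard if target is None else min(target, hard)

    if new_soft <= soft:
        return soft, soft

    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return soft, new_soft


old_soft, new_soft = raise_open_file_limit(4096)
print(f"open-file soft limit: {old_soft} -> {new_soft}")
```

This only moves the ceiling; a project with more data files than the new limit would still hit the same error, which is why not keeping every file open is the better fix.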