nion-software / nionswift

Nion Swift is open source scientific image processing software integrating hardware control, data acquisition, visualization, processing, and analysis using Python. Nion Swift is easily extended using Python. It runs on Windows, Linux, and macOS.
http://nion.com/swift
GNU General Public License v3.0

Opening projects with numerous .h5 files can fail under Linux #1006

Open Brow71189 opened 4 months ago

Brow71189 commented 4 months ago

I got an error report with the following traceback from a user:

OSError: Unable to create file (unable to open file: name = '/Volumes/ex<username>/PhD/Nion_Data/alpha_CuPc_LARBED_06_02_24/alpha_CuPc_LARBED_06_02_24 Data/2024/02/07/20240206-102635/data_IFY04E80JSMYJ71MSPDMEKRFN.h5', errno = 24, error message = 'Too many open files', flags = 15, o_flags = a02)
File "/Users/<username>/miniconda3/envs/nionswift-dev/bin/Nion Swift.app/Contents/MacOS/../Resources/bootstrap.py", line 137, in start
File "/Users/<username>/code/nionswift-dev/nionswift/nion/swift/Application.py", line 195, in start
File "/Users/<username>/code/nionswift-dev/nionswift/nion/swift/Application.py", line 218, in open_project_window
File "/Users/<username>/code/nionswift-dev/nionswift/nion/swift/model/Profile.py", line 419, in read_project
File "/Users/<username>/code/nionswift-dev/nionswift/nion/swift/model/Profile.py", line 165, in load_project
File "/Users/<username>/code/nionswift-dev/nionswift/nion/swift/model/Project.py", line 238, in prepare_read_project
File "/Users/<username>/code/nionswift-dev/nionswift/nion/swift/model/FileStorageSystem.py", line 510, in read_project_properties
Traceback (most recent call last):
File "/Users/<username>/code/nionswift-dev/nionswift/nion/swift/Application.py", line 195, in start
File "/Users/<username>/code/nionswift-dev/nionswift/nion/swift/Application.py", line 218, in open_project_window
File "/Users/<username>/code/nionswift-dev/nionswift/nion/swift/model/Profile.py", line 419, in read_project
File "/Users/<username>/code/nionswift-dev/nionswift/nion/swift/model/Profile.py", line 165, in load_project
File "/Users/<username>/code/nionswift-dev/nionswift/nion/swift/model/Project.py", line 238, in prepare_read_project
File "/Users/<username>/code/nionswift-dev/nionswift/nion/swift/model/FileStorageSystem.py", line 523, in read_project_properties
File "/Users/<username>/code/nionswift-dev/nionswift/nion/swift/model/FileStorageSystem.py", line 679, in _read_properties
File "/Users/<username>/miniconda3/envs/nionswift-dev/lib/python3.7/pathlib.py", line 1208, in open
File "/Users/<username>/miniconda3/envs/nionswift-dev/lib/python3.7/pathlib.py", line 1063, in _opener
OSError: [Errno 24] Too many open files: '/Volumes/ex<username>/PhD/Nion_Data/alpha_CuPc_LARBED_06_02_24/alpha_CuPc_LARBED_06_02_24.nsproj'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/<username>/miniconda3/envs/nionswift-dev/bin/Nion Swift.app/Contents/MacOS/../Resources/bootstrap.py", line 137, in start
File "/Users/<username>/code/nionswift-dev/nionswift/nion/swift/Application.py", line 197, in start
File "/Users/<username>/code/nionswift-dev/nionui/nion/ui/Application.py", line 169, in show_ok_dialog
File "/Users/<username>/code/nionswift-dev/nionui/nion/ui/Declarative.py", line 943, in run
File "/Users/<username>/code/nionswift-dev/nionui/nion/ui/Declarative.py", line 973, in run_window
File "/Users/<username>/code/nionswift-dev/nionui/nion/ui/Window.py", line 197, in __init__
File "/Users/<username>/miniconda3/envs/nionswift-dev/lib/python3.7/asyncio/events.py", line 762, in new_event_loop
File "/Users/<username>/miniconda3/envs/nionswift-dev/lib/python3.7/asyncio/events.py", line 660, in new_event_loop
File "/Users/<username>/miniconda3/envs/nionswift-dev/lib/python3.7/asyncio/unix_events.py", line 51, in __init__
File "/Users/<username>/miniconda3/envs/nionswift-dev/lib/python3.7/asyncio/selector_events.py", line 57, in __init__
File "/Users/<username>/miniconda3/envs/nionswift-dev/lib/python3.7/selectors.py", line 511, in __init__
OSError: [Errno 24] Too many open files

It turns out that Linux limits the number of files a single process can have open at the same time; see https://www.howtogeek.com/805629/too-many-open-files-linux/

You can increase this limit temporarily or permanently, which helped as a workaround in this case. A better solution would be to improve the project loading process so that Swift does not keep handles to all files in the project.
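
For reference, the temporary workaround can also be applied from inside a Python process. The following is a minimal sketch, assuming a Unix-like system where the standard `resource` module is available; the function name `raise_open_file_limit` and the default target of 4096 are illustrative assumptions, not part of Swift. It only raises the soft limit, and never beyond the hard limit.

```python
import resource


def raise_open_file_limit(target: int = 4096) -> int:
    """Raise the soft limit on open files toward `target` (hypothetical helper).

    Returns the soft limit in effect after the call. The soft limit is never
    lowered and never raised above the hard limit.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        # Already unlimited, nothing to do.
        return soft
    # An unlimited hard limit places no cap on the requested value.
    new_soft = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if new_soft > soft:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
        except (ValueError, OSError):
            # Some systems reject values the hard limit appears to allow;
            # in that case keep the existing limit.
            return soft
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


if __name__ == "__main__":
    print("soft limit before:", resource.getrlimit(resource.RLIMIT_NOFILE)[0])
    print("soft limit after: ", raise_open_file_limit())
```

The change applies only to the current process and its children, so it is equivalent to running `ulimit -n` in the shell before launching the application, not to a permanent system setting.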

### Tasks
- [ ] https://github.com/nion-software/nionswift/issues/1007
- [ ] https://github.com/nion-software/nionswift/issues/1067

cmeyer commented 4 months ago

Note: HDF5Handler keeps files open, NDataHandler does not.
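
One general way to avoid holding a handle for every file is to cap the number of open handles and close the least recently used one when the cap is reached. The sketch below illustrates that idea with plain binary files and the standard library only; `HandleCache`, `max_open`, and `open_count` are hypothetical names, and this is not how HDF5Handler or NDataHandler is implemented.

```python
from collections import OrderedDict
from pathlib import Path
from typing import BinaryIO


class HandleCache:
    """Keep at most `max_open` files open, closing the least recently used."""

    def __init__(self, max_open: int = 64) -> None:
        self._max_open = max_open
        self._handles: "OrderedDict[Path, BinaryIO]" = OrderedDict()

    @property
    def open_count(self) -> int:
        return len(self._handles)

    def get(self, path: Path) -> BinaryIO:
        # Remove the entry (if any) so it can be re-inserted as most recent.
        handle = self._handles.pop(path, None)
        if handle is None:
            handle = open(path, "rb")
        self._handles[path] = handle
        # Evict from the least recently used end until under the cap.
        while len(self._handles) > self._max_open:
            _, oldest = self._handles.popitem(last=False)
            oldest.close()
        return handle

    def close_all(self) -> None:
        while self._handles:
            _, handle = self._handles.popitem()
            handle.close()
```

A caller must not keep using a handle after later `get` calls, because it may have been evicted and closed in the meantime. That is the same ownership question that comes up whenever files are closed behind the scenes.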

TomaSusi commented 4 months ago

Somewhat related: is there a reason Swift needs to read-lock all HDF5 files stored in a library? Would it be a possible modification to apply only write-locking upon write?

cmeyer commented 4 months ago

> Somewhat related: is there a reason Swift needs to read-lock all HDF5 files stored in a library? Would it be a possible modification to apply only write-locking upon write?

Maybe, I'll look into it while I'm working on this.

TomaSusi commented 4 months ago

That would be brilliant, thank you!

cmeyer commented 4 months ago

> Somewhat related: is there a reason Swift needs to read-lock all HDF5 files stored in a library? Would it be a possible modification to apply only write-locking upon write?

@TomaSusi Can you file a separate bug (or point to an existing one) describing what you'd like to be able to do with the HDF5 files while a Swift project is open? Generally, we consider everything in the project folder to be private to Swift while the project is open, so please also justify the requirements with a use case or user story.

TomaSusi commented 4 months ago

Basically this is about the already old and still unaddressed issue of using Swift to navigate your data and accessing large datasets without duplicating data on disk #807. It's annoying to have to close Swift to load a dataset.

We actually also run into this when a user forgets to close Swift on the User PC and our nightly backup cannot run because of read locks.

A good solution from our point of view while waiting (and waiting...) for Swift to come up with a solution you like would be to enable Single Writer Multiple Reader by default – I don't see a downside (maybe there is one?) or any visible change for users: https://docs.h5py.org/en/stable/swmr.html?highlight=swmr

cmeyer commented 4 months ago

I plan on fixing this issue (too many open files) first and then addressing the other issues separately, starting with any new issues and leading up to #807 and #539. I'm hoping to be able to distinguish between read and write accesses, but that's going to require some architecture changes.

Some notes about SWMR: the HDF5 SWMR User's Guide explicitly says it doesn't work on Windows: "The HDF5 SWMR implementation is currently only supported on Unix-like systems. The implementation is not being tested on Windows systems at this time." I don't see anything newer indicating that it does work on Windows, but I didn't do an exhaustive search.

In addition, also see https://github.com/h5py/h5py/issues/2022 which is reporting a crash on Windows.

TomaSusi commented 4 months ago

Ah, my apologies, I did not realize that about Windows, so obviously that’s not a solution. In any case, we consider this problem, now open for two years, to be an even higher priority now that we have the new detector, so we would be very keen to test out any solution you are happy with.

cmeyer commented 4 months ago

data_item.xdata is problematic because it returns a reference to data that might be unloaded after the call. Either the returned xdata has to manage the reference count, or users must be expected to hold a data reference. An obvious spot where the data reference must be held is DisplayValues, so #1007 is a dependency for now.

Generally, to be able to close the file, we must ensure that anyone holding the data array via data_item.xdata has the file open. The code currently has no mechanism for this. It may take a few releases to eliminate these use cases, if they exist.

Anyone watching this issue should investigate their own code to see if they access data using data_item.xdata and if so, try to switch it to data_item.data_ref (which isn't 100% the same since it only returns the ndarray, but the metadata can be accessed via the data_item using other methods like dimensional_calibrations).
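
To illustrate why holding a data reference matters, here is a minimal, self-contained sketch of reference-counted loading. `RefCountedData` and its `loader` argument are hypothetical stand-ins and do not reproduce Swift's DataItem; the only point is that the data stays loaded exactly as long as at least one reference is held, which is what would let the backing file be closed safely in between.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, List, Optional


class RefCountedData:
    """Load data on the first reference and unload it on the last release."""

    def __init__(self, loader: Callable[[], List[float]]) -> None:
        self._loader = loader  # stands in for reading from a file
        self._data: Optional[List[float]] = None
        self._ref_count = 0

    @property
    def is_loaded(self) -> bool:
        return self._data is not None

    @contextmanager
    def data_ref(self) -> Iterator[List[float]]:
        if self._ref_count == 0:
            self._data = self._loader()
        self._ref_count += 1
        try:
            assert self._data is not None
            yield self._data
        finally:
            self._ref_count -= 1
            if self._ref_count == 0:
                # Last holder is gone: the data (and its file) can be released.
                self._data = None


item = RefCountedData(lambda: [1.0, 2.0, 3.0])
with item.data_ref() as data:
    total = sum(data)  # safe: the data is guaranteed to be loaded here
```

Code that copies the array out of the `with` block and uses it later is in the same position as the xdata callers described above: nothing guarantees the underlying storage is still available.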