freyso opened 2 months ago
Yep, this would be the solution I'd like too. Last I tried to prototype it I got bogged down in the details, but in the end I think it would have to be something like this.
Same for me…
So I did a quick survey and I think it can be implemented in h5io_browser's `_open_hdf`, but there are two things with the context manager approach that make this slightly more tricky:
The first is probably the smaller problem, but I'd have to check where we could hook in to circumvent the second problem.
To avoid concurrent efforts: I will make a first draft of this; we can discuss it on Monday.
Sorry, I couldn't sleep yesterday and I have something almost there now. :')
Main changes to h5io are here; I won't be unhappy if you want to take a look and take it from there, but it would then need strategic placement of the context manager around pyiron_base and maybe even the sphinx class. When I use the context manager in sphinx's `to_hdf`/`from_hdf`/`collect_output`/`set_input_read_only`, I can save most of the file opens that you reported.
Ok, I'm gonna try to work on it by Monday, just in order to make it more complicated.
Test example:
I get 269 `_open_hdf` calls for the run, and 802 calls for the load. From discussions with @jan-janssen and @pmrv, I get the info that the key challenge here is
Possible solution (`hdf_leave_open`)

Use scenario:
In this way, high-level code can indicate when it is going to enter and leave hdf5-intense code, and low-level code would be augmented by a single check for the existence of an open file handle before opening the file. Performance-wise, the open call is much more expensive than the dictionary lookup, so there is no measurable price to pay when the cache is not used.
Using a context should make this rather robust against unexpected errors. The new context manager should check at context enter if the cache is already filled, and if so, do nothing and set a "noop" flag for the context exit. Then, one could even deal with nested contexts for the same file if programmers do not realize that other parts of the code also have caching instructions. If pyiron needs to be thread-safe, a locking mechanism for the cache access is needed.