pyiron / pyiron_atomistics

pyiron_atomistics - an integrated development environment (IDE) for atomistic simulation in computational materials science.
https://pyiron-atomistics.readthedocs.io
BSD 3-Clause "New" or "Revised" License

hdf5 files are opened too often in DataContainer when loading SPHInX jobs #1574


freyso commented 2 months ago

Test example:

import pyiron
import pyiron_base.storage.hdfio

# --- replace pyiron_base.storage.hdfio._open_hdf by a wrapper that counts calls
def _open_hdf(filename, mode="r", swmr=False):
    import h5io_browser.base
    global count_open_calls
    count_open_calls += 1
    return h5io_browser.base._open_hdf(filename, mode, swmr)

setattr(pyiron_base.storage.hdfio, "_open_hdf", _open_hdf)
count_open_calls = 0

pr = pyiron.Project("test")
bulk_str = pr.create.structure.bulk("TiC", crystalstructure="rocksalt", a=4.33)
dft = pr.create.job.Sphinx("bulk-dummy")
dft.structure = bulk_str
dft.input["Xcorr"] = "PBE"
dft.set_encut(450)
dft.set_kpoints(3 * [4])
print(count_open_calls)
dft.calc_static()
print(count_open_calls)

dft.run(delete_existing_job=True)
print(count_open_calls)

pr.load("bulk-dummy")
print(count_open_calls)

I get 269 _open_hdf calls for the run, and 802 calls for the load.

From discussions with @jan-janssen and @pmrv, I get the info that the key challenge here is that the low-level code that opens the files cannot know by itself when a file handle is no longer needed, i.e. when it is safe to close the file again.

Possible solution

Usage scenario:

from pyiron_base.storage.hdfio import hdf_leave_open

with hdf_leave_open(h5filename):
    # the file is kept open in the cache within this block
    data_container.recursive_load(h5filename)
# h5file is closed here

In this way, high-level code can indicate when it is about to enter and leave hdf5-intense code, and the low-level code only needs a single additional check for an existing open file handle before opening the file. Performance-wise, an open call is much more expensive than a dictionary lookup, so there is no measurable price to pay when the cache is not used.

Using a context manager should make this rather robust against unexpected errors. The new context manager should check at context enter whether the cache is already filled, and if so, do nothing except set a "noop" flag for the context exit. That way one could even deal with nested contexts for the same file, in case programmers do not realize that other parts of the code also issue caching instructions. If pyiron needs to be thread-safe, a locking mechanism for the cache access is needed.
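
A minimal sketch of what this could look like, assuming h5py underneath: hdf_leave_open and the cache-aware _open_hdf follow the proposal above, while the module-level _file_cache dictionary and the lock are illustrative details, not the final design.

import threading
from contextlib import contextmanager

import h5py

_file_cache = {}  # filename -> open h5py.File, shared within a context
_cache_lock = threading.Lock()  # only needed if pyiron must be thread-safe

@contextmanager
def hdf_leave_open(filename, mode="r"):
    # keep the file open in the cache for the duration of the block
    with _cache_lock:
        noop = filename in _file_cache  # nested context for the same file
        if not noop:
            _file_cache[filename] = h5py.File(filename, mode)
    try:
        yield _file_cache[filename]
    finally:
        if not noop:  # only the outermost context closes the file
            with _cache_lock:
                _file_cache.pop(filename).close()

def _open_hdf(filename, mode="r", swmr=False):
    # low-level open: reuse a cached handle if one exists
    if filename in _file_cache:  # a dict lookup, far cheaper than an open call
        return _file_cache[filename]
    return h5py.File(filename, mode, swmr=swmr)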

pmrv commented 2 months ago

Yep, this would be the solution I'd like too. Last time I tried to prototype it, I got bogged down in the details, but in the end I think it would have to be something like this.

samwaseda commented 2 months ago

Last time I tried to prototype it, I got bogged down in the details, but in the end I think it would have to be something like this.

Same for me…

pmrv commented 2 months ago

So I did a quick survey and I think it can be implemented in h5io_browser's _open_hdf, but there are two things with the context manager approach that make this slightly more tricky:

  1. What happens if the same file gets opened with different modes in nested calls to the context manager?
  2. The returned file handle is used outside of h5io_browser as a context manager, which closes the underlying file on exit. So if there are multiple such usages (as there are bound to be), the first one would close the file, making it unusable for the second.

The first is probably the smaller problem, but I'd have to check where we could hook in to circumvent the second one.
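
One conceivable way around the second point (purely a sketch, not necessarily where the actual hook would go): let the cache hand out a thin proxy whose close and __exit__ are no-ops, so that callers using the handle as a context manager cannot close the shared file. The _CachedFile name is made up for illustration.

import h5py

class _CachedFile:
    # proxy around a shared h5py.File that ignores close requests
    def __init__(self, handle: h5py.File):
        self._handle = handle

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        return False  # do NOT close the underlying file; the cache owns it

    def close(self):
        pass  # closing is deferred to the surrounding hdf_leave_open context

    def __getitem__(self, key):
        return self._handle[key]  # keep hdf["group"] access working

    def __getattr__(self, name):
        return getattr(self._handle, name)  # delegate everything else

The first point would still need an explicit check, e.g. that a cached read-only handle is not silently handed out when a write mode is requested.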

freyso commented 2 months ago

To avoid concurrent efforts: I will make a first draft of this; we can discuss it on Monday.

pmrv commented 2 months ago

To avoid concurrent efforts: I will make a first draft of this; we can discuss it on Monday.

Sorry, I couldn't sleep yesterday and I have something almost there now. :')

pmrv commented 2 months ago

The main changes to h5io are here; I won't be unhappy if you want to take a look and take it from there, but it would then need strategic placement of the context manager around pyiron_base and maybe even the Sphinx class. When I use the context manager in Sphinx's to_hdf/from_hdf/collect_output/set_input_read_only, I can save most of the file opens that you reported.
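
For illustration only, the "strategic placement" could conceptually look like the self-contained sketch below; the class and attribute names are simplified stand-ins for the real pyiron classes, and only the wrapping pattern is the point.

from contextlib import contextmanager

@contextmanager
def hdf_leave_open(filename):
    # stand-in for the proposed cache-backed context manager (see above)
    print(f"open {filename} once")
    yield
    print(f"close {filename}")

class GenericJob:
    def from_hdf(self):
        print("recursive load of the DataContainer, reusing the open handle")

class Sphinx(GenericJob):
    # hypothetical simplification of the real Sphinx job class
    hdf5_file = "bulk-dummy.h5"  # assumed attribute; the real class differs

    def from_hdf(self):
        # wrap the whole recursive load in a single open/close cycle
        with hdf_leave_open(self.hdf5_file):
            super().from_hdf()

Sphinx().from_hdf()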

samwaseda commented 2 months ago

Ok I’m gonna try to work on it by Monday just in order to make it more complicated