sbird / fake_spectra

A code for generating fake spectra from a cosmological simulation
MIT License

h5py file not properly closed #50

Status: closed by andreufont 3 years ago

andreufont commented 3 years ago

The code runs fine on my laptop (MacOS), but when I run it on the cluster in Barcelona I get an error when it tries to close the HDF5 file. This is what I get when I run a test script (see below) that reloads a few spectra from the snapshot:

(common) [afontrib@ui04 ~]$ python test_hdf5.py 
16384  sightlines. resolution:  10.000239687604589  z= 2.8000000922482458
mean flux= 0.6089995374278916
Exception ignored in: <function AbstractSnapshot.__del__ at 0x7fbe5950baf0>
Traceback (most recent call last):
  File "/data/desi/software/env/common/lib/python3.8/site-packages/fake_spectra/abstractsnapshot.py", line 44, in __del__
  File "/data/desi/software/env/common/lib/python3.8/site-packages/h5py/_hl/files.py", line 446, in close
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 275, in h5py.h5f.get_obj_ids
  File "h5py/h5i.pyx", line 40, in h5py.h5i.wrap_identifier
ModuleNotFoundError: import of h5py halted; None in sys.modules

Here is the script:

from fake_spectra import griddedspectra as gs

sim_dir='/data/desi/common/HydroData/Sherwood//planck1_80_1024/'

snap_num=9
n_skewers=128
pixel_width_kms=10

savedir='test_skewers/'
savefile='gridded_skewers.hdf5'

spec=gs.GriddedSpectra(snap_num,sim_dir,nspec=n_skewers,res=pixel_width_kms,
             savedir=savedir,savefile=savefile,reload_file=True)

print('mean flux=',spec.get_mean_flux())

Note that it also crashes similarly when loading from savefile.

It is a bit annoying because the HDF5 file stays open, and one cannot make a deepcopy of the spectra class, since open HDF5 file objects cannot be copied.

I had h5py version 2.10, but the issue remains even after updating to 3.1.

Has anyone found something like this? It might be something fishy with my local cluster... In which case I'll contact the help desk.

qezlou commented 3 years ago

Hi Andreu, I have never encountered something like that before. As you said, I suspect something might be wrong with the local cluster.

sbird commented 3 years ago

I've never encountered that exact error, but similarly mysterious errors often result when the cluster version of python or h5py was updated or rebuilt with a new compiler without updating the underlying hdf5, or vice versa, when hdf5 was recompiled with a new compiler and h5py was not updated. Definitely something to bring up with your cluster admin.
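For anyone who wants to check for that kind of mismatch before contacting an admin, here is a minimal sketch that reports which h5py and HDF5 versions the current Python environment actually sees. The function name `describe_hdf5_stack` is made up for this example. It only relies on `h5py.version.version` and `h5py.version.hdf5_version`, and it returns `None` entries if h5py is not installed. It cannot prove that the two libraries were built consistently; it only makes the versions visible so they can be compared across machines.

```python
import importlib.util
import sys


def describe_hdf5_stack():
    """Return the Python, h5py and HDF5 versions seen by this interpreter.

    Hypothetical helper for illustration; h5py/hdf5 are None when h5py
    is not installed in the current environment.
    """
    report = {"python": sys.version.split()[0], "h5py": None, "hdf5": None}
    if importlib.util.find_spec("h5py") is None:
        return report
    import h5py  # imported lazily so the helper also works without h5py

    report["h5py"] = h5py.version.version        # h5py package version
    report["hdf5"] = h5py.version.hdf5_version   # HDF5 library h5py was built against
    return report


info = describe_hdf5_stack()
for name, value in info.items():
    print(f"{name}: {value}")
```

Running this on both the laptop and the cluster and comparing the output is a quick way to see whether the two environments differ.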

andreufont commented 3 years ago

Hi @sbird, I will try my luck with the help desk.

For the record, I was able to isolate the problem even further. It only happens when I use the Sherwood simulations, since the other snapshots I have are BigFile. It happens whenever I load the snapshot from Spectra, even if it is only to access the header, as in:

from fake_spectra import griddedspectra as gs
sim_dir='/data/desi/common/HydroData/Sherwood/planck1_80_1024/'
snap_num=9
pixel_res=10.0
spec=gs.GriddedSpectra(snap_num,sim_dir,res=pixel_res,reload_file=True)

It looks like the code reads the snapshot data properly (redshift, box size...), but crashes when deleting the snapshot class. The exception is ignored in AbstractSnapshot.__del__(), which again suggests that the problem is related to closing the HDF5 file.

andreufont commented 3 years ago

Even more strange, if I directly construct a snapshot object outside of Spectra, it works just fine:

from fake_spectra import abstractsnapshot as absn

sim_dir='/data/desi/common/HydroData/Sherwood/planck1_80_1024/'
snap_num=9

#snap = absn.AbstractSnapshotFactory(snap_num, sim_dir)
snap = absn.HDF5Snapshot(snap_num, sim_dir)

# read number of particles from snapshot
npart = snap.get_npart()
print('npart',npart)
# read box size from header
box = snap.get_header_attr("BoxSize")
print('box',box)

andreufont commented 3 years ago

Before contacting the help desk, I copied one of the snapshots from the Sherwood simulations to my laptop, and it also crashes there. I tried to run the code on Hypatia, and it also crashes there. I didn't notice this before because I only use HDF5 with the Sherwood sims.

I can live with this issue, since the skewers are actually written to disk and the error only appears when exiting, or when doing a deepcopy of a spectra class, for which I can write a workaround.
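For reference, one possible shape for such a deepcopy workaround is sketched below. It is not taken from fake_spectra: `SpectraLike`, its `tau` attribute and its `_handle` attribute are hypothetical, and an ordinary file opened with `open()` stands in for the h5py file so the example runs without h5py. The idea is to define `__deepcopy__` so that every attribute is copied except the open handle, which is left as `None` on the copy.

```python
import copy
import os
import tempfile


class SpectraLike:
    """Hypothetical stand-in for an object that owns an open file handle."""

    def __init__(self, path, tau):
        self.tau = tau                    # plain data, safe to copy
        self._handle = open(path, "rb")   # open handle, cannot be deep-copied

    def __deepcopy__(self, memo):
        # Build an empty instance and copy everything except the handle.
        cls = self.__class__
        new = cls.__new__(cls)
        memo[id(self)] = new
        for name, value in self.__dict__.items():
            if name == "_handle":
                new._handle = None        # the copy does not share the file
            else:
                setattr(new, name, copy.deepcopy(value, memo))
        return new

    def close(self):
        if self._handle is not None:
            self._handle.close()
            self._handle = None


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "snapshot.bin")
    with open(path, "wb") as f:
        f.write(b"\0")

    original = SpectraLike(path, tau=[[0.1, 0.2], [0.3, 0.4]])
    duplicate = copy.deepcopy(original)   # works: the handle is skipped
    duplicate.tau[0][0] = 9.9             # does not touch the original data
    original.close()
```

The copy has no file attached, so any method that needs the file would have to reopen it or fail clearly. Whether that is acceptable depends on what the copy is used for.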

sbird commented 3 years ago

All that function is doing is to close the file! I don't know what we can do to fix it. I think it is possibly becoming unhappy when it is closed while some part of the IO is still live (this might be an h5py bug). You might just not be able to make a deepcopy of the h5py-containing object. If it worries you, you could also just delete the close() call: h5py will try to close the file when it goes out of scope using the normal python garbage collector anyway.
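An alternative to deleting the close() call is to keep it but make the cleanup defensive. The sketch below uses hypothetical names (`SnapshotLike`, `FailingHandle`) and is not the fake_spectra implementation. `FailingHandle` only simulates a handle whose `close()` raises the error from the traceback. The pattern is an explicit, idempotent `close()`, context-manager support so the file can be closed while the interpreter is still fully alive, and a `__del__` that treats errors during last-resort cleanup as non-fatal.

```python
import io


class FailingHandle:
    """Simulates a handle whose close() fails, as in the reported traceback."""

    def __init__(self):
        self.close_calls = 0

    def close(self):
        self.close_calls += 1
        raise ModuleNotFoundError("import of h5py halted; None in sys.modules")


class SnapshotLike:
    """Hypothetical owner of a file handle (not the fake_spectra class)."""

    def __init__(self, handle):
        self._handle = handle

    def close(self):
        """Close the handle once; later calls do nothing."""
        handle, self._handle = self._handle, None
        if handle is not None:
            handle.close()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not suppress exceptions raised inside the block

    def __del__(self):
        # Last-resort cleanup: at interpreter exit parts of the runtime may
        # already be torn down, so failures here are deliberately ignored.
        try:
            self.close()
        except Exception:
            pass


# Normal use: the handle is closed explicitly when the block ends.
buf = io.BytesIO(b"data")
with SnapshotLike(buf):
    pass
closed_after_with = buf.closed

# Simulated failure: __del__ is called directly so the demo does not
# depend on when the garbage collector runs.
bad = FailingHandle()
snap = SnapshotLike(bad)
snap.__del__()
```

Swallowing exceptions in `__del__` hides real problems as well as harmless ones, so it is a judgement call. The explicit `close()` or `with` block is the part that avoids relying on cleanup at exit.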

andreufont commented 3 years ago

It doesn't bother me anymore, since I wrote a work-around to avoid using deepcopy.

Happy to leave it as it is, but I thought I'd add more documentation in case someone else finds the same issue in the future.

I'm assuming most of the time you are using BigFile nowadays, right?

qezlou commented 3 years ago

Hello,

I have used fake_spectra for Illustris/TNG snapshots, which are in hdf5, but never tried to deepcopy the spectra class. I have just generated the spectra and saved them to file. In those cases, it was working properly.

On Thu, Nov 12, 2020 at 8:22 AM Andreu Font-Ribera notifications@github.com wrote:

> It doesn't bother me anymore, since I wrote a work-around to avoid using deepcopy.
>
> Happy to leave it as it is, but I thought I'd add more documentation in case someone else finds the same issue in the future.
>
> I'm assuming most of the time you are using BigFile nowadays, right?

sbird commented 3 years ago

It is very likely also h5py version dependent. If you want to write a doc patch I would be happy to take it.