slaclab / lc2-hdf5-110

Investigate hdf5 1.10 features like SWMR and virtual dataset for LCLS II
Apache License 2.0

error opening VDS source file #11

Closed davidslac closed 7 years ago

davidslac commented 7 years ago

If I run this bash script:

#!/usr/bin/bash

rm /reg/d/ana01/temp/davidsch/lc2/runA/hdf5/*
bin/daq_writer config.yaml 0 &
bin/daq_writer config.yaml 1 &
bin/daq_writer config.yaml 2 &
sleep 1
bin/daq_master config.yaml 0 &
sleep 1 
bin/ana_reader_master config.yaml 0 &
bin/ana_reader_master config.yaml 1 &

which cleans out old files, starts three writers, waits a second, starts the master that creates the VDS, waits another second, and then starts two readers, I end up with:

HDF5-DIAG: Error detected in HDF5 (1.10.0-patch1) thread 0:
  #000: H5F.c line 579 in H5Fopen(): unable to open file
    major: File accessibilty
    minor: Unable to open file
  #001: H5Fint.c line 1168 in H5F_open(): unable to lock the file or initialize file structure
    major: File accessibilty
    minor: Unable to open file
  #002: H5FD.c line 1821 in H5FD_lock(): driver lock request failed
    major: Virtual File Layer
    minor: Can't update object
  #003: H5FDsec2.c line 939 in H5FD_sec2_lock(): unable to flock file, errno = 11, error message = 'Resource temporarily unavailable'
    major: File accessibilty
    minor: Bad file ID accessed
HDF5-DIAG: Error detected in HDF5 (1.10.0-patch1) thread 0:
  #000: H5F.c line 579 in H5Fopen(): unable to open file
    major: File accessibilty
    minor: Unable to open file
  #001: H5Fint.c line 1168 in H5F_open(): unable to lock the file or initialize file structure
    major: File accessibilty
    minor: Unable to open file
  #002: H5FD.c line 1821 in H5FD_lock(): driver lock request failed
    major: Virtual File Layer
    minor: Can't update object
  #003: H5FDsec2.c line 939 in H5FD_sec2_lock(): unable to flock file, errno = 11, error message = 'Resource temporarily unavailable'
    major: File accessibilty
    minor: Bad file ID accessed
terminate called after throwing an instance of 'std::runtime_error'
  what():  ERROR: -1 = H5Fopen(m_master_fname.c_str(), H5F_ACC_RDONLY | H5F_ACC_SWMR_READ, H5P_DEFAULT) line=129  file=app/ana_reader_master.cpp

ana_reader_master: found /reg/d/ana01/temp/davidsch/lc2/runA/hdf5/daq_master-s0000.h5
terminate called after throwing an instance of 'std::runtime_error'
  what():  ERROR: -1 = H5Fopen(m_master_fname.c_str(), H5F_ACC_RDONLY | H5F_ACC_SWMR_READ, H5P_DEFAULT) line=129  file=app/ana_reader_master.cpp

The ana_reader_master app looks at a VDS and tries to open every source file to find the chunk size of each source, so that it can choose a large per-dataset chunk. Maybe that is where the error is coming from?
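For reference, the errno = 11 in frame #003 is EAGAIN/EWOULDBLOCK on Linux, which flock only returns for a non-blocking request that collides with a lock held through another open of the same file. Below is a minimal sketch of that collision using plain POSIX calls and no HDF5. The function name `second_lock_errno` and the lock-file path are made up for illustration, and the sketch does not show which process holds the conflicting lock in the real run.

```cpp
#include <cerrno>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

// Hypothetical demo, not code from this repo.
// Opens `path` twice. The first descriptor takes an exclusive lock, the second
// asks for a non-blocking shared lock, which is expected to be refused.
// Returns the errno of the refused request, 0 if the second lock was granted,
// or -1 if the setup itself failed.
int second_lock_errno(const char *path) {
    int holder = open(path, O_RDWR | O_CREAT, 0600);
    if (holder < 0) return -1;
    if (flock(holder, LOCK_EX | LOCK_NB) != 0) {
        close(holder);
        return -1;
    }

    // A second open() creates an independent open file description, so its
    // lock request competes with the one above even inside a single process.
    int reader = open(path, O_RDONLY);
    if (reader < 0) {
        close(holder);
        return -1;
    }

    int result = 0;
    if (flock(reader, LOCK_SH | LOCK_NB) != 0) {
        result = errno;  // save before close() can overwrite it
    }

    close(reader);
    close(holder);  // closing the descriptor releases the exclusive lock
    return result;
}
```

Calling `second_lock_errno` on a scratch file should give back EWOULDBLOCK, the same condition that `H5FD_sec2_lock` reports above. That would be consistent with some other open of the file still holding a lock at the moment the reader calls H5Fopen, but it does not say whether the holder is a writer, the master, or the reader itself.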

davidslac commented 7 years ago

I upgraded to HDF5 1.10.1-pre1 and thought this problem had gone away, but I still get it. For example, I just got:

 cat data/logs/ana_reader_master_0.log 
1492038470809 ana_reader_master-s0000: found /reg/d/ana01/temp/davidsch/lc2/runA/hdf5/daq_master-s0000.h5
HDF5-DIAG: Error detected in HDF5 (1.10.1-pre1) thread 0:
  #000: H5F.c line 588 in H5Fopen(): unable to open file
    major: File accessibilty
    minor: Unable to open file
  #001: H5Fint.c line 1307 in H5F_open(): unable to lock the file
    major: File accessibilty
    minor: Unable to open file
  #002: H5FD.c line 1841 in H5FD_lock(): driver lock request failed
    major: Virtual File Layer
    minor: Can't update object
  #003: H5FDsec2.c line 942 in H5FD_sec2_lock(): unable to lock file, errno = 11, error message = 'Resource temporarily unavailable'
    major: File accessibilty
    minor: Bad file ID accessed
terminate called after throwing an instance of 'std::runtime_error'
  what():  ERROR: -1 = H5Fopen(m_master_fname.c_str(), H5F_ACC_RDONLY | H5F_ACC_SWMR_READ, H5P_DEFAULT) line=130  file=app/ana_reader_master.cpp

So, this is the setup

I need to turn off parts of what the reader does to see exactly what triggers this. Maybe I should test against 1.10.0-patch1 again, where the problem happened more reliably.
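One cheap experiment along those lines might be to retry the failing open a few times, to see whether the lock conflict is a short startup race or persists for the whole run. This is only a sketch: `open_with_retry` is a hypothetical helper, not something in ana_reader_master, and it takes a generic callable so it can be tried without linking HDF5. It assumes only that the open call returns a negative value on failure, as the `-1 = H5Fopen(...)` in the log shows.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper, not part of ana_reader_master.
// try_open: any callable that returns a negative value on failure.
// Returns the first non-negative result, or the last negative result once
// max_attempts calls have been made. If attempts_used is non-null it receives
// the number of calls actually made.
long long open_with_retry(const std::function<long long()> &try_open,
                          int max_attempts,
                          std::chrono::milliseconds delay,
                          int *attempts_used = nullptr) {
    long long result = -1;
    int attempts = 0;
    while (attempts < max_attempts) {
        result = try_open();
        ++attempts;
        if (result >= 0) break;  // opened successfully
        // Sleep only if another attempt will follow.
        if (attempts < max_attempts) std::this_thread::sleep_for(delay);
    }
    if (attempts_used) *attempts_used = attempts;
    return result;
}
```

In the reader, the callable could be a lambda wrapping the existing `H5Fopen(m_master_fname.c_str(), H5F_ACC_RDONLY | H5F_ACC_SWMR_READ, H5P_DEFAULT)` call. If the open succeeds on a later attempt, that points towards a timing race during startup. If every attempt fails, the conflicting lock is being held for longer and the cause is probably elsewhere.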