lofar-astron / prefactor

Pre facet calibration pipeline
http://www.astron.nl/citt/prefactor
GNU General Public License v3.0

HDF5 error, errno=11 #216

Closed (amisk closed this issue 5 years ago)

amisk commented 5 years ago

Hi,

I am running prefactor from a fresh install, and at one point in the calibrator pipeline I get:

2019-03-01 15:37:14 WARNING node.slurm17.executable_args.L643651_SB001_uv.ndppp_prep_cal: /opt/soft/lofar-stuff//bin/NDPPP stderr: HDF5-DIAG: Error detected in HDF5 (1.10.0-patch1) thread 140378072796352:
  #000: ../../../src/H5F.c line 579 in H5Fopen(): unable to open file
    major: File accessibilty
    minor: Unable to open file
  #001: ../../../src/H5Fint.c line 1168 in H5F_open(): unable to lock the file or initialize file structure
    major: File accessibilty
    minor: Unable to open file
  #002: ../../../src/H5FD.c line 1821 in H5FD_lock(): driver lock request failed
    major: Virtual File Layer
    minor: Can't update object
  #003: ../../../src/H5FDsec2.c line 939 in H5FD_sec2_lock(): unable to flock file, errno = 11, error message = 'Resource temporarily unavailable'
    major: File accessibilty
    minor: Bad file ID accessed

**** uncaught exception ****

Backtrace follows:
#0  0x7fac5044b82b in LOFAR::Exception::terminate() at Exception.cc:89
#1  0x7fac50136a06 in std::rethrow_exception(std::__exception_ptr::exception_ptr) at ??:0
#2  0x7fac50136a41 in std::terminate() at ??:0
#3  0x7fac50136c74 in __cxa_throw at ??:0
#4  0x7fac4bef93b4 in H5::H5File::H5File(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned int, H5::FileCreatPropList const&, H5::FileAccPropList const&) at ??:0
#5  0x7fac50e4fa0e in LOFAR::H5Parm::H5Parm(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) at H5Parm.cc:21
#6  0x7fac50e3f847 in LOFAR::DPPP::OneApplyCal::OneApplyCal(LOFAR::DPPP::DPInput*, LOFAR::ParameterSet const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) at OneApplyCal.cc:99
#7  0x7fac50d57cfe in LOFAR::DPPP::ApplyCal::ApplyCal(LOFAR::DPPP::DPInput*, LOFAR::ParameterSet const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) at ApplyCal.cc:75
#8  0x7fac50d285e7 in LOFAR::DPPP::DPRun::makeSteps(LOFAR::ParameterSet const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, LOFAR::DPPP::DPInput*) at DPRun.cc:330
#9  0x7fac50d29012 in LOFAR::DPPP::DPRun::execute(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, char**) at basic_string.h:647
#10 0x55b1bd5760d7 in main at NDPPP.cc:71
terminate called after throwing an instance of 'H5::FileIException'

Any suggestions?

And why are the files called H5FD...?

tikk3r commented 5 years ago

With version 1.10, HDF5 became a lot stricter about file locks: strictly only one read/write lock is allowed (multiple read-only locks are fine). Are you using a recent version of DPPP? It was updated in a commit so that applycal steps open the H5parm files read-only. Otherwise, you may need to compile HDF5 to allow parallel access (I compile mine with --enable-fortran --enable-cxx --enable-threadsafe --enable-unsupported and that seems to work, but I still use 1.8).
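The errno = 11 in the log is EAGAIN ("Resource temporarily unavailable"), the error a non-blocking flock() returns when a conflicting lock is already held. As far as I can tell, HDF5 1.10 takes a shared lock for read-only opens and an exclusive lock for read/write opens, which is the rule described above. The Python 3 sketch below (Linux, standard library only) reproduces that rule on a scratch file; it does not call HDF5 itself, and the helper name is hypothetical.

```python
import errno
import fcntl
import os
import tempfile


def flock_demo():
    """Return (shared_locks_ok, errno_of_exclusive_attempt) for a scratch file.

    Several readers may hold shared locks at once, but a writer asking for
    an exclusive, non-blocking lock is refused while they do.
    """
    fd, path = tempfile.mkstemp(suffix=".h5")  # hypothetical stand-in for an H5parm
    os.close(fd)
    try:
        # Two independent opens behave like two processes opening read-only.
        with open(path, "rb") as reader1, open(path, "rb") as reader2:
            fcntl.flock(reader1, fcntl.LOCK_SH | fcntl.LOCK_NB)
            fcntl.flock(reader2, fcntl.LOCK_SH | fcntl.LOCK_NB)
            shared_ok = True  # no exception: shared locks coexist

            # A third open that wants write access needs an exclusive lock.
            with open(path, "r+b") as writer:
                try:
                    fcntl.flock(writer, fcntl.LOCK_EX | fcntl.LOCK_NB)
                    exclusive_errno = 0
                except OSError as exc:
                    exclusive_errno = exc.errno  # EAGAIN, i.e. 11 on Linux
        return shared_ok, exclusive_errno
    finally:
        os.remove(path)


if __name__ == "__main__":
    ok, err = flock_demo()
    print(ok, err, os.strerror(err))
```

On Linux it should print `True 11 Resource temporarily unavailable`, the same errno and message as in the HDF5 log.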

amisk commented 5 years ago

I installed HDF5 directly through Ubuntu's package manager. I could probably just revert to an earlier version. Would that be enough, or would I need to recompile casacore and the LOFAR trunk, which contains NDPPP?

OK, the Ubuntu repository only has 1.10...

tikk3r commented 5 years ago

Hmm, yeah, I install HDF5 manually (old system). I do know that NDPPP has recently (after LOFAR 3.2.4, I think?) been split off into a standalone version and is no longer (supported) in the LOFAR trunk (see https://github.com/lofar-astron/DP3). I have run prefactor successfully with it, so it might be worth a try if you don't mind compiling it yourself.

amisk commented 5 years ago

I installed HDF5 1.8.18, but when I try to compile the trunk, it keeps telling me that it cannot find HDF5: "Could NOT find HDF5 (missing: HDF5_INCLUDE_DIRS)"

I tried supplying it with -HDF5_ROOT_DIR=/usr/local/HDF_Group/HDF5/1.8.18 and some variations of it.

OT: On the other hand, I would like to use the new DP3, but it does not include genericpipeline.py, which I need for prefactor. Somebody wrote somewhere (sorry, bad memory) that DP3 and the beam library would be enough for prefactor, but where do I get genericpipeline.py from if I go that route? (Also, the requirements in this repository say that the full LOFAR software is necessary?)

tikk3r commented 5 years ago

I set

export CMAKE_PREFIX_PATH=/path/to/hdf5
export LD_LIBRARY_PATH=/path/to/hdf5/lib/:$LD_LIBRARY_PATH

before compiling the LOFAR software.
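Scripted from Python, the same two exports become an environment for the CMake child process. This is a minimal sketch; the helper names and the prefix are hypothetical placeholders. Unlike the shell version, it keeps any existing CMAKE_PREFIX_PATH entries, and it avoids the trailing colon that `/path/to/hdf5/lib/:$LD_LIBRARY_PATH` produces when LD_LIBRARY_PATH starts out empty (the dynamic loader treats an empty entry as the current directory).

```python
import os


def prepend_path(env, var, entry):
    """Prepend `entry` to a colon-separated variable, dropping empty parts and duplicates."""
    parts = [p for p in env.get(var, "").split(os.pathsep) if p and p != entry]
    env[var] = os.pathsep.join([entry] + parts)
    return env


def hdf5_build_env(hdf5_prefix, base_env=None):
    """Return a copy of the environment with HDF5 visible to CMake and the loader."""
    env = dict(os.environ if base_env is None else base_env)
    prepend_path(env, "CMAKE_PREFIX_PATH", hdf5_prefix)
    prepend_path(env, "LD_LIBRARY_PATH", os.path.join(hdf5_prefix, "lib"))
    return env


if __name__ == "__main__":
    # Hypothetical prefix; start from an empty environment to show the result.
    env = hdf5_build_env("/path/to/hdf5", base_env={})
    print(env["CMAKE_PREFIX_PATH"], env["LD_LIBRARY_PATH"])
```

The returned dict would be passed as `env=` to `subprocess.run` for the cmake and make calls, so the settings only apply to that build.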

but where do I get genericpipeline.py from if I go that route? (Also, in this git it says in the requirements that the full lofar software would be necessary?)

Same problem here. A standalone genericpipeline would be super awesome. In my experience you don't need the full LOFAR software, though: I compile a bare minimum with -DBUILD_PACKAGES="MS pystationresponse ParmDB pyparmdb Pipeline" to get the genericpipeline and a few other things, and that's all I install. I get DP3 and the beam library from their respective GitHub repositories, and it all seems to work fine for me. The only extra step is ln -s /path/to/DPPP/bin/DPPP /path/to/lofar/install/bin/NDPPP, because the pipeline still looks for NDPPP in the LOFAR install directory.
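The `ln -s` step can also be scripted. A minimal Python sketch, assuming the two placeholder paths from the comment above; the helper names are hypothetical. It creates `bin/NDPPP` in the LOFAR install pointing at DP3's `DPPP`, and leaves an existing NDPPP alone rather than overwriting it.

```python
import os
import tempfile


def link_ndppp(dppp_binary, lofar_bin_dir):
    """Create <lofar_bin_dir>/NDPPP as a symlink to DP3's DPPP (like `ln -s`).

    An existing NDPPP (file or link) is left untouched. Returns the link path.
    """
    if not os.path.isfile(dppp_binary):
        raise FileNotFoundError(dppp_binary)
    os.makedirs(lofar_bin_dir, exist_ok=True)
    link = os.path.join(lofar_bin_dir, "NDPPP")
    if not os.path.lexists(link):
        os.symlink(os.path.abspath(dppp_binary), link)
    return link


def demo():
    """Run link_ndppp in a scratch tree; True if NDPPP resolves to DPPP."""
    with tempfile.TemporaryDirectory() as root:
        dppp = os.path.join(root, "DP3", "bin", "DPPP")
        os.makedirs(os.path.dirname(dppp))
        open(dppp, "w").close()
        bin_dir = os.path.join(root, "lofar", "bin")
        link = link_ndppp(dppp, bin_dir)
        link_ndppp(dppp, bin_dir)  # second call is a no-op
        return os.path.islink(link) and os.path.realpath(link) == os.path.realpath(dppp)


if __name__ == "__main__":
    # Real use would be, with the placeholder paths:
    # link_ndppp("/path/to/DPPP/bin/DPPP", "/path/to/lofar/install/bin")
    print(demo())
```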

amisk commented 5 years ago

This does not work... CMake finds HDF5, but I get:

CMake Error: The following variables are used in this project, but they are set to NOTFOUND.
Please set them or make sure they are set and tested correctly in the CMake files:
HDF5_hdf5_LIBRARY_DEBUG (ADVANCED)
    linked by target "common" in directory /opt/soft/lofar-stuff/BuildDir/lofarsoft/LOFAR/LCS/Common/src
    linked by target "logperf" in directory /opt/soft/lofar-stuff/BuildDir/lofarsoft/LOFAR/LCS/Common/src
    linked by target "versioncommon" in directory /opt/soft/lofar-stuff/BuildDir/lofarsoft/LOFAR/LCS/Common/src

with many more "linked by target" entries...

tikk3r commented 5 years ago

Hmm, maybe some HDF5 development files are missing, like libhdf5-dev or something?
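One way to test that guess is to look inside the HDF5 prefix for the headers and libraries a CMake HDF5 lookup needs. The Python sketch below only checks for files; the names it looks for (`hdf5.h`, `H5Cpp.h`, and the `libhdf5`/`libhdf5_cpp` shared libraries) are assumptions based on a standard `configure --enable-cxx && make install` layout, and a distribution package may split or place them differently.

```python
import glob
import os
import sys

# Assumed minimal file set for an HDF5 install built with C++ support.
EXPECTED = {
    "include": ["hdf5.h", "H5Cpp.h"],
    "lib": ["libhdf5.so*", "libhdf5_cpp.so*"],
}


def missing_hdf5_files(prefix):
    """Return the expected include/lib patterns that match nothing under prefix."""
    missing = []
    for subdir, patterns in EXPECTED.items():
        for pattern in patterns:
            if not glob.glob(os.path.join(prefix, subdir, pattern)):
                missing.append(os.path.join(subdir, pattern))
    return missing


if __name__ == "__main__":
    # Default prefix taken from the HDF5_ROOT_DIR attempt earlier in the thread.
    prefix = sys.argv[1] if len(sys.argv) > 1 else "/usr/local/HDF_Group/HDF5/1.8.18"
    gaps = missing_hdf5_files(prefix)
    print("missing: " + ", ".join(gaps) if gaps else "all expected files found")
```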

amisk commented 5 years ago

I think so, though I don't see an option to get it when I download the source for 1.8.

tikk3r commented 5 years ago

Ah you're installing HDF5 manually? This is my install line:

mkdir -p $INSTALLDIR/hdf5
cd $INSTALLDIR/hdf5 && wget https://support.hdfgroup.org/ftp/HDF5/releases/hdf5-1.8/hdf5-1.8.21/src/hdf5-1.8.21.tar.gz && tar xf hdf5*.tar.gz
cd hdf5* && ./configure --prefix=$INSTALLDIR/hdf5 --enable-fortran --enable-cxx --enable-threadsafe --enable-unsupported && make -j $J && make install

We're on CentOS 7 though, so I don't know if Ubuntu may be a bit different.
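The same recipe, written down as a Python build plan. This is a sketch, not tikk3r's actual script: the URL helper assumes other releases follow the same layout as the 1.8.21 link above, and the commands are returned as argument lists rather than executed. The thread-safety option is spelled `--enable-threadsafe`; autoconf-generated configure scripts only warn about unrecognized options and otherwise ignore them, so a misspelling silently leaves thread safety off. HDF5's configure rejects thread-safe builds combined with the C++ or Fortran bindings by default, which is why `--enable-unsupported` accompanies it.

```python
import os


def hdf5_source_url(version):
    """Tarball URL, assuming the layout of the 1.8.21 link generalizes."""
    series = ".".join(version.split(".")[:2])  # e.g. "1.8"
    return ("https://support.hdfgroup.org/ftp/HDF5/releases/"
            f"hdf5-{series}/hdf5-{version}/src/hdf5-{version}.tar.gz")


def hdf5_build_commands(version, prefix, jobs=None):
    """Return the download/configure/build steps as argument lists (not executed)."""
    jobs = jobs or os.cpu_count() or 1
    return [
        ["wget", hdf5_source_url(version)],
        ["tar", "xf", f"hdf5-{version}.tar.gz"],
        # The remaining steps run inside the unpacked hdf5-<version>/ directory.
        ["./configure", f"--prefix={prefix}", "--enable-fortran", "--enable-cxx",
         "--enable-threadsafe", "--enable-unsupported"],
        ["make", f"-j{jobs}"],
        ["make", "install"],
    ]


if __name__ == "__main__":
    for cmd in hdf5_build_commands("1.8.21", "/path/to/install/hdf5"):
        print(" ".join(cmd))
```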

amisk commented 5 years ago

OK, compilation appears to have been successful... let's see.

amisk commented 5 years ago

When starting the genericpipeline, it complains that the module "executable_args" is not found. I added lofarroot/lib/python2.7/site-packages/lofarpipe/recipes/nodes/ and lofarroot/lib/python2.7/site-packages/lofarpipe/recipes/master/ to PYTHONPATH, but it still will not find it...

tikk3r commented 5 years ago

What if you just add lofarroot/lib/python2.7/site-packages to the PYTHONPATH (then you should have everything, instead of manually specifying all subfolders).

Is it really not finding "executable_args", or is it not finding the executable it's supposed to launch?

amisk commented 5 years ago

Just giving site-packages is not enough. It previously complained about not finding PipelineStep_createMapfile.py, and when I added lofarroot/lib/python2.7/site-packages/lofarpipe/recipes/plugins/ to PYTHONPATH, the pipeline found it.

To me, this looks like it just does not find the appropriate module:

2019-03-04 19:40:44 INFO    genericpipeline: Beginning step check_Ateam_separation
2019-03-04 19:40:45 INFO    genericpipeline: Running task: pythonplugin
2019-03-04 19:40:45 ERROR   genericpipeline: Exception caught: No module named executable_args
Traceback (most recent call last):
  File "/opt/soft/lofar-stuff/lib/python2.7/site-packages/lofarpipe/cuisine/cook.py", line 32, in __init__
    module_details = imp.find_module(task.lower(), recipe_path)
ImportError: No module named executable_args
2019-03-04 19:40:45 WARNING genericpipeline: pythonplugin reports failure (using executable_args recipe)
2019-03-04 19:40:45 ERROR   genericpipeline: *******************************************
2019-03-04 19:40:45 ERROR   genericpipeline: Failed pipeline run: Pre-Facet-Calibrator
2019-03-04 19:40:45 ERROR   genericpipeline: Detailed exception information:
2019-03-04 19:40:45 ERROR   genericpipeline: <class 'lofarpipe.support.lofarexceptions.PipelineRecipeFailed'>
2019-03-04 19:40:45 ERROR   genericpipeline: pythonplugin failed
2019-03-04 19:40:45 ERROR   genericpipeline: *******************************************
2019-03-04 19:40:45 ERROR   genericpipeline: LOFAR Pipeline finished unsuccesfully.
2019-03-04 19:40:45 WARNING genericpipeline: recipe genericpipeline completed with errors

tikk3r commented 5 years ago

It previously complained about not finding PipelineStep_createMapfile.py

That's usually a sign that pipeline.cfg is missing some information (e.g. in recipe_directories).
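The traceback shows why the PYTHONPATH changes did not help: the cook calls `imp.find_module(task.lower(), recipe_path)`, and when `find_module` is given a list of directories it searches only those, not `sys.path`. Below is a Python 3 sketch of the same lookup using `importlib.machinery.PathFinder.find_spec` in place of the deprecated `imp` module. The helper names and the scratch directory layout are hypothetical, and how the pipeline builds `recipe_path` from pipeline.cfg is not modelled here.

```python
import importlib
import importlib.machinery
import os
import sys
import tempfile


def find_recipe(task, recipe_path):
    """Locate a recipe like imp.find_module(task.lower(), recipe_path).

    Only the directories in recipe_path are searched; sys.path (and so
    PYTHONPATH) is not consulted when an explicit path list is given.
    """
    importlib.invalidate_caches()  # notice files created since the last lookup
    name = task.lower()
    spec = importlib.machinery.PathFinder.find_spec(name, recipe_path)
    if spec is None:
        raise ImportError(f"No module named {name}")
    return spec.origin


def demo():
    """Scratch layout .../lofarpipe/recipes/master/executable_args.py (hypothetical)."""
    with tempfile.TemporaryDirectory() as root:
        master = os.path.join(root, "lofarpipe", "recipes", "master")
        os.makedirs(master)
        open(os.path.join(master, "executable_args.py"), "w").close()

        # 1. The module's directory is in recipe_path: found.
        via_recipe_path = find_recipe("executable_args", [master]).endswith(
            "executable_args.py")

        # 2. The directory is only on sys.path (as PYTHONPATH would put it),
        #    while recipe_path lists something else: not found.
        sys.path.insert(0, master)
        try:
            find_recipe("executable_args", [root])
            via_pythonpath = True
        except ImportError:
            via_pythonpath = False
        finally:
            sys.path.remove(master)
    return via_recipe_path, via_pythonpath


if __name__ == "__main__":
    print(demo())  # (True, False)
```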

amisk commented 5 years ago

Oh, indeed. The pythonpath in the pipeline configuration pointed at lofarroot/lib64. That directory was always present before, but not this time (because for some reason I forgot to install python-casacore. Stupid.)

OK, next one. Sorry for the questions, which by now are unrelated to the first one :) Instead of the "offline" installation, I used yours. But now NDPPP is missing. I installed DP3, which creates DPPP. I guess it's the same? Can I just symlink it?

It seems that prefactor is hardcoded to use NDPPP.

tikk3r commented 5 years ago

No worries, I know the pain :P Yeah, as far as I know the pipeline is hardcoded to look for lofar/bin/NDPPP, so, like you say, I just symlinked it to the DPPP executable that DP3 provides.

amisk commented 5 years ago

OK, it runs, although I get this message:

2019-03-05 10:47:47 WARNING node.slurm17.executable_args.L643651_SB014_uv.MS: /opt/soft/lofar-stuff/bin/NDPPP stderr:
*** WARNING: the following parset keywords were not used ***
             maybe they are misspelled
    [aoflag.keepstatistics,aoflag.memoryperc,aoflag.type,demix.demixfreqstep,demix.demixtimestep,demix.freqstep,demix.ignoretarget,demix.instrumentmodel,demix.ntimechunk,demix.skymodel,demix.subtractsources,demix.targetsource,demix.timestep,demix.type,flagedge.chan,flagedge.type]

I guess that was something that NDPPP had but is no longer in DPPP.

tikk3r commented 5 years ago

That's normal. It's just warning you that those steps are not in the steps parameter. Those options are enabled/disabled depending on what's given near the top of the parset:

https://github.com/lofar-astron/prefactor/blob/07cd7ddefa6af296fe1cb7407cfc7eca126fd66d/Pre-Facet-Calibrator.parset#L48-L58

Since demix_step is turned off and initial_flagging is set to default_flagging, the aoflag, flagedge and demix sub-steps are skipped.
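A simplified Python model of that warning: keys are grouped by their prefix (the step name before the first dot), and any key whose step is not listed in `steps` is reported. DPPP itself tracks which keys were actually read, so this is only an approximation; the example parset, its values, and the step names `flag` and `avg` are hypothetical.

```python
def parse_steps(value):
    """Parse a DPPP-style list such as "[flag, avg]" into ["flag", "avg"]."""
    return [s.strip() for s in value.strip().strip("[]").split(",") if s.strip()]


def unused_keys(parset):
    """Keys of the form <step>.<param> whose step is not listed in `steps`."""
    active = set(parse_steps(parset.get("steps", "[]")))
    return sorted(k for k in parset
                  if "." in k and k.split(".", 1)[0] not in active)


# Hypothetical parset: aoflag, flagedge and demix are configured but switched off.
example = {
    "steps": "[flag, avg]",
    "flag.type": "preflagger",
    "avg.type": "average",
    "aoflag.type": "aoflagger",
    "flagedge.chan": "[0..nchan/32-1]",
    "demix.type": "demixer",
}

if __name__ == "__main__":
    print(unused_keys(example))
```

Run as a script, it prints the three keys of the disabled steps (`aoflag.type`, `demix.type`, `flagedge.chan`), the same pattern as the warning above.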

amisk commented 5 years ago

Thanks. I again forgot to change the open file limit... but now it appears to still be running :)
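For reference, the limit can also be raised from Python right before the pipeline is launched, since child processes inherit resource limits. This is a minimal sketch with the standard `resource` module (Unix only). The target of 65536 is an arbitrary example value, and an unprivileged process can only raise its soft limit up to the hard limit, so a higher hard limit still has to come from the system configuration.

```python
import resource


def raise_open_file_limit(wanted=65536):
    """Raise the soft RLIMIT_NOFILE towards `wanted`, capped at the hard limit.

    Never lowers an existing limit. Returns the soft limit now in effect.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    target = wanted if hard == resource.RLIM_INFINITY else min(wanted, hard)
    if soft != resource.RLIM_INFINITY and soft < target:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


if __name__ == "__main__":
    print("open file limit:", raise_open_file_limit())
    # A pipeline started from here (e.g. via subprocess) inherits this limit.
```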

If this works for the calibrator and later the target, I'll assume the installation succeeded, and I would be happy to provide an install script for the latest software on Ubuntu-based systems, in case anybody is interested.

adrabent commented 5 years ago

@amisk This would be pretty helpful. We could provide such installation instructions in the prefactor documentation.