lofar-astron / factor

Facet calibration for LOFAR
http://www.astron.nl/citt/facet-doc
GNU General Public License v2.0

Crash with KeyboardInterrupt/No Python class registered #221

Closed tikk3r closed 6 years ago

tikk3r commented 6 years ago

I am trying to run FACTOR on a dataset (which contains international stations, in case that matters), but it crashes with the error:

No Python class registered for C++ class LOFAR::PyParameterSet

I've put the log and terminal output here: https://gist.github.com/tikk3r/d1bbc5579f7ec7e4aea596265179f7ec

What could be the problem?

tammojan commented 6 years ago

Looks like a problem with the boost-python binding to the lofar parameterset (which is C++ code).

Does the following reproduce the problem?

from lofar.parameterset import parameterset
p = parameterset({"somekey": 3, "someotherkey": ["a","b","c"]})
p.getStringVector("someotherkey")

rvweeren commented 6 years ago

Frits, check that your LD_LIBRARY_PATH points to:

/net/lofar1/data1/software/boost_1_63_0/lib
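
One way to check this is to list which LD_LIBRARY_PATH entries actually contain a Boost.Python library; more than one hit can mean the wrong copy is picked up first. This is only a minimal sketch: the helper name boost_python_candidates is made up for illustration and the file pattern is an assumption about how the library files are named.

```python
import glob
import os


def boost_python_candidates(ld_library_path):
    """Map each LD_LIBRARY_PATH entry to the libboost_python files found in it.

    Hypothetical helper; the glob pattern is an assumption about file naming.
    """
    found = {}
    for entry in ld_library_path.split(os.pathsep):
        if not entry:
            continue  # skip empty entries such as a trailing ':'
        matches = sorted(glob.glob(os.path.join(entry, "libboost_python*.so*")))
        if matches:
            found[entry] = matches
    return found


if __name__ == "__main__":
    current = os.environ.get("LD_LIBRARY_PATH", "")
    for directory, libs in boost_python_candidates(current).items():
        print(directory)
        for lib in libs:
            print("    " + lib)
```

Directories are reported in search order, so the first one printed is the one the dynamic loader would try first.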


tikk3r commented 6 years ago

@tammojan that piece of code works fine:

In [1]: from lofar.parameterset import parameterset

In [2]: p = parameterset({"somekey":3, "someotherkey":["a","b","c"]})

In [3]: p.getStringVector("someotherkey")
Out[3]: ['a', 'b', 'c']

@rvweeren that directory is in the LD_LIBRARY_PATH

tammojan commented 6 years ago

I had a look at this, no conclusion yet.

I suspect this is due to incorrect linking against boost_python. Working in @tikk3r's environment, I altered cuisine/WSRTrecipe.py:78 to report a stack trace:

/net/para11/data2/sweijen/software/env-losoto2-para/bin/python
2018-08-31 14:22:51 DEBUG   facetselfcal_facet_patch_374: Pipeline start time: 2018-08-31T12:22:51
End test in baserecipe
2018-08-31 14:22:51 INFO    facetselfcal_facet_patch_374: LOFAR Pipeline (facetselfcal_facet_patch_374) starting.
2018-08-31 14:22:51 INFO    facetselfcal_facet_patch_374: SASID = , MOMID = , Feedback method = None
NYI: validate_steps
Handling exception here
2018-08-31 14:22:52 ERROR   facetselfcal_facet_patch_374: Exception caught: No Python class registered for C++ class LOFAR::PyParameterSet
Traceback (most recent call last):
  File "/home/dijkema/opt/lofar/build/gnucxx11_opt/installed/lib64/python2.7/site-packages/lofarpipe/cuisine/WSRTrecipe.py", line 143, in run
    status = self.go()
  File "/home/dijkema/opt/lofar/build/gnucxx11_opt/installed/bin/genericpipeline.py", line 97, in go
    return super(GenericPipeline, self).go()
  File "/home/dijkema/opt/lofar/build/gnucxx11_opt/installed/lib64/python2.7/site-packages/lofarpipe/support/control.py", line 155, in go
    self.pipeline_logic()
  File "/home/dijkema/opt/lofar/build/gnucxx11_opt/installed/bin/genericpipeline.py", line 141, in pipeline_logic
    self.parset.fullModuleName('pipeline') + '.')
  File "/home/dijkema/opt/lofar/build/gnucxx11_opt/installed/lib64/python2.7/site-packages/lofar/parameterset/__init__.py", line 121, in makeSubset
    ps = self._makeSubset (baseKey, prefix)
TypeError: No Python class registered for C++ class LOFAR::PyParameterSet
  File "/home/dijkema/opt/lofar/build/gnucxx11_opt/installed/lib64/python2.7/site-packages/lofarpipe/cuisine/WSRTrecipe.py", line 143, in run
    status = self.go()
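
For reference, a change of this kind can be sketched as below. This is not the actual code in WSRTrecipe.py; the function name run_with_traceback and the logger name are hypothetical, and the sketch only shows the general pattern of logging the full traceback alongside the exception message.

```python
import logging
import traceback

logger = logging.getLogger("recipe_sketch")  # hypothetical logger name


def run_with_traceback(go):
    """Call go(); on failure log the message and the full traceback, then return 1."""
    try:
        return go()
    except Exception as exc:
        logger.error("Exception caught: %s", exc)
        # format_exc() returns the traceback of the exception being handled
        logger.error(traceback.format_exc())
        return 1
```

With only the first logger.error call, the log would show the message but not which call raised it, which is what made the makeSubset frame visible here.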

tammojan commented 6 years ago

Your $LD_LIBRARY_PATH contains /net/lofar1/data1/software/boost_1_63_0/lib (somehow). Could you get rid of that?

tikk3r commented 6 years ago

Then there is no Boost library to use, resulting in

ImportError: libboost_python.so.1.63.0: cannot open shared object file: No such file or directory

Could it be the Boost version? I see the current release version is now 1.68.

However, I no longer get a crash now, but a different error message instead. Is this because of what you tried?

INFO - factor:directions - Reading directions file: /net/para11/data2/sweijen/LOFAR_VLBI/4C43.15_factor/factor_output/factor_directions.txt
INFO - factor - Self calibrating 1 direction(s) in Group 1
QPID support NOT enabled! Will NOT connect to any broker, and messages will be lost!
INFO - factor:scheduler - <-- Operation facetselfcal started (direction: facet_patch_374)
INFO - factor:scheduler - locals(): {'pipeline': <gp.GenericPipeline object at 0x7fd2efc4d710>, 'direction_name': 'facet_patch_374', 'getSearchingLogger': <function getSearchingLogger at 0x7fd2749a3de8>, 'gp': <module 'gp' from '/net/lofar1/data1/rvweeren/software/lofar_aug15_2018/lofar/bin/genericpipeline.pyc'>, 'RedirectStdStreams': <class 'factor.lib.context.RedirectStdStreams'>, 'genericpipeline_path': '/net/lofar1/data1/rvweeren/software/lofar_aug15_2018/lofar/bin', 'parset': '/net/para11/data2/sweijen/LOFAR_VLBI/4C43.15_factor/factor_output/results/facetselfcal/facet_patch_374/pipeline.parset', 'loader': <module 'loader' from '/net/lofar1/data1/rvweeren/software/lofar_aug15_2018/lofar/bin/loader.pyc'>, 'op_name': 'facetselfcal', 'genericpipeline_executable': '/net/lofar1/data1/rvweeren/software/lofar_aug15_2018/lofar/bin/genericpipeline.py', 'time': <module 'time' from '/net/para11/data2/sweijen/software/env-losoto2-para/lib64/python2.7/lib-dynload/timemodule.so'>, 'logbasename': '/net/para11/data2/sweijen/LOFAR_VLBI/4C43.15_factor/factor_output/logs/facetselfcal/facet_patch_374', 'config': '/net/para11/data2/sweijen/LOFAR_VLBI/4C43.15_factor/factor_output/results/facetselfcal/facet_patch_374/pipeline.cfg'}
WARNING - factor - Self calibration failed for direction facet_patch_374.
INFO - factor - Exiting...

tikk3r commented 6 years ago

@tammojan It seems to work with an older version of the LOFAR software installed on the Leiden clusters. That install has been modified in some way, but I don't (yet) know exactly what was changed to make it work.

Could it be some sort of feature that was added/removed in either the software or the genericpipeline framework? Or that some specific, odd linking with libraries is needed?

If you are interested, the install that works is located at /net/lofar1/data1/oonk/rh7_lof_feb2017_2_19_0_ER/lofim.sh and the necessary stuff can be sourced from /net/para11/data2/sweijen/LOFAR_VLBI/4C43.15_factor/source_test.sh.

tikk3r commented 6 years ago

It looks like this problem was related to the Boost library after all. The "special" version mentioned above was linked against Boost 1.63.0, whereas it turned out that the other version(s) were not. Using the install linked against 1.63.0, it finally worked. I couldn't manage to compile it against the newest Boost version (1.68.0), so I guess it's rather particular about this.
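
For anyone hitting the same thing, a quick way to see which Boost.Python version a compiled module was linked against is to scan the output of ldd for libboost_python. A minimal sketch, assuming Linux and the usual soname form libboost_python*.so.X.Y.Z; the function name boost_python_versions is made up for illustration.

```python
import re

# Matches e.g. libboost_python.so.1.63.0 or libboost_python27.so.1.63.0
# (assumed naming; adjust if your Boost build names its libraries differently).
_BOOST_PYTHON = re.compile(r"libboost_python[\w-]*\.so\.(\d+\.\d+\.\d+)")


def boost_python_versions(ldd_output):
    """Return the set of Boost.Python versions named in the given ldd output text."""
    return set(_BOOST_PYTHON.findall(ldd_output))
```

Feed it the text printed by ldd for the compiled extension module behind lofar.parameterset. If the result differs from the Boost version found via LD_LIBRARY_PATH, or contains more than one version, that points to the kind of mismatch described above.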