iLCSoft / MarlinReco

RecoMCTruthLinker crashes #125

Open ggrenier opened 9 months ago

ggrenier commented 9 months ago

Running on a machine with OS version "CentOS Linux release 7.9.2009 (Core)".

Using the cvmfs build: source /cvmfs/ilc.desy.de/sw/x86_64_gcc103_centos7/v02-03-02/init_ilcsoft.sh

Using ILDConfig version v02-03-02.

Running Marlin in the directory ILDConfig-02-03-02/StandardConfig/production/ with the command: Marlin MarlinStdReco.xml --constant.lcgeo_DIR=${lcgeo_DIR} --constant.DetectorModel=ILD_l2_v02 --constant.CMSEnergy=250 --global.LCIOInputFiles=/scratch/ddsim_E1-calib.Puds91.Gsgreen.e0.p0.I110048.01.slcio --global.MaxRecordNumber=10

The input file was copied from the DIRAC grid. Its location on the grid: /ilc/user/g/ggrenier/prod/v02-02-03/uds/sim/ILD_l2_v02/ddsim_E1-calib.Puds91.Gsgreen.e0.p0.I110048.01.slcio

Result: a crash in RecoMCTruthLinker. The program output ends with:

 [ MESSAGE0 "MyRecoMCTruthLinker"]  processEvent 0  - 0

 *** Break *** segmentation violation

===========================================================
There was a crash.
This is the entire stack trace of all threads:
===========================================================
#0  0x00007f87330ae60c in waitpid () from /usr/lib64/libc.so.6
#1  0x00007f873302bf62 in do_system () from /usr/lib64/libc.so.6
#2  0x00007f8731b6fe3b in Exec (shellcmd=<optimized out>, this=0x6469f0) at /cvmfs/ilc.desy.de/sw/x86_64_gcc103_centos7/root/6.28.04/core/unix/src/TUnixSystem.cxx:2104
#3  TUnixSystem::StackTrace() () at /cvmfs/ilc.desy.de/sw/x86_64_gcc103_centos7/root/6.28.04/core/unix/src/TUnixSystem.cxx:2395
#4  0x00007f8731b6d575 in TUnixSystem::DispatchSignals (this=0x6469f0, sig=kSigSegmentationViolation) at /cvmfs/ilc.desy.de/sw/x86_64_gcc103_centos7/root/6.28.04/core/unix/src/TUnixSystem.cxx:3615
#5  <signal handler called>
#6  RecoMCTruthLinker::clusterLinker(EVENT::LCEvent*, EVENT::LCCollection*, EVENT::LCCollection*, EVENT::LCCollection**, EVENT::LCCollection**, EVENT::LCCollection**) () at /cvmfs/ilc.desy.de/sw/x86_64_gcc103_centos7/v02-03-02/MarlinReco/v01-34/Analysis/RecoMCTruthLink/src/RecoMCTruthLinker.cc:1282
#7  0x00007f870d985562 in RecoMCTruthLinker::processEvent(EVENT::LCEvent*) () at /cvmfs/ilc.desy.de/sw/x86_64_gcc103_centos7/v02-03-02/MarlinReco/v01-34/Analysis/RecoMCTruthLink/src/RecoMCTruthLinker.cc:383
#8  0x00007f8733faddd6 in marlin::ProcessorMgr::processEvent(EVENT::LCEvent*) () at /cvmfs/ilc.desy.de/sw/x86_64_gcc103_centos7/v02-03-02/Marlin/v01-19/source/src/ProcessorMgr.cc:494
#9  0x00007f8733ee4371 in SIO::SIOReader::processEvent (this=0x354e840, event=...) at /cvmfs/ilc.desy.de/sw/x86_64_gcc103_centos7/v02-03-02/lcio/v02-20/src/cpp/src/SIO/SIOReader.cc:204
#10 0x00007f8733eeeee3 in operator() (recdata=..., recinfo=..., __closure=<synthetic pointer>) at /cvmfs/sft.cern.ch/lcg/releases/gcc/10.3.0-f5826/x86_64-centos7/include/c++/10.3.0/ext/atomicity.h:100
#11 read_records<MT::LCReader::readStream(const LCReaderListenerList&, int)::<lambda(const sio::record_info&)>, MT::LCReader::readStream(const LCReaderListenerList&, int)::<lambda(const sio::record_info&, const sio::buffer_span&)> > (func=..., valid=..., outbuf=..., stream=...) at /cvmfs/ilc.desy.de/sw/x86_64_gcc103_centos7/sio/v00-01/include/sio/api.h:419
#12 MT::LCReader::readStream(std::unordered_set<MT::LCReaderListener*, std::hash<MT::LCReaderListener*>, std::equal_to<MT::LCReaderListener*>, std::allocator<MT::LCReaderListener*> > const&, int) [clone .localalias] () at /cvmfs/ilc.desy.de/sw/x86_64_gcc103_centos7/v02-03-02/lcio/v02-20/src/cpp/src/MT/LCReader.cc:557
#13 0x00007f8733eef8a0 in MT::LCReader::readStream(MT::LCReaderListener*, int) () at /cvmfs/sft.cern.ch/lcg/releases/gcc/10.3.0-f5826/x86_64-centos7/include/c++/10.3.0/initializer_list:79
#14 0x000000000040878f in main () at /cvmfs/ilc.desy.de/sw/x86_64_gcc103_centos7/v02-03-02/Marlin/v01-19/source/src/Marlin.cc:458
#15 0x00007f873300b555 in __libc_start_main () from /usr/lib64/libc.so.6
#16 0x0000000000408fdf in _start () at /cvmfs/sft.cern.ch/lcg/releases/gcc/10.3.0-f5826/x86_64-centos7/include/c++/10.3.0/bits/basic_string.tcc:206
===========================================================

tmadlener commented 9 months ago

Hi @ggrenier, thanks for the report. Would it be possible for you to run Marlin again with a slightly higher verbosity, at least for the reco MC truth linker (i.e. set the Verbosity steering parameter to DEBUG in the steering file)? Just from looking at the code, it is not entirely clear to me how this would crash at the point it does, because there seem to be checks that should prevent that in principle. However, there should be a printout in that case, and I am currently not sure whether it is missing simply because the output level is too low or because the code does not do what I think it does.

This is where the crash happens (line 1282): https://github.com/iLCSoft/MarlinReco/blob/c5c492c1ab66a18dad61a8df54c76c9110f7360b/Analysis/RecoMCTruthLink/src/RecoMCTruthLinker.cc#L1276-L1286
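
For readers without the source at hand, the kind of check referred to above can be sketched as follows. This is only an illustration under stated assumptions: `McParticle`, `CaloHit`, `HitToMc` and `linkedParticle` are hypothetical stand-ins, not the LCIO types or the actual code in RecoMCTruthLinker.cc.

```cpp
#include <iostream>
#include <unordered_map>

// Hypothetical stand-ins for the real event-data types (assumption, not LCIO).
struct McParticle { int pdg; };
struct CaloHit { int id; };

// Hypothetical internal map from a hit to the MC particle it is linked to.
using HitToMc = std::unordered_map<const CaloHit*, const McParticle*>;

// Returns the linked MC particle, or nullptr (after printing a debug message)
// when the hit has no entry or the entry is null. Callers test the result
// instead of dereferencing a pointer that may not exist.
const McParticle* linkedParticle(const HitToMc& links, const CaloHit* hit) {
    const auto it = links.find(hit);
    if (it == links.end() || it->second == nullptr) {
        std::cerr << "DEBUG: no MC link for hit " << (hit ? hit->id : -1) << '\n';
        return nullptr;
    }
    return it->second;
}
```

With a guard of this shape, a broken mapping would show up as a debug message rather than a segmentation violation, which is why the presence or absence of such a message in the DEBUG output is informative.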

ggrenier commented 9 months ago

Hi @tmadlener, running with the added option --MyRecoMCTruthLinker.Verbosity=DEBUG gives the output in marlin_debug.log.

tmadlener commented 9 months ago

Thanks. It looks a bit like some of the internal mapping goes wrong, but it is hard to say where just from the output. Do all the input clusters and hits have their MC links set properly? Or is it possible that one of these input LCRelation collections is missing from the configuration?

I will try to have a look at this with a debugger, but that will be after the Christmas break, in 2024.
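
One way to test the "missing LCRelation collection" hypothesis before reaching for a debugger is to compare the collection names the processor is configured to read against the names actually present in the event. The sketch below assumes both lists are already available as plain strings; `missingCollections` is a hypothetical helper, not part of Marlin or LCIO. In practice the `present` list could presumably be filled from LCIO's `LCEvent::getCollectionNames()`, and the `required` list from the processor parameters in the steering file.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: returns every name in `required` that does not
// appear in `present`, preserving the order of `required`.
std::vector<std::string> missingCollections(
    const std::vector<std::string>& required,
    const std::vector<std::string>& present) {
    std::vector<std::string> missing;
    for (const auto& name : required) {
        // Linear search is fine here: events hold at most a few hundred collections.
        if (std::find(present.begin(), present.end(), name) == present.end()) {
            missing.push_back(name);
        }
    }
    return missing;
}
```

An empty result would rule out a missing input collection and point back at the mapping logic itself.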