lcfiplus / LCFIPlus

Flavor tagging code for ILC detectors
https://confluence.slac.stanford.edu/display/ilc/LCFIPlus
GNU General Public License v3.0

LCFIPlus crashes in miniDST workflow for some inputs #59

Closed: tmadlener closed this issue 2 years ago

tmadlener commented 3 years ago

In the context of the miniDST workflow we have encountered some issues when the primary vertex collection is empty: iLCSoft/MarlinReco#93. We have a fix available for the underlying problem there (see iLCSoft/MarlinReco#94). However, with that fix in place we run into problems with LCFIPlus, which is run directly after the isolated lepton finding/tagging in the miniDST workflow.

We get a segmentation fault and the following stacktrace:

 *** Break *** segmentation violation

===========================================================
There was a crash.
This is the entire stack trace of all threads:
===========================================================
#0  0x00007f68e34644fc in waitpid () from /lib64/libc.so.6
#1  0x00007f68e33e1f62 in do_system () from /lib64/libc.so.6
#2  0x00007f68e2baddd3 in TUnixSystem::Exec (shellcmd=<optimized out>, this=0x1d4a570) at /tmp/madlener/spack-stage/spack-stage-root-6.24.00-u7xb6dwwdgx7v3xbdg37tsfvdufauaef/spack-src/core/unix/src/TUnixSystem.cxx:2120
#3  TUnixSystem::StackTrace (this=0x1d4a570) at /tmp/madlener/spack-stage/spack-stage-root-6.24.00-u7xb6dwwdgx7v3xbdg37tsfvdufauaef/spack-src/core/unix/src/TUnixSystem.cxx:2411
#4  0x00007f68e2bb06f5 in TUnixSystem::DispatchSignals (this=0x1d4a570, sig=kSigSegmentationViolation) at /tmp/madlener/spack-stage/spack-stage-root-6.24.00-u7xb6dwwdgx7v3xbdg37tsfvdufauaef/spack-src/core/unix/src/TUnixSystem.cxx:3649
#5  <signal handler called>
#6  0x00007f68bc717e4a in lcfiplus::algoEtc::SimpleSecMuonFinder(lcfiplus::Track const*, double, double, double, double, double, double, double, double, double, lcfiplus::Vertex const*) () from /nfs/dust/ilc/group/cvmfs_ilc/sw/key4hep/install/spackages_nfs_root/lcfiplus/0.10/x86_64-centos7-gcc8.3.0-opt/b7t3p7hze6ek467ftef3whqcagxvrnuy/lib/libLCFIPlus.so
#7  0x00007f68bc6be1da in lcfiplus::JetFinder::prerun(std::vector<lcfiplus::Track const*, std::allocator<lcfiplus::Track const*> > const&, std::vector<lcfiplus::Neutral const*, std::allocator<lcfiplus::Neutral const*> > const&, std::vector<lcfiplus::Vertex const*, std::allocator<lcfiplus::Vertex const*> > const&, int*) () from /nfs/dust/ilc/group/cvmfs_ilc/sw/key4hep/install/spackages_nfs_root/lcfiplus/0.10/x86_64-centos7-gcc8.3.0-opt/b7t3p7hze6ek467ftef3whqcagxvrnuy/lib/libLCFIPlus.so
#8  0x00007f68bc73bb19 in lcfiplus::JetClustering::process() () from /nfs/dust/ilc/group/cvmfs_ilc/sw/key4hep/install/spackages_nfs_root/lcfiplus/0.10/x86_64-centos7-gcc8.3.0-opt/b7t3p7hze6ek467ftef3whqcagxvrnuy/lib/libLCFIPlus.so
#9  0x00007f68bc6d6ae2 in LcfiplusProcessor::processEvent(EVENT::LCEvent*) () from /nfs/dust/ilc/group/cvmfs_ilc/sw/key4hep/install/spackages_nfs_root/lcfiplus/0.10/x86_64-centos7-gcc8.3.0-opt/b7t3p7hze6ek467ftef3whqcagxvrnuy/lib/libLCFIPlus.so
#10 0x00007f68e56c03c1 in marlin::ProcessorMgr::processEvent (this=0x1dfa140, evt=0x17571d00) at /tmp/madlener/spack-stage/spack-stage-marlin-1.17.1-4rg3d777ujzvtjwpmluuhg4lgc6n2ik6/spack-src/source/src/ProcessorMgr.cc:494
#11 0x00007f68e4ffefa1 in SIO::SIOReader::processEvent(std::shared_ptr<EVENT::LCEvent>) () from /nfs/dust/ilc/group/cvmfs_ilc/sw/key4hep/install/spackages_nfs_root/lcio/2.16.1/x86_64-centos7-gcc8.3.0-opt/nqt55lk5fmwwer334skcjgjcihnbghmr/lib/liblcio.so.2.16
#12 0x00007f68e500cf48 in MT::LCReader::readStream(std::unordered_set<MT::LCReaderListener*, std::hash<MT::LCReaderListener*>, std::equal_to<MT::LCReaderListener*>, std::allocator<MT::LCReaderListener*> > const&, int) () from /nfs/dust/ilc/group/cvmfs_ilc/sw/key4hep/install/spackages_nfs_root/lcio/2.16.1/x86_64-centos7-gcc8.3.0-opt/nqt55lk5fmwwer334skcjgjcihnbghmr/lib/liblcio.so.2.16
#13 0x00007f68e500d747 in MT::LCReader::readStream(MT::LCReaderListener*, int) () from /nfs/dust/ilc/group/cvmfs_ilc/sw/key4hep/install/spackages_nfs_root/lcio/2.16.1/x86_64-centos7-gcc8.3.0-opt/nqt55lk5fmwwer334skcjgjcihnbghmr/lib/liblcio.so.2.16
#14 0x000000000040ae37 in main (argc=<optimized out>, argv=<optimized out>) at /tmp/madlener/spack-stage/spack-stage-marlin-1.17.1-4rg3d777ujzvtjwpmluuhg4lgc6n2ik6/spack-src/source/src/Marlin.cc:467
===========================================================

I had a quick look into the SimpleSecMuonFinder and the JetFinder::prerun, but I couldn't find anything obvious there. Could you have a look and see what the problem is in this case? We can also provide the necessary inputs to reproduce the above crash.

NOTE: The stacktrace above does not come from a central ilcsoft installation, but we also get a segmentation violation with the v02-02-02 release (although without a stacktrace in that case).

ryonamin commented 3 years ago

@tmadlener Could you let us know which sample we should use to reproduce the problem? That would help us identify the cause. Thank you in advance.

tmadlener commented 3 years ago

Apologies for the delay. I have put an input DST file onto my public afs directory:

/afs/desy.de/user/m/madlener/public/LCFIPlus/ddsim_out_53520300_15_DST.slcio

This fails for us with the MarlinStdRecoMiniDST.xml workflow from the master branch of ILDConfig. Note, however, that this workflow currently crashes earlier if the changes from iLCSoft/MarlinReco#94 are not applied. It is possible to run a slightly adapted workflow that circumvents the earlier problems by leaving only the InitDD4hep and the first LCFIPlus processors (JC2FT and EF2) in the execute section. In this case the input collection for the JC2FT processor also has to be changed to PandoraPFOs (see diff below).

Let me know if there are any problems in accessing the files or with this brief description of how to reproduce the problem.

diff --git a/StandardConfig/production/MarlinStdRecoMiniDST.xml b/StandardConfig/production/MarlinStdRecoMiniDST.xml
index a5098f2..32fc8c9 100644
--- a/StandardConfig/production/MarlinStdRecoMiniDST.xml
+++ b/StandardConfig/production/MarlinStdRecoMiniDST.xml
@@ -4,23 +4,23 @@
   <processor name="InitDD4hep"/>
   <!--processor name="FastJetOverlay"/>
   <processor name="ExpandJet"/-->
-  <processor name="Thrust"/>
+  <!--processor name="Thrust"/>
   <processor name="Sphere"/>
   <processor name="Fox"/>
   <processor name="IsolatedMuonTagging"/>
   <processor name="IsolatedElectronTagging"/>
   <processor name="IsolatedTauTagging"/>
-  <processor name="IsolatedPhotonTagging"/>
+  <processor name="IsolatedPhotonTagging"/-->
   <processor name="JC2FT"/>
   <processor name="EF2"/>
-  <processor name="JC3FT"/>
+  <!--processor name="JC3FT"/>
   <processor name="EF3"/>
   <processor name="JC4FT"/>
   <processor name="EF4"/>
   <processor name="JC5FT"/>
   <processor name="EF5"/>
   <processor name="JC6FT"/>
-  <processor name="EF6"/>
+  <processor name="EF6"/-->
   <if condition="${RundEdxCorrections}">
     <processor name="ComputeCorrectAngulardEdX"/>
     <processor name="LikelihoodPID"/>
@@ -255,7 +255,7 @@
   <parameter name="Algorithms" type="stringVec"> JetClustering JetVertexRefiner FlavorTag ReadMVA</parameter>

   <!-- general parameters -->
-  <parameter name="PFOCollection" type="string" value="PFOsminusphoton" /> <!-- input PFO collection -->
+  <parameter name="PFOCollection" type="string" value="PandoraPFOs" /> <!-- input PFO collection -->
   <parameter name="UseMCP" type="int" value="0" /> <!-- MC info not used -->
   <parameter name="MCPCollection" type="string" value="" /> <!-- not used -->
   <parameter name="MCPFORelation" type="string" value="" /> <!-- not used -->

ryonamin commented 3 years ago

@tmadlener Thank you for your input. I was able to reproduce the problem. I found that LCFIPlus tried to dereference a null primary vertex pointer when no primary vertex was found. I have therefore made a pull request that, in such special cases, creates a "practical" or "virtual" vertex defined at (0., 0., 0.) with error (0., 0., beam spot size in z).
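
For readers following along, the fallback idea described above can be sketched roughly as follows. This is only an illustration and not the code from the pull request: `SimpleVertex` and `primaryOrVirtual` are hypothetical names invented here, they are not part of the LCFIPlus API, and the beam spot size is simply passed in by the caller.

```cpp
#include <array>
#include <cassert>

// Hypothetical stand-in for a vertex type; this is NOT lcfiplus::Vertex.
struct SimpleVertex {
    std::array<double, 3> pos;  // x, y, z position
    std::array<double, 3> err;  // uncertainty on x, y, z
    bool isVirtual;             // true if this is a fallback, not a fitted vertex
};

// Hypothetical helper: returns the found primary vertex if there is one.
// Otherwise it fills `fallback` with a virtual vertex at the origin whose
// only non-zero error is the beam spot size in z, and returns that instead,
// so downstream code never dereferences a null pointer.
const SimpleVertex& primaryOrVirtual(const SimpleVertex* found,
                                     SimpleVertex& fallback,
                                     double beamSpotSizeZ) {
    assert(beamSpotSizeZ >= 0.0);  // assumption: a size cannot be negative
    if (found != nullptr) {
        return *found;
    }
    fallback = SimpleVertex{{0.0, 0.0, 0.0}, {0.0, 0.0, beamSpotSizeZ}, true};
    return fallback;
}
```

A caller would keep a `SimpleVertex fallback` alive for as long as the returned reference is used, and pass `nullptr` as `found` when the primary vertex collection is empty. The `isVirtual` flag is an extra assumption of this sketch; it lets later steps tell a fitted vertex from the placeholder.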