yradkhorrami opened 3 years ago
I have tried it with my local analysis processor and with the MyRefitProcessorProton processor from the ILDConfig production, and I couldn't reproduce the seg. fault message.
Could you share the processor code that reproduces this?
I think I have seen this behavior before, although I don't remember exactly how I fixed it... My guess is that it has something to do with the Processor destructor. Seeing the code would help.
I'm only using the IsolatedLeptonTaggingProcessor centrally installed on cvmfs. The steering file is attached (just rename .xml.txt -> .xml): SLDCorrection.xml.txt
Very interesting.
I have tried it on NAF with:
source /cvmfs/ilc.desy.de/sw/x86_64_gcc82_centos7/v02-02-02/init_ilcsoft.sh
and then
Marlin ./SLDCorrection.xml
and the job finishes without a seg. fault at the end:
[ MESSAGE "MyIsolatedLeptonTaggingProcessor"] -------------------------------------------------
[ MESSAGE "Marlin"] ---------------------------------------------------------
[ MESSAGE "Marlin"] Events skipped by processors :
[ MESSAGE "Marlin"] Total: 0
[ MESSAGE "Marlin"] ---------------------------------------------------------
[ MESSAGE "Marlin"]
[ MESSAGE "Marlin"] ---------------------------------------------------------
[ MESSAGE "Marlin"] Time used by processors ( in processEvent() ) :
[ MESSAGE "Marlin"]
[ MESSAGE "Marlin"] MyIsolatedLeptonTaggingProcess 7.000000e-01 s in 998 events ==> 7.014028e-04 [ s/evt.]
[ MESSAGE "Marlin"] Total: 7.000000e-01 s in 998 events ==> 7.014028e-04 [ s/evt.]
[ MESSAGE "Marlin"] ---------------------------------------------------------
I could reproduce the problem by adding my custom /afs/desy.de/user/d/dudarboh/iLCSoft/MarlinUtil/lib/libMarlinUtilNew.so to $MARLIN_DLL. The seg. fault then appears at the end, as described above.
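The reproduction step above can be scripted: append the extra library to MARLIN_DLL, run the job, and check the exit status. A shell reports a child killed by signal N as status 128 + N, so a crash at exit shows up as 139 (SIGSEGV is signal 11). This is a minimal bash sketch; `append_dll` and `ends_with_segfault` are hypothetical helper names, not part of iLCSoft.

```shell
# Hypothetical helper: append a library to a colon-separated list such as
# MARLIN_DLL without producing a leading ':' when the list is empty.
append_dll() {
  if [ -z "$1" ]; then
    printf '%s\n' "$2"
  else
    printf '%s:%s\n' "$1" "$2"
  fi
}

# Hypothetical helper: run a command and succeed only if it was killed by
# SIGSEGV, which the shell reports as exit status 128 + 11 = 139.
ends_with_segfault() {
  local status=0
  "$@" || status=$?
  [ "$status" -eq 139 ]
}
```

Usage, with a placeholder library path: `export MARLIN_DLL="$(append_dll "$MARLIN_DLL" /path/to/libMyLibrary.so)"` followed by `ends_with_segfault Marlin ./SLDCorrection.xml && echo "seg. fault at exit"`.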
@yradkhorrami, could you share the output of echo $MARLIN_DLL, so we can check whether it contains any duplicate processors or libraries?
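One quick way to do that check is to split MARLIN_DLL on ':' and list the entries that occur more than once, both as full paths and by file name only (the latter would catch the same library installed in two places). A minimal bash sketch using tr, sed, sort and uniq; the function names are hypothetical.

```shell
# List colon-separated entries (e.g. from $MARLIN_DLL) that appear more than once.
dll_duplicate_paths() {
  printf '%s\n' "$1" | tr ':' '\n' | sed '/^$/d' | sort | uniq -d
}

# Same check on file names only, so /a/libX.so and /b/libX.so count as duplicates.
dll_duplicate_names() {
  printf '%s\n' "$1" | tr ':' '\n' | sed '/^$/d' | sed 's#.*/##' | sort | uniq -d
}
```

Running `dll_duplicate_paths "$MARLIN_DLL"` and `dll_duplicate_names "$MARLIN_DLL"` prints nothing if there are no duplicates.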
My guess is that this happens when marlin::Processor::~Processor() tries to clean up the Processor parameters here.
I am a bit puzzled, though, as my libMarlinUtilNew.so is not a processor at all, and I renamed the library...
Here is the relevant part of the valgrind output:
. . .
==16765== Invalid read of size 8
==16765== at 0x4E8D8F0: marlin::Processor::~Processor() (in /cvmfs/ilc.desy.de/sw/x86_64_gcc82_centos7/v02-02-02/Marlin/v01-17-01/lib/libMarlin.so.1.17.1)
==16765== by 0x7216CE8: __run_exit_handlers (in /usr/lib64/libc-2.17.so)
==16765== by 0x7216D36: exit (in /usr/lib64/libc-2.17.so)
==16765== by 0x71FF55B: (below main) (in /usr/lib64/libc-2.17.so)
==16765== Address 0x2bb250d0 is 1,680 bytes inside an unallocated block of size 1,696 in arena "client"
==16765==
==16765== Invalid read of size 8
==16765== at 0x4E8D8F9: marlin::Processor::~Processor() (in /cvmfs/ilc.desy.de/sw/x86_64_gcc82_centos7/v02-02-02/Marlin/v01-17-01/lib/libMarlin.so.1.17.1)
==16765== by 0x7216CE8: __run_exit_handlers (in /usr/lib64/libc-2.17.so)
==16765== by 0x7216D36: exit (in /usr/lib64/libc-2.17.so)
==16765== by 0x71FF55B: (below main) (in /usr/lib64/libc-2.17.so)
==16765== Address 0x2bb24fd0 is 1,424 bytes inside an unallocated block of size 1,696 in arena "client"
==16765==
==16765== Jump to the invalid address stated on the next line
==16765== at 0x0: ???
==16765== by 0x4E8D8FE: marlin::Processor::~Processor() (in /cvmfs/ilc.desy.de/sw/x86_64_gcc82_centos7/v02-02-02/Marlin/v01-17-01/lib/libMarlin.so.1.17.1)
==16765== by 0x7216CE8: __run_exit_handlers (in /usr/lib64/libc-2.17.so)
==16765== by 0x7216D36: exit (in /usr/lib64/libc-2.17.so)
==16765== by 0x71FF55B: (below main) (in /usr/lib64/libc-2.17.so)
==16765== Address 0x0 is not stack'd, malloc'd or (recently) free'd
==16765==
==16765==
==16765== Process terminating with default action of signal 11 (SIGSEGV)
==16765== Bad permissions for mapped region at address 0x0
==16765== at 0x0: ???
==16765== by 0x4E8D8FE: marlin::Processor::~Processor() (in /cvmfs/ilc.desy.de/sw/x86_64_gcc82_centos7/v02-02-02/Marlin/v01-17-01/lib/libMarlin.so.1.17.1)
==16765== by 0x7216CE8: __run_exit_handlers (in /usr/lib64/libc-2.17.so)
==16765== by 0x7216D36: exit (in /usr/lib64/libc-2.17.so)
==16765== by 0x71FF55B: (below main) (in /usr/lib64/libc-2.17.so)
==16765==
. . .
Running it with debug symbols might give more information, although I would then need to rebuild Marlin manually from scratch...
Maybe @tmadlener or @gaede have a better explanation or a potential fix in mind?
@dudarboh, I just looked at the Marlin libraries and found something interesting: without my local Marlin libraries there is no problem, and the Marlin job finishes without a seg. fault. As soon as I add some of my local Marlin libraries, the seg. fault appears at the end. I checked which libraries cause the issue and found that they were the ones compiled with previous versions of iLCSoft (gcc, ...). After recompiling the same processor with the latest version, the seg. fault no longer appears. The output of echo $MARLIN_DLL is:
/cvmfs/ilc.desy.de/sw/x86_64_gcc82_centos7/v02-02-02/MarlinDD4hep/v00-06/lib/libMarlinDD4hep.so:/cvmfs/ilc.desy.de/sw/x86_64_gcc82_centos7/v02-02-02/DDMarlinPandora/v00-11/lib/libDDMarlinPandora.so:/cvmfs/ilc.desy.de/sw/x86_64_gcc82_centos7/v02-02-02/MarlinReco/v01-31/lib/libMarlinReco.so:/cvmfs/ilc.desy.de/sw/x86_64_gcc82_centos7/v02-02-02/PandoraAnalysis/v02-00-01/lib/libPandoraAnalysis.so:/cvmfs/ilc.desy.de/sw/x86_64_gcc82_centos7/v02-02-02/LCFIVertex/v00-08/lib/libLCFIVertexProcessors.so:/cvmfs/ilc.desy.de/sw/x86_64_gcc82_centos7/v02-02-02/CEDViewer/v01-17-01/lib/libCEDViewer.so:/cvmfs/ilc.desy.de/sw/x86_64_gcc82_centos7/v02-02-02/Overlay/v00-22-02/lib/libOverlay.so:/cvmfs/ilc.desy.de/sw/x86_64_gcc82_centos7/v02-02-02/MarlinFastJet/v00-05-02/lib/libMarlinFastJet.so:/cvmfs/ilc.desy.de/sw/x86_64_gcc82_centos7/v02-02-02/LCTuple/v01-12/lib/libLCTuple.so:/cvmfs/ilc.desy.de/sw/x86_64_gcc82_centos7/v02-02-02/MarlinKinfit/v00-06/lib/libMarlinKinfit.so:/cvmfs/ilc.desy.de/sw/x86_64_gcc82_centos7/v02-02-02/MarlinTrkProcessors/v02-11/lib/libMarlinTrkProcessors.so:/cvmfs/ilc.desy.de/sw/x86_64_gcc82_centos7/v02-02-02/MarlinKinfitProcessors/v00-04-02/lib/libMarlinKinfitProcessors.so:/cvmfs/ilc.desy.de/sw/x86_64_gcc82_centos7/v02-02-02/ILDPerformance/v01-10/lib/libILDPerformance.so:/cvmfs/ilc.desy.de/sw/x86_64_gcc82_centos7/v02-02-02/Clupatra/v01-03/lib/libClupatra.so:/cvmfs/ilc.desy.de/sw/x86_64_gcc82_centos7/v02-02-02/Physsim/v00-04-01/lib/libPhyssim.so:/cvmfs/ilc.desy.de/sw/x86_64_gcc82_centos7/v02-02-02/LCFIPlus/v00-09/lib/libLCFIPlus.so:/cvmfs/ilc.desy.de/sw/x86_64_gcc82_centos7/v02-02-02/FCalClusterer/v01-00-01/lib/libFCalClusterer.so:/cvmfs/ilc.desy.de/sw/x86_64_gcc82_centos7/v02-02-02/ForwardTracking/v01-14/lib/libForwardTracking.so:/cvmfs/ilc.desy.de/sw/x86_64_gcc82_centos7/v02-02-02/ConformalTracking/v01-10/lib/libConformalTracking.so:/cvmfs/ilc.desy.de/sw/x86_64_gcc82_centos7/v02-02-02/LICH/v00-01/lib/libLICH.so:/cvmfs/ilc.desy.de/sw/x86_64_gcc82_centos7/v02-02-02/Garlic/v03-01/lib/libGarlic.so:/afs/desy.de/group/flc/pool/radkhory/HdecayMode/lib/libHdecayMode.so:/afs/desy.de/group/flc/pool/radkhory/SLDecayCorrection/lib/libSLDecayCorrection.so
Of these, HdecayMode (libHdecayMode.so) caused the issue.
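The search described above (finding which library in MARLIN_DLL triggers the crash) can be scripted by rerunning the same steering file once per library, each time without that library, and printing Marlin's exit status: 0 means the job ended cleanly without it, 139 means it still segfaults. This is a bash sketch assuming Marlin is on the PATH; `dll_without` and `find_crashing_dll` are hypothetical names. Removing a library that provides a processor used in the steering file makes Marlin fail with some other status, so only 0 versus 139 is informative.

```shell
# Print a colon-separated list with one exact entry removed.
dll_without() {
  printf '%s\n' "$1" | tr ':' '\n' | sed '/^$/d' | grep -vxF -- "$2" | paste -s -d ':' -
}

# Rerun a steering file once per library in $MARLIN_DLL, each time without
# that library, and print the exit status Marlin returns.
find_crashing_dll() {
  local steering="$1"
  local full_list="$MARLIN_DLL"
  local lib status
  printf '%s\n' "$full_list" | tr ':' '\n' | sed '/^$/d' |
    while IFS= read -r lib; do
      status=0
      # </dev/null keeps Marlin from consuming the library list on stdin.
      MARLIN_DLL="$(dll_without "$full_list" "$lib")" Marlin "$steering" \
        </dev/null >/dev/null 2>&1 || status=$?
      printf '%s -> exit status %s\n' "$lib" "$status"
    done
}
```

For example, `find_crashing_dll ./SLDCorrection.xml` prints one line per library; a library whose removal turns 139 into 0 is the likely culprit.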
Since Julie (@Torndal) recently encountered this problem too, I want to throw in my 5 cents again.
Basically, I want to confirm @yradkhorrami's observations from the previous post.
I encountered this seg. fault at the end only with libraries in MARLIN_DLL that were compiled with a previous version of iLCSoft.
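One way to spot such libraries is to look at the compiler identification strings that GCC records in the .comment section of each shared library, which binutils' `readelf -p .comment` prints. A bash sketch; the helper names are hypothetical. A library may also carry strings from the system's startup objects, so the useful signal is presumably whether a library lacks the GCC version the rest of the release was built with (8.2 for the x86_64_gcc82_centos7 builds, judging by the path).

```shell
# Reduce `readelf -p .comment` output (read on stdin) to the unique GCC strings.
gcc_strings() {
  sed -n 's/.*\(GCC: .*\)/\1/p' | sort -u
}

# For each library in a colon-separated list (e.g. $MARLIN_DLL), print its
# path followed by the compiler strings recorded in its .comment section.
print_compilers() {
  printf '%s\n' "$1" | tr ':' '\n' | sed '/^$/d' |
    while IFS= read -r lib; do
      printf '%s\n' "$lib"
      readelf -p .comment "$lib" 2>/dev/null | gcc_strings | sed 's/^/    /'
    done
}
```

Usage: `print_compilers "$MARLIN_DLL"`. Libraries without a .comment section (for example, fully stripped ones) are listed with no compiler lines.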
Thanks to @tmadlener, I tried to debug it a bit with gdb, but the crash happens well after the return 0; in main(), somewhere in a std::string destructor...
I think recompiling the processor with the same iLCSoft version as all the other libraries should fix it.
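For a CMake-based processor, a rebuild against the release that is actually in use could look like the bash sketch below. It assumes init_ilcsoft.sh of that release has been sourced (it sets $ILCSOFT) and uses the ILCSoft.cmake initial-cache file shipped with the release; the function name is hypothetical, and it assumes the project's `make install` puts the library where MARLIN_DLL points. Starting from an empty build directory matters: CMake caches the compiler path in CMakeCache.txt, so reusing an old build directory can silently keep the previous compiler.

```shell
# Hypothetical helper: rebuild a CMake-based Marlin processor from scratch
# against the iLCSoft release whose init_ilcsoft.sh is currently sourced.
rebuild_processor() {
  local src_dir="$1"
  if [ -z "${ILCSOFT:-}" ] || [ ! -f "$ILCSOFT/ILCSoft.cmake" ]; then
    echo "source init_ilcsoft.sh of the target release first" >&2
    return 1
  fi
  # Fresh build directory, so no stale CMakeCache.txt pins the old compiler.
  rm -rf "$src_dir/build"
  mkdir -p "$src_dir/build"
  (
    cd "$src_dir/build" &&
      cmake -C "$ILCSOFT/ILCSoft.cmake" .. &&
      make install
  )
}
```

Usage: `source /cvmfs/ilc.desy.de/sw/x86_64_gcc82_centos7/v02-02-02/init_ilcsoft.sh` and then `rebuild_processor /path/to/HdecayMode` (placeholder path). Note that the existing `build` directory inside the source tree is deleted.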
........
[ MESSAGE "Marlin"] ---------------------------------------------------------
[ MESSAGE "Marlin"] Events skipped by processors :
[ MESSAGE "Marlin"] Total: 0
[ MESSAGE "Marlin"] ---------------------------------------------------------
[ MESSAGE "Marlin"]
[ MESSAGE "Marlin"] ---------------------------------------------------------
[ MESSAGE "Marlin"] Time used by processors ( in processEvent() ) :
[ MESSAGE "Marlin"]
[ MESSAGE "Marlin"] MyIsolatedLeptonTaggingProcess 8.300000e-01 s in 998 events ==> 8.316633e-04 [ s/evt.]
[ MESSAGE "Marlin"] Total: 8.300000e-01 s in 998 events ==> 8.316633e-04 [ s/evt.]
[ MESSAGE "Marlin"] ---------------------------------------------------------
Segmentation fault