star-bnl / star-sw

Core software for STAR experiment

Memory leak in StFstClusterMaker #590

Closed genevb closed 7 months ago

genevb commented 7 months ago

Processing 500 Run 22 pp500 events in valgrind without TPC, just the FWD detector makers, I get 2 definite memory leaks and 2 possible memory leaks, shown below.

Here is the chain I ran. This took ~1 hour outside valgrind and ~2 hours inside valgrind. Fewer events would probably be fine to show the problems if you want to run it faster.

setup 64b
stardev
nohup valgrind --leak-check=full root4star -b -q -l 'bfc.C(500,"DbV20230818 pp2022a -btof -mtd -eventqa -emcDY2 -TpcHitMover -ittf -tpx -tpc Notpc_daq fst ftt fstRawHit -picoEvt -picoDst -picoWrite -picoVtxDefault -picoCovMtxWrite -trgSimu BEmcChkStat evout -hitfilt","/star/data03/daq/2022/036/23036001/FTEST/st_physics_23036001_raw_5000040.daq")' >& log &

Thanks -Gene

==26363== 2,621,528 bytes in 46,813 blocks are definitely lost in loss record 278,294 of 278,315
==26363==    at 0x76AD203: operator new(unsigned long) (vg_replace_malloc.c:334)
==26363==    by 0x7E01968: TStorage::ObjectAlloc(unsigned long) (TStorage.cxx:325)
==26363==    by 0x297D875C: operator new (TObject.h:156)
==26363==    by 0x297D875C: StFstScanRadiusClusterAlgo::doClustering(StFstCollection const&, StFstRawHitCollection&, StFstClusterCollection&) (StFstScanRadiusClusterAlgo.cxx:55)
==26363==    by 0x297D840F: StFstIClusterAlgo::doClustering(StFstCollection&) (StFstIClusterAlgo.cxx:25)
==26363==    by 0x297D7AF3: StFstClusterMaker::Make() (StFstClusterMaker.cxx:59)
==26363==    by 0x14F1309E: StMaker::Make() (StMaker.cxx:937)
==26363==    by 0x14F1309E: StMaker::Make() (StMaker.cxx:937)
==26363==    by 0x14F0E4E7: StChain::EventLoop(int, int, StMaker) (StChain.cxx:206)
==26363==    by 0x14F28988: GStChain_Cint_552_0_9(Gvalue, char const, G__param, int) (in /afs/rhic.bnl.gov/star/packages/release32/gitdev/.sl73_x8664_gcc485/OBJ/StRoot/StChain/StChain.so)
==26363==    by 0x83B44DC: Cint::GExceptionWrapper(int (*)(Gvalue, char const, Gparam*, int), Gvalue, char, Gparam*, int) (Api.cxx:393)
==26363==    by 0x82BCF26: Gexecute_call (newlink.cxx:2408)
==26363==    by 0x82BD2E4: G__call_cppfunc (newlink.cxx:2612)

...and...

==26363== 1,513,008 bytes in 21,014 blocks are definitely lost in loss record 278,281 of 278,315
==26363==    at 0x76AD203: operator new(unsigned long) (vg_replace_malloc.c:334)
==26363==    by 0x7E01968: TStorage::ObjectAlloc(unsigned long) (TStorage.cxx:325)
==26363==    by 0x297D8B81: operator new (TObject.h:156)
==26363==    by 0x297D8B81: StFstScanRadiusClusterAlgo::doClustering(StFstCollection const&, StFstRawHitCollection&, StFstClusterCollection&) (StFstScanRadiusClusterAlgo.cxx:140)
==26363==    by 0x297D840F: StFstIClusterAlgo::doClustering(StFstCollection&) (StFstIClusterAlgo.cxx:25)
==26363==    by 0x297D7AF3: StFstClusterMaker::Make() (StFstClusterMaker.cxx:59)
==26363==    by 0x14F1309E: StMaker::Make() (StMaker.cxx:937)
==26363==    by 0x14F1309E: StMaker::Make() (StMaker.cxx:937)
==26363==    by 0x14F0E4E7: StChain::EventLoop(int, int, StMaker) (StChain.cxx:206)
==26363==    by 0x14F28988: GStChain_Cint_552_0_9(Gvalue, char const, G__param, int) (in /afs/rhic.bnl.gov/star/packages/release32/gitdev/.sl73_x8664_gcc485/OBJ/StRoot/StChain/StChain.so)
==26363==    by 0x83B44DC: Cint::GExceptionWrapper(int (*)(Gvalue, char const, Gparam*, int), Gvalue, char, Gparam*, int) (Api.cxx:393)
==26363==    by 0x82BCF26: Gexecute_call (newlink.cxx:2408)
==26363==    by 0x82BD2E4: G__call_cppfunc (newlink.cxx:2612)

...and...

==26363== 724,864 bytes in 12,944 blocks are possibly lost in loss record 278,256 of 278,315
==26363==    at 0x76AD203: operator new(unsigned long) (vg_replace_malloc.c:334)
==26363==    by 0x7E01968: TStorage::ObjectAlloc(unsigned long) (TStorage.cxx:325)
==26363==    by 0x297D875C: operator new (TObject.h:156)
==26363==    by 0x297D875C: StFstScanRadiusClusterAlgo::doClustering(StFstCollection const&, StFstRawHitCollection&, StFstClusterCollection&) (StFstScanRadiusClusterAlgo.cxx:55)
==26363==    by 0x297D840F: StFstIClusterAlgo::doClustering(StFstCollection&) (StFstIClusterAlgo.cxx:25)
==26363==    by 0x297D7AF3: StFstClusterMaker::Make() (StFstClusterMaker.cxx:59)
==26363==    by 0x14F1309E: StMaker::Make() (StMaker.cxx:937)
==26363==    by 0x14F1309E: StMaker::Make() (StMaker.cxx:937)
==26363==    by 0x14F0E4E7: StChain::EventLoop(int, int, StMaker) (StChain.cxx:206)
==26363==    by 0x14F28988: GStChain_Cint_552_0_9(Gvalue, char const, G__param, int) (in /afs/rhic.bnl.gov/star/packages/release32/gitdev/.sl73_x8664_gcc485/OBJ/StRoot/StChain/StChain.so)
==26363==    by 0x83B44DC: Cint::GExceptionWrapper(int (*)(Gvalue, char const, Gparam*, int), Gvalue, char, Gparam*, int) (Api.cxx:393)
==26363==    by 0x82BCF26: Gexecute_call (newlink.cxx:2408)
==26363==    by 0x82BD2E4: G__call_cppfunc (newlink.cxx:2612)

...and...

==26363== 615,384 bytes in 8,547 blocks are possibly lost in loss record 278,249 of 278,315
==26363==    at 0x76AD203: operator new(unsigned long) (vg_replace_malloc.c:334)
==26363==    by 0x7E01968: TStorage::ObjectAlloc(unsigned long) (TStorage.cxx:325)
==26363==    by 0x297D8B81: operator new (TObject.h:156)
==26363==    by 0x297D8B81: StFstScanRadiusClusterAlgo::doClustering(StFstCollection const&, StFstRawHitCollection&, StFstClusterCollection&) (StFstScanRadiusClusterAlgo.cxx:140)
==26363==    by 0x297D840F: StFstIClusterAlgo::doClustering(StFstCollection&) (StFstIClusterAlgo.cxx:25)
==26363==    by 0x297D7AF3: StFstClusterMaker::Make() (StFstClusterMaker.cxx:59)
==26363==    by 0x14F1309E: StMaker::Make() (StMaker.cxx:937)
==26363==    by 0x14F1309E: StMaker::Make() (StMaker.cxx:937)
==26363==    by 0x14F0E4E7: StChain::EventLoop(int, int, StMaker) (StChain.cxx:206)
==26363==    by 0x14F28988: GStChain_Cint_552_0_9(Gvalue, char const, G__param, int) (in /afs/rhic.bnl.gov/star/packages/release32/gitdev/.sl73_x8664_gcc485/OBJ/StRoot/StChain/StChain.so)
==26363==    by 0x83B44DC: Cint::GExceptionWrapper(int (*)(Gvalue, char const, Gparam*, int), Gvalue, char, Gparam*, int) (Api.cxx:393)
==26363==    by 0x82BCF26: Gexecute_call (newlink.cxx:2408)
==26363==    by 0x82BD2E4: G__call_cppfunc (newlink.cxx:2612)

jdbrice commented 7 months ago

Hi @genevb, thanks for all of your work on this and for narrowing down the issue. We will discuss it in the FWD software meeting today and try to find a solution as soon as possible.

genevb commented 7 months ago

If it's of any help, I placed my log file from the 500 event test I ran yesterday here on SDCC: ~genevb/public/log_valgrind_FWD_noTpc

techuan-huang commented 7 months ago

Hi @genevb and @jdbrice, I made a pull request to fix this memory leak. You can find my log file from running 500 events under valgrind after this fix on SDCC: ~tchuang/pwg/FST/MemLeak_ClusterMaker/after/log

plexoos commented 7 months ago

resolved by #593