star-bnl / star-sw

Core software for STAR experiment
26 stars 63 forks source link

Fix memory leak in StPicoDstMaker reported in #21 #20

Closed plexoos closed 2 years ago

plexoos commented 2 years ago

The code to "recover" StEmcCollection was added in bb715e65f0da0ad6ec9bef78cb791546d0bcbe57

It seems to have introduced a memory leak which resulted in slower production jobs as reported in #21. While the investigation is still underway, there is a need to produce a new release with other changes included since SL21b.

genevb commented 2 years ago

Hi.

I looked at this file on github and I still don't see where mEmcCollection is being deleted. THAT is the primary memory leak of concern here. The instantiation of a new StMuEmcUtil on every event probably was a memory leak that should be corrected, but a very small one that wasn't impacting the jobs. You have to delete mEmcCollection, and I think at the end of StPicoDstMaker::MakeWrite() is a good place.

-Gene

nigmatkulov commented 2 years ago

Hi @genevb and @plexoos,

Adding delete mEmcCollection; does help but only for daq->picoDst (via BFC). In case of MuDst->picoDst conversion it causes segfault: .sl73_gcc485/obj/StRoot/St_base/StArray.cxx:233: void StStrArray::clear(): Assertion `!fV.size() || !fV[0] || ((TObject*)fV[0])->TestBit(TObject::kNotDeleted)' failed.

Without adding delete mEmcCollection; when I run the code there not memory leak message while one really expects it. If you have any idea about how to fix it please let me know.

Cheers, Grigory

plexoos commented 2 years ago

I checked with the original command as reported in issue #21 and the cpu times per event still indicate there is a memory leak with the currently proposed changes cpu_timing

plexoos commented 2 years ago

Please provide an example for MuDst->picoDst conversion that causes segfault

nigmatkulov commented 2 years ago

root4star .L genDst.C // This macro can be checked-out from StRoot/macros/mudst/genDst.C genDst(1e9,"picoDst,PicoVtxMode:PicoVtxDefault,TpcVpdVzDiffCut:6,PicoCovMtxMode:PicoCovMtxWrite,PicoBEmcSmdMode:PicoBEmcSmdWrite","/star/u/gnigmat/soft/tmp/st_physics_20087007_raw_2000001.MuDst.root")

genevb commented 2 years ago

Hi, Grigory

On Jun 14, 2021, at 11:35 AM, nigmatkulov @.**@.>> wrote:

Adding delete mEmcCollection; does help but only for daq->picoDst (via BFC). In case of MuDst->picoDst conversion it causes segfault: .sl73_gcc485/obj/StRoot/St_base/StArray.cxx:233: void StStrArray::clear(): Assertion `!fV.size() || !fV[0] || ((TObject*)fV[0])->TestBit(TObject::kNotDeleted)' failed.

Without adding delete mEmcCollection; when I run the code there not memory leak message while one really expects it. If you have any idea about how to fix it please let me know.

It's quite possible I'm wrong, but ... I'm going to guess that when doing the MuDst->PicoDst conversion, this line delivers a valid pointer:

mEmcCollection = mMuDst->emcCollection();

...and you get an EMC collection that belongs to the MuDst... and you shouldn't delete it. If you do, there will be a seg fault. However, when not doing MuDst->PicoDst conversion (i.e. DAQ->PicoDst), then that line gets a null pointer, so this line assigns the EMC collection:

mEmcCollection = emcUtil.getEmc( mMuDst->muEmcCollection() );

But with this second line, the EMC collection is created each event, but nobody owns the EMC collection, so you have to be the one to delete it each event.

-Gene

plexoos commented 2 years ago

I confirm that simply adding delete mEmcCollection; at the end of the function causes the segfault with the provided MuDst->picoDst conversion example. I am not familiar with the code to know if there is a way to distinguish between MuDst->picoDst and DAQ->PicoDst cases

genevb commented 2 years ago

Just a follow up...Grigory and I discussed this in a voice chat. A couple points: 1) He was relying on ROOT dictionary of objects that inherit from TNamed to indicate whether an object was being created multiple times. But I have explained that StObject-derived classes do no inherit from TNamed, so there is no helpful message from the dictionary that an object of the same name is getting re-created. 2) We discussed a way to tell whether the MuDst owns the EmcCollection, using that as a decision whether to delete the collection or not. He will try this. -Gene

nigmatkulov commented 2 years ago

So the proposed (by Gene) solution seems to work. It would be good if @genevb or @plexoos could confirm the absence of the memory leak.

To run daq->picoDst I used: root4star -b -q -l 'bfc.C(50,"DbV20210417 P2019a StiCA -beamline3D btof BEmcChkStat CorrY -OPr13 evout PicoVtxVpd PicoCovMtxWrite -hitfilt","/star/data03/daq/2019/087/20087007/st_physics_20087007_raw_2000001.daq")'

The output can be found here: /star/u/gnigmat/soft/tmp/st_physics_20087007_raw_2000001.MuDst.root To convert the produced MuDst file to picoDst I used standard genDst in the following way: root4star .L genDst.C genDst(1e9,"picoDst,PicoVtxMode:PicoVtxDefault,TpcVpdVzDiffCut:6,PicoCovMtxMode:PicoCovMtxWrite,PicoBEmcSmdMode:PicoBEmcSmdWrite","/star/u/gnigmat/soft/tmp/st_physics_20087007_raw_2000001.MuDst.root")

Please let me know if this solution works.

plexoos commented 2 years ago

The new result is purple cpu_timing

genevb commented 2 years ago

I tested both suggested processes and found no signs of jobs slowing down (I tried the MuDst->picoDst conversion with 2000 events). I think you have a solution, though I'm not sure why you went with a static StMuEmcUtil instance. I believe that being static means that it will be instantiated whether or not it is actually ever used, but you actually only need it to be instantiated when reading DAQ files.

plexoos commented 2 years ago

The constructor for static StMuEmcUtil emcUtil; will be called once when the program flow goes inside the if ( !mEmcCollection ) { statement

genevb commented 2 years ago

@plexoos , the documentation I've found indicates that initialization of static objects occurs "before the program starts", not at levels of scope. If you have documentation that says otherwise, please point me to it. Thanks.

Example: https://www.ibm.com/docs/en/i/7.1?topic=classes-initialization-static-variables "A static variable in a block is initialized only one time, prior to program execution"

This is why I suggested to use a static pointer, not a static instance of the StMuEmcUtil class.

plexoos commented 2 years ago

Your link is probably talking about static variables inside classes. Here is simple example to illustrate my point:

$ cat teststatic.cc
#include <iostream>
#include <string>

struct MyStruct {
  std::string str;
  MyStruct(const std::string s) : str(s) { std::cout << "Construct MyStruct: " << str << '\n'; }
  ~MyStruct() { std::cout << "Destroy MyStruct: " << str << '\n'; }
};

void foo(bool create_obj) {
  if (create_obj)
    static MyStruct obj("inside if");
}

int main(int argc, char* argv[]) {
  if (argc < 2)
    foo(false);
  else
    foo(true);
}

$ g++ teststatic.cc
$ ./a.out
$ ./a.out blah
Construct MyStruct: inside if
Destroy MyStruct: inside if

EDIT: This documentation seems more relevant: https://en.cppreference.com/w/cpp/language/storage_duration#Static_local_variables

Variables declared at block scope with the specifier static have static storage duration but are initialized the first time control passes through their declaration

plexoos commented 2 years ago

Sorry for the last minute comment (feel free to ignore) but maybe Grigory can quickly enlighten me about the logic around this code. My thought is why not to check if the MuDst object has a StMuEmcCollection rather than checking for a StEmcCollection first. Couldn't then the latter be created from the former equally and regardless of whether the processing is DAQ->Pico or MuDST->Pico?

genevb commented 2 years ago

Seems to be a pretty convincing test. If I read the documentation you pointed to correctly, then initialization could potentially be done before that scope (before program execution) if it is to 0 or a constant. But in the case of your example, and in the case of StMuEmcUtil, the constructor has functional code, so that wouldn't apply, and indeed the initialization occurs at scope entry. OK, thanks for the helpful information, Dmitri. I'll approve the code now.

nigmatkulov commented 2 years ago

Sorry for the last minute comment (feel free to ignore) but maybe Grigory can quickly enlighten me about the logic around this code. My thought is why not to check if the MuDst object has a StMuEmcCollection rather than checking for a StEmcCollection first. Couldn't then the latter be created from the former equally and regardless of whether the processing is DAQ->Pico or MuDST->Pico?

I guess you mean whether the BEMC information can be filled from MuEmcCollection. Is that what you are asking? If so, then the answer will likely be "no possible" because EmcCollection is needed also for some other parts that MuEmcCollection does not have.

genevb commented 2 years ago

Impact on the slowness of nightly tests looks good. Here is a plot of log10(CPU time) for a sample of 6 different nightly tests (colors) for each of the past 7 days (horizontal axis):

NightlyTestSample_CPU_impactMemLeakFix