star-bnl / star-sw

Core software for STAR experiment
26 stars 63 forks source link

Mu fcs hit clear #301

Closed jdbrice closed 2 years ago

jdbrice commented 2 years ago

Attempt to modify TClonesArray cleanup to prevent memory leak on StMuFcsHit

genevb commented 2 years ago

Test job to see the memory leak (was showing 273,940 bytes lost in 5 events from StMuFcsHit - biggest and last item in the valgrind report):

valgrind --leak-check=full root4star -b -q -l bfc.C\(5,\"pp2022,StiCA,BEmcChkStat,epdhit,\-hitfilt,\-fcsCluster,\-fcsPoint\",\"/star/data03/daq/2022/027/23027048/st_physics_23027048_raw_1000002.daq\"\)

From what I see, this patch does resolve the memory leak issue. I do not know whether the "C" argument in Clear() affects other collections.

-Gene

genevb commented 2 years ago

valgrind also reported another similar, but smaller memory leak. It appears that StMuEvent is also stored in a collection (of size 1?), and is never deleted on each event. However, it has a TArrayI data member whose contents are set in this line:

mL2Result.Set(event->triggerData()->l2ResultLength(),(const Int_t*) event->triggerData()->l2Result());

This form of TArrayI::Set() instantiates an array on the heap that is typically deleted in the destructor: Set() : https://root.cern/root/html534/src/TArrayI.cxx.html#oaNEsD Destructor : https://root.cern/root/html534/TArrayI.html#TArrayI:_TArrayI

I guess you could create a Clear() function for the StMuEventclass as well to delete the array:

mL2Result.Set(0);

I've been using a run for my tests that doesn't have many detectors in it, so I may be missing other collections with the same issue. I'll try processing a typical production run and see what it shows.

-Gene

genevb commented 2 years ago

Just looking for other TArray data members of MuDst classes, I see there's one called mHits in StMuEmcCluster, but someone already wrote a Clear() function that calls Set(0) :-)

void StMuEmcCluster::Clear(Option_t*)
{
  mEta=0;mPhi=0;mSigmaEta=0;mSigmaPhi=0;mEnergy=0;mNHits=0;
  mHits.Set(0);
}

-Gene

jdbrice commented 2 years ago

Just looking for other TArray data members of MuDst classes, I see there's one called mHits in StMuEmcCluster, but someone already wrote a Clear() function that calls Set(0) :-)

void StMuEmcCluster::Clear(Option_t*)
{
  mEta=0;mPhi=0;mSigmaEta=0;mSigmaPhi=0;mEnergy=0;mNHits=0;
  mHits.Set(0);
}

-Gene

This is exactly why I think adding the "C" option should be added - I always thought clearing the contained objects was the default. So the code you show here was maybe never even called since we did not use "C" option previously. I am still looking into the way ROOT handles all of this, maybe there is a more elegant solution where the dtors are called, if so I'll investigate that.

genevb commented 2 years ago

Other leaks when not calling destructors found from running valgrind on data with other detectors:

valgrind --leak-check=full root4star -b -q -l bfc.C\(4,\"pp2022,StiCA,BEmcChkStat,epdhit,\-hitfilt,\-fcsCluster,\-fcsPoint\",\"/star/data03/daq/2022/046/23046005/st_physics_adc_23046005_raw_7500007.daq\"\)

StMutEvent has an StEventSummary data member, which itself has several TArray* members. By not calling desturctors, the arrays that are stored on the heap are never deleted. Because the classes are data members, not pointers to the classes, you can't delete the StEventSummary instance inside a StMuEvent::Clear() function. You would also need to create a StEventSummary::Clear() in which all of its TArray* data members' used Set(0).

Next up is StMuETofHeader, which uses STL vector instances that can also put things on the heap, and so must be deleted. Again, they are data members, so someone may have to create a StMuETofHeader::Clear() function that that calls vector::clear() for the vector data members.

-Gene

jdbrice commented 2 years ago

Other leaks when not calling destructors found from running valgrind on data with other detectors:

valgrind --leak-check=full root4star -b -q -l bfc.C\(4,\"pp2022,StiCA,BEmcChkStat,epdhit,\-hitfilt,\-fcsCluster,\-fcsPoint\",\"/star/data03/daq/2022/046/23046005/st_physics_adc_23046005_raw_7500007.daq\"\)

StMutEvent has an StEventSummary data member, which itself has several TArray* members. By not calling desturctors, the arrays that are stored on the heap are never deleted. Because the classes are data members, not pointers to the classes, you can't delete the StEventSummary instance inside a StMuEvent::Clear() function. You would also need to create a StEventSummary::Clear() in which all of its TArray* data members' used Set(0).

Next up is StMuETofHeader, which uses STL vector instances that can also put things on the heap, and so must be deleted. Again, they are data members, so someone may have to create a StMuETofHeader::Clear() function that that calls vector::clear() for the vector data members.

-Gene

Can you send me your valgrind report file? I will work on adding the needed Clear functions.

jdbrice commented 2 years ago

I agree. However, Akio and I were unsure about this (maybe it is causing the other issue with MuDst) - i.e. can TClonesArray hold an object of unknown size?

klendathu2k commented 2 years ago

You are right. Data members in a TClonesArray need to be of fixed size. TArray may violate this.

Jason

On 2022-02-16 12:51, Daniel Brandenburg wrote:

I agree. However, Akio and I were unsure about this (maybe it is causing the other issue with MuDst) - i.e. can TClonesArray hold an object of unknown size?

-- Reply to this email directly, view it on GitHub [1], or unsubscribe [2]. Triage notifications on the go with GitHub Mobile for iOS [3] or Android [4]. You are receiving this because you commented.Message ID: @.***>

Links:

[1] https://github.com/star-bnl/star-sw/pull/301#issuecomment-1041932231 [2] https://github.com/notifications/unsubscribe-auth/ANL4LVCDJB5CBGVBQROHAOLU3PP3PANCNFSM5OQG2IHQ [3] https://urldefense.com/v3/__https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675__;!!P4SdNyxKAPE!U0mYHX0zTzBw8Tpq9fN5-GCM60N3OWqO8XCDLNwtGss5HCFGkDY_DtbUcyhbxqU$ [4] https://urldefense.com/v3/__https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign*3Dnotification-email*26utm_medium*3Demail*26utm_source*3Dgithub__;JSUlJSU!!P4SdNyxKAPE!U0mYHX0zTzBw8Tpq9fN5-GCM60N3OWqO8XCDLNwtGss5HCFGkDY_DtbU7gltsD0$

genevb commented 2 years ago

Hi,

On Feb 16, 2022, at 1:35 PM, klendathu2k @.**@.>> wrote: You are right. Data members in a TClonesArray need to be of fixed size. TArray may violate this.

I don't think TArray violates this because its data members are fixed. However, one of its members is the fArray pointer to a separate memory block for the array. Keeping a handle on that array is the ultimate concern here.

Daniel, valgrind log is here: ~genevb/public/log.valgrind2022

-Gene

jdbrice commented 2 years ago

OK commit 2a7e328a1e88cf17202b9d1917f5350a98fdacac has updates for StMuEvent (and StEventSummary) and StMuETofHeader.

I didnt run a full valgrind but I did test to make sure the ::Clear(...) was being called.

I also changed StMuFcsHit to use a TArrayS instead of a TArrayS*

akioogawa commented 2 years ago

Okey. I see that TArray does not violate TConesArray... I guess its same for TRefArray in StMuFcsCuster and this is nothing to do with StMuFcsCluster reading crash...

genevb commented 2 years ago

I didnt run a full valgrind ...

I ran valgrind and now it only shows MuDst leaks with the STL vectors in StMuETofHeader, which is surprising given that you do clear the vectors, and I confirmed that the Clear() is called, though for some reason it only appeared to be called 4 times in 5 events, so perhaps just one event gets lost?

Anyhow, this is really small and won't add up to more than a few MB over 50k events even. I can try to track it down later this evening, but I'm fine with moving forward as it is now. Thanks for the work, @jdbrice .

-Gene

genevb commented 2 years ago

I got it....

STL vectors have both a capacity (actual memory allocation) and a size (in-use portion of the allocation). While clear() sets the size to zero, capacity remains at the discretion of the STL implementation. The best bet for reducing the capacity to zero is to call clear(), and then call shrink_to_fit(), which brings brings the capacity down to fit the size (though my impression is that this isn't guaranteed, it has the highest probability of achieving the effect; there is an erase() function, but it too may or may not actually reduce the capacity). Of course deleting the vector guarantees that the allocation is freed, but that's not an option here.

Anyhow, I printed the capacities of the two vectors before and after the clear() and indeed they are unchanged. I then tried shrink_to_fit() and the capacity goes to zero! Yay! I am running in valgrind now and I am quite optimistic this memory leak will be gone.

As for STL maps, there doesn't seem to be this size vs. capacity business, and clear() does the appropriate job. And valgrind shows no leak from them.

So in the end, it appears that what we want is:

virtual void Clear( Option_t *opt = "" ){
    // ensure allocations are destroyed
    mRocGdpbTs.clear();
    mRocStarTs.clear();
    mMissMatchFlagVec.clear();
    mGoodEventFlagVec.clear();
    mMissMatchFlagVec.shrink_to_fit();
    mGoodEventFlagVec.shrink_to_fit();
}

-Gene

genevb commented 2 years ago

I am running in valgrind now and I am quite optimistic this memory leak will be gone.

Well, valgrind still reported on the two vectors in StMuETofHeader as memory leaks, but says of them:

==26042== 0 bytes in 4 blocks are definitely lost in loss record 5 of 270,908
...
==26042== 0 bytes in 4 blocks are definitely lost in loss record 6 of 270,908

I'll take memory leaks of size 0 bytes for the win any day :-)

-Gene

jdbrice commented 2 years ago

I am running in valgrind now and I am quite optimistic this memory leak will be gone.

Well, valgrind still reported on the two vectors in StMuETofHeader as memory leaks, but says of them:

==26042== 0 bytes in 4 blocks are definitely lost in loss record 5 of 270,908
...
==26042== 0 bytes in 4 blocks are definitely lost in loss record 6 of 270,908

I'll take memory leaks of size 0 bytes for the win any day :-)

-Gene

Agreed :)

plexoos commented 2 years ago

Freeing the memory and allocating again for every event is potentially a more expensive operation than reusing the same chunk of memory. This is probably a reason why we have zero calls to std::vector::shrink_to_fit() in our code.

genevb commented 2 years ago

On Feb 18, 2022, at 10:47 AM, Dmitri Smirnov @.**@.>> wrote:Freeing the memory and allocating again for every event is potentially a more expensive operation than reusing the same chunk of memory. This is probably a reason why we have zero calls to std::vector::shrink_to_fit() in our code.

Agreed, Dmitri, which is why STL is designed to do all that memory management behind the scenes. A better approach would probably be to reuse the same StMuETofHeader instances (not use "new StMuETofHeader()", but instead use a "fill" function of some sort, only doing "new" to extend the allocation within the TClonesArray if necessary), but that'll take a little more careful work to do the bookkeeping. In the existing approach to TClonesArrays in the MuDst, the simplest and cleanest way forward is to free and re-allocate.

Thanks, -Gene

jdbrice commented 2 years ago

@ullrich-bnl Can you comment or approve these changes

akioogawa commented 2 years ago

@ullrich-bnl? please?