key4hep / k4LCIOReader

Generate EDM4hep collections from LCIO format data
Apache License 2.0
2 stars 11 forks source link

Fix a memory leak in LCIO -> EDM4hep conversion #28

Closed jmcarcell closed 1 year ago

jmcarcell commented 1 year ago

BEGINRELEASENOTES

ENDRELEASENOTES

Fixes https://github.com/key4hep/k4MarlinWrapper/issues/96. The issue is that there are some collections that are created but never deleted so the fix is to make sure to delete them. When running with a file with events with tracks and clusters I don't get any memory leaks with this fix (and I do get them without the fix). To reproduce, build k4LCIOReader and then build k4MarlinWrapper (needed because the signature of one function changed).

Edit: After some discussion in the key4hep call, it has been pointed out that these collections may not be registered correctly

tmadlener commented 1 year ago

From what I understand currently, the collections that are converted but never put into the Gaudi TES are leaked. I am not sure there is an easy fix for this, because ownership will probably be a proper mess. The solution here would mean that there are collections that we convert multiple times, because we no longer cache them. However, at the converter level there is no easy way of keeping track of which collections have been added to the TES and which were not, I think.

Having said that, we are working on a "standalone" conversion from LCIO to EDM4hep that should hopefully address these concerns, and also have an API that is more similar to the one found in k4EDM4hep2LCIOConv. We should have this ready shortly and are aiming to upstream that to k4EDM4hep2LCIOConv.

Could you check if those potential double conversions actually have a measurable impact on the runtime in the end?

zoujh commented 1 year ago

In my original thought, the LCIOInput algorithm is always loaded to register the collections to TES in the context of Gaudi. And then the LCIOInput.collections property must be set properly. The memory leak happens when some collections are set in LCIOInput.collections without their dependencies. In my opinion, a better solution is to register the dependencies to TES in LCIOInput automatically.

jmcarcell commented 1 year ago

Some comments first to @tmadlener:

From what I understand currently, the collections that are converted but never put into the Gaudi TES are leaked. Yes.

The solution here would mean that there are collections that we convert multiple times, because we no longer cache them.

While I haven't looked at if this is happening or not I think in most cases this is not happening. What is happening now is that we have a list of collections to go through, get them and register them in Lcio2EDMhep.cpp. During this process the collections that depend on those are also transformed but in the end only the 'main' one is returned and registered, see https://github.com/key4hep/k4LCIOReader/blob/63e308b2fc4c9a6d9ada1d56086835225baa8f90/k4LCIOReader/src/k4LCIOConverter.cc#L206-L209 for an example of how in a sort of recursive way the transformation of other collections is done. So I think this only happens once for each dependent collection most of the time unless there are lots of inter-dependencies between collections. Timing results are not conclusive but I think they show that there isn't a huge number of conversions (and I vaguely remember putting some prints inside the transformation functions and the ones I checked were called only once per event).

Some timing results, master of k4LCIOReader:

k4run main.py > /dev/null 2>&1  28.20s user 1.16s system 99% cpu 29.449 total

Branch in this PR:

k4run main.py > /dev/null 2>&1  26.81s user 0.78s system 99% cpu 27.693 total

(of course the timing tests are done in the same machine, with the same python script and the same file with 10000 events).

Comments for @zoujh: I'm not sure I fully understand. Are you referring to the LCIOInput.h file? I assume it will have the same problem since it's also using the k4LCIOReader but the problem is in the k4LCIOReader right? But I don't think we are using it from k4MarlinWrapper since the k4LCIOReader is being called directly. Are you saying that we should change the list of collections, that starts here https://github.com/key4hep/k4MarlinWrapper/blob/9ef24c4d570899cb0b566aebbbb92fc35442685a/k4MarlinWrapper/src/components/Lcio2EDM4hep.cpp#L163 ? (similarly to how it is done in LCIOInput

tmadlener commented 1 year ago

Thanks for checking again. I don't see a really easy solution here, unfortunately. The one thing that could be a sort of stop gap to this is to introduce a vector<string> into the k4LCIOReader to keep track of the collections that have actually been handed off to the TES, and then pass that list back to the converter e.g. in k4LCIOReader::endOfEvent to delete all the podio::CollectionBases the m_name2dest map that are not in this list.