Closed wdconinc closed 1 year ago
I think I figured out the underlying problem for this, but I am not yet entirely sure if there is an easy solution to it.
The problem is the following: PodioDataSvc
uses podio::EventStore
to read collections from file. The podio::EventStore
keeps track of all collections it has read, including the ones that are implicitly read when resolving relations (i.e. the CaloHitContributions
in this case). It also takes ownership of all collections it reads(!). Calling EventStore::clear
disposes of all read collections. The PodioDataSvc
on the other hand hands over ownership to the Gaudi
EventStore (since the DataWrapper
takes ownership of the wrapped podio::CollectionBase*
). To make the podio::EventStore
"aware" of this fact it calls EventStore::clearCaches
:
That clears the internal vector of the EventStore
, which in turn means that only explicitly read collections are now handled by the PodioDataSvc
and implicitly read ones are no longer reachable from the EventStore
and leak.
Given this interplay between the two, I am not sure there is an easy fix. Since overhauling the whole I/O is on our list once we have a prototype of the Frame in podio my preferred solution would actually be to wait until then, unless this is something that cannot be worked around by simply listing all (implicitly) read collections.
Is there a way to get notified of implicitly read collections as a WARNING?
I don't think there is an easy way to do that, since implicitly and explicitly read collections go through the same mechanism in the EventStore
(especially in the way in which it is used here in k4FWCore). I will have a second look to make sure I didn't miss anything though.
I think this should no longer apply with the introduction of Frame based I/O (#100). If this is still an issue, feel free to re-open.
We are experiencing a memory leak in the EIC framework which we can reproduce in the key4hep framework as well. We take the same approach to the PodioDataSvc, so that's not very surprising, but I'm reporting it here so someone with expertise in podio can take a look too.
Steps to reproduce
spack install k4fwcore@1.0pre14
(or e.g. build an environment with k4fwcore@1.0pre14)git clone https://github.com/key4hep/k4-project-template
, build, installcp k4ProjectTemplate/options/readExampleEventData.py k4ProjectTemplate/options/readEICEventData.py
and apply the following changes to read in the collection 'EcalBarrelHits' for all events in an existing file:ddsim
script against the EIC geometry.build
directory, run/usr/bin/time -v ./run k4run ../k4ProjectTemplate/options/readEICEventData.py
.Expected behavior
The value under
Maximum resident set size (kbytes)
should be independent of number of events (and about 500 MB)Actual behavior
The memory use increases with number of events, and goes up to ~2 GB for this file (with 1000 events). Files with more than 10k events are not able to be run on typical systems due to memory limitations.
Workaround
The memory leak can be avoided by explicitly listing the linked collections, i.e. modifying the diff above to
in which case the EcalBarrelHitsContributions collection is properly deleted after each event.
This behavior may qualify as a bug since a data model may change without notice, and downstream options file will continue to work but appear to start leaking memory. It would be clearer if it just failed instead (or didn't leak the memory).