key4hep / k4FWCore

Core Components for the Gaudi-based Key4hep Framework
Apache License 2.0

Memory leak in IOSvc? #249

Open giovannimarchiori opened 1 week ago

giovannimarchiori commented 1 week ago

Dear experts,

after migrating my code from k4DataSvc to IOSvc, my reconstruction jobs use a very large amount of memory. It is so large that if I try to run over many events, or to run many jobs in parallel on my 96-core machine (as I used to do without problems with k4DataSvc), the system memory (512 GB of RAM) gets exhausted and I start getting many OS24 errors (too many open files).

I have checked this with a very simple steering script that only reads a ROOT file produced with ddsim and writes it to a new file, without running any other algorithms: a job using IOSvc can take 20 GB of RAM (as observed in the output of free -h), while with k4DataSvc the free RAM stays stable during the job.
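
Since free -h reports system-wide memory, a per-job number can make the IOSvc versus k4DataSvc comparison easier to reproduce. The following is a minimal sketch, assuming a Unix system and using only Python's standard `resource` and `subprocess` modules; the helper name `peak_rss_mib` and the steering file name in the usage comment are hypothetical and not part of k4FWCore.

```python
import resource
import subprocess
import sys


def peak_rss_mib(cmd):
    """Run cmd to completion and return (exit code, peak RSS in MiB).

    The peak is taken from RUSAGE_CHILDREN, i.e. the largest resident set
    size among all child processes this interpreter has waited for so far.
    Call it once per Python invocation to get a clean number for one job.
    """
    proc = subprocess.run(cmd, check=False)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return proc.returncode, usage.ru_maxrss / divisor


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python peak_rss.py k4run my_iosvc_steering.py
    code, peak = peak_rss_mib(sys.argv[1:])
    print(f"exit code {code}, peak RSS {peak:.0f} MiB")
```

Running this once with the IOSvc steering script and once with the k4DataSvc one gives two peak values that can be compared directly, independent of whatever else is running on the machine.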

I put my input file and two scripts, using either IOSvc or k4DataSvc, on lxplus

Could you please have a look?

Thanks a lot, Giovanni

Tagging @BrieucF @jmcarcell @tmadlener

BrieucF commented 1 week ago

I can indeed reproduce the problem on Alma9 machines.

So I tried to regenerate a SIM file with the 2024-10-09 nightlies and the behavior for this new file looks as expected. Maybe that could be the problem?

What I find weird is that the podio and edm4hep versions used to generate ~gmarchio/public/iosvc/ALLEGRO_sim.root are the same as those shipped with 2024-10-09.

giovannimarchiori commented 1 week ago

How many events are there in your new file? Mine had 2000 and was pretty big - not sure if that matters

tmadlener commented 1 week ago

Thanks for the report. I will have a look to figure out whether it's an issue with podio or whether this is something in the IOSvc.

BrieucF commented 1 week ago

> How many events are there in your new file? Mine had 2000 and was pretty big - not sure if that matters

I generated 1000 events to have something similar to what you had. If useful, it is here: /afs/cern.ch/user/b/brfranco/work/public/giovanni_leak/ddsim_output_edm4hep.root.

jmcarcell commented 1 week ago

There is a leak in the Writer (when running without writing to an output file I don't see that much memory usage). #250 fixes it for that case, but I'm not sure yet whether the fix is complete.