grand-mother / grand

Core package for GRAND aka GrandLib
https://grand-mother.github.io/grand-docs
GNU Lesser General Public License v3.0

Memory leaks in root_trees.py #71

Open ifleg opened 10 months ago

ifleg commented 10 months ago

It seems that root_trees.py has some memory leaks. When reading a large number of files in a loop, memory usage increases constantly and the job gets killed because it runs out of memory.

The simple code below illustrates it:

from grand.dataio.root_trees import *
from pathlib import Path
import psutil

# Monitor this process's own memory usage
file_index = 0
p = psutil.Process()
print(p.memory_info())

pathlist = Path("/sps/grand/data/gp13/jul2023").rglob('*.root')
for pathdir in pathlist:
    path = str(pathdir)
    print(str(file_index) + " " + path)
    # Read the ADC counts tree and iterate over its events
    tadccounts = TADC(path)
    list_of_events = tadccounts.get_list_of_events()
    print(len(list_of_events))
    for event, run in list_of_events:
        pass
    # Read the raw voltage tree and iterate over its events
    trawvoltage = TRawVoltage(path)
    list_of_events = trawvoltage.get_list_of_events()
    for event, run in list_of_events:
        pass
    file_index += 1
    print(p.memory_info())
    print(p.memory_info())

produces the following output:

pmem(rss=389042176, vms=6187302912, shared=183660544, text=2457600, lib=0, data=5477257216, dirty=0)
0 /sps/grand/data/gp13/jul2023/t3_trigger/ROOTfiles/GRAND.TEST-CH1-AND-CH2-TRIGGER-ChanXYZ-20dB-11dus.20230727082822.100.11_dat.root
341
pmem(rss=463097856, vms=6377504768, shared=211886080, text=2457600, lib=0, data=5656481792, dirty=0)
1 /sps/grand/data/gp13/jul2023/t3_trigger/ROOTfiles/GRAND.TEST-CH1-AND-CH2-TRIGGER-ChanXYZ-20dB-11dus.20230727170958.100.1_dat.root
0
pmem(rss=463233024, vms=6377504768, shared=211886080, text=2457600, lib=0, data=5656481792, dirty=0)
2 /sps/grand/data/gp13/jul2023/t3_trigger/ROOTfiles/GRAND.TEST-CH1-AND-CH2-TRIGGER-ChanXYZ-20dB-11dus.20230727030733.100.5_dat.root
341
pmem(rss=475344896, vms=6519545856, shared=211890176, text=2457600, lib=0, data=5798522880, dirty=0)
3 /sps/grand/data/gp13/jul2023/t3_trigger/ROOTfiles/GRAND.TEST-CH1-AND-CH2-TRIGGER-ChanXYZ-20dB-11dus.20230727035812.100.6_dat.root
329
....
399 /sps/grand/data/gp13/jul2023/20dB/ROOTfiles/GRAND.TEST-RAW-10s-ChanXYZ-20dB-11dus.20230708174423.041_dat.root
3364
pmem(rss=2953019392, vms=71568863232, shared=211902464, text=2457600, lib=0, data=70847840256, dirty=0)
400 /sps/grand/data/gp13/jul2023/20dB/ROOTfiles/GRAND.TEST-RAW-10s-ChanXYZ-20dB-11dus.20230702023233.028_dat.root
3364
pmem(rss=2958274560, vms=71741800448, shared=211902464, text=2457600, lib=0, data=71020777472, dirty=0)

showing that vms ("Virtual Memory Size", the total amount of virtual memory used by the process) grows constantly, from 6187302912 to 71741800448 after reading 400 files, and that data (the data resident set, i.e. the amount of physical memory devoted to something other than executable code; it matches top's DATA column) grows from 5477257216 to 71020777472, so roughly x10 in both cases. When running this code @ccin2p3, the job gets killed after a while with the error:

slurmstepd: error: Detected 1 oom-kill event(s) in StepId=54442604.batch. Some of your processes may have been killed by the cgroup out-of-memory handler.

lwpiotr commented 9 months ago

At least half of this is not really a memory leak: the library simply loads all the trees into memory, and making them exist only in the loop context would be pretty tricky. Please call the new function stop_using() on each tree instance at the end of each loop iteration.
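To illustrate the intended release pattern without depending on the grand package, here is a minimal self-contained sketch: FakeTree is a hypothetical stand-in for a tree wrapper such as TADC or TRawVoltage, and its stop_using() mimics the function suggested above by dropping the reference to the per-file payload so the garbage collector can reclaim it before the next iteration.

```python
import gc

class FakeTree:
    """Hypothetical stand-in for a ROOT tree wrapper that keeps its
    data alive for the lifetime of the process unless released."""

    def __init__(self, path):
        self.path = path
        # Simulate the tree's in-memory payload (~1M Python ints).
        self._payload = list(range(1_000_000))

    def get_list_of_events(self):
        # Dummy (event, run) pairs, mirroring get_list_of_events().
        return [(0, 0)]

    def stop_using(self):
        # Drop the reference to the payload so it can be collected.
        self._payload = None

released = []
for path in ["a.root", "b.root"]:
    tree = FakeTree(path)
    for event, run in tree.get_list_of_events():
        pass
    tree.stop_using()  # release per-file data before the next iteration
    released.append(tree._payload is None)
gc.collect()
print(released)
```

The key point is that each iteration releases its payload before the next file is opened, so peak memory stays bounded by one file instead of growing with the number of files read.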

However, there is indeed a memory leak somewhere. It seems that some lists are not dereferenced, but at the moment I can't find which lists they are or where they are created. So please let me know whether stop_using() is enough, or whether I need to trace the memory leak as a high priority.
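One stdlib way to find which lists are not being dereferenced is tracemalloc: snapshot the heap before and after a batch of reads and diff the allocations by source line. In this sketch, leaky_read is a hypothetical stand-in for one loop iteration over a ROOT file, and _cache simulates a module-level list that keeps growing.

```python
import tracemalloc

_cache = []  # simulates a list that is appended to but never cleared

def leaky_read(path):
    # Allocation that survives the call, as a leaked list would.
    _cache.append([0] * 100_000)

tracemalloc.start()
before = tracemalloc.take_snapshot()
for i in range(5):
    leaky_read(f"file_{i}.root")
after = tracemalloc.take_snapshot()

# Statistics are sorted by largest difference first, so the top entry
# points at the source line responsible for the growing allocations.
top = after.compare_to(before, 'lineno')[0]
print(top)
```

Run against the real loop, the top entries of the diff should point directly at the file and line in root_trees.py where the undereferenced lists are created.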