dalbabur opened this issue 4 months ago (status: Open)
What version are you running? Can this be reproduced on a desktop machine?
I'm using libroadrunner 2.5.0
Haven't tried running it locally yet.
I can say that the saveState/loadState functions had a bug that was fixed in 2.7.0. It was causing crashes, not memory leaks, though. But it can't hurt to try the latest version, at least?
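For reference, a minimal sketch of the save/load cycle those functions cover (the model and state-file paths here are placeholders, not from this thread):

```python
import roadrunner

# Load a model once, then snapshot its fully-loaded state to disk.
rr = roadrunner.RoadRunner("model.xml")   # placeholder path
rr.saveState("model.state")

# Restore into a fresh instance without re-parsing/recompiling the SBML.
rr2 = roadrunner.RoadRunner()
rr2.loadState("model.state")
result = rr2.simulate(0, 250, 2500)
```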
Sure, I'll update and report back.
Same thing with 2.7.0...
We'll probably need a desktop example that shows the effect in order to pin down the leak.
Thanks for checking!
I just ran all of roadrunner's C-based tests through valgrind and there were no errors/leaks there, so the problem must lie either in Python directly or in the Python bindings. If you could manage to get something that illustrates the problem and can be run locally, that would be ideal.
Thanks for checking that, Lucian. Working on a minimal example that will show the issue locally...
What are some ways I could check for memory leaks in Python or the bindings?
It's possible to run valgrind on Python, but that's going to find issues in even the blandest of scripts. It should also find the leak we're looking for, though. The main thing I can think of is to run the exact same script you ran on Hyak, but locally (and maybe simpler), and watch it eat memory — something like the sketch below.
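A minimal sketch of the "watch it eat memory" approach, assuming psutil is installed and using a placeholder model path:

```python
import os
import psutil
import roadrunner

proc = psutil.Process(os.getpid())
rr = roadrunner.RoadRunner("model.xml")   # placeholder path

# Re-run the same simulation and print resident memory as we go;
# a steady climb here points at a leak somewhere in the loop body.
for i in range(1000):
    rr.resetAll()
    rr.simulate(0, 250, 2500)
    if i % 100 == 0:
        print(f"iteration {i}: RSS = {proc.memory_info().rss / 1e6:.1f} MB")
```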
I've also had this (or a similar) problem when simulating many times in parallel. My model is of a population of cells, so each model simulation contains ~500 simulations of individual cells. For parameter optimization this is then simulated again ~10,000 times. I have this issue on my local machine but also on a cluster (where I first noticed the problem, because it used up all the memory, ~90 GB). The libroadrunner version is 2.7.0.
The only way I found to prevent this memory leak was to use joblib to dump the loaded model and load it only within each child process (short of reloading it every time, which just takes way too long). It's been some time since I looked into it, but I tested many different ways of resetting/clearing the model and the simulation settings, as well as different multiprocessing setups, and none worked. At most, memory went down after one process finished, but went right back up as soon as the next simulation started.
I made two minimal examples, one with and one without joblib. I examined and plotted the memory usage with Memory Profiler (mprof); the libroadrunner version is 2.7.0.
```bash
mprof run --multiprocess --include-children -o "./mprofile_$(date +"%F-%H%M").dat" ./RunMinMemTest.py
mprof plot -o "./MemPlot_Standard_$(date +"%F-%H%M").png"
```
[Plot: memory usage — standard parallel simulation]
[Plot: memory usage — simulation where the loaded model is dumped and loaded in the child process]
Code:
```python
from memory_profiler import profile
from roadrunner.tests import TestModelFactory as tmf
from joblib import dump, load
import time
import tellurium as te
from concurrent.futures import ProcessPoolExecutor, as_completed

def SimModel(m):
    m.resetAll()
    start_t = 0
    end_t = 250
    steps = 250 * 10
    result = m.simulate(start_t, end_t, steps)
    return

def pSim(r):
    # Population simulation - many individual simulations
    nSims = 480
    executor = ProcessPoolExecutor()
    Results = []
    futures = (executor.submit(SimModel, r) for n in range(nSims))
    for future in as_completed(futures):
        Results.append(future.result())
    future = []
    Results = []
    return

@profile
def run_sim(r):
    # Represents optimization with many simulations of the model
    for i in range(10):
        pSim(r)

def main():
    sbml = tmf.Brown2004().str()
    r = te.loadSBMLModel(sbml)
    t1 = time.perf_counter()
    run_sim(r)
    elapsed_time = time.perf_counter() - t1
    print('Time:', elapsed_time, 'sec')

if __name__ == '__main__':
    main()
```
```python
from memory_profiler import profile
from roadrunner.tests import TestModelFactory as tmf
from joblib import dump, load
import time
import tellurium as te
from concurrent.futures import ProcessPoolExecutor, as_completed

def SimModel(r_loc):
    m = load(r_loc)
    start_t = 0
    end_t = 250
    steps = 250 * 10
    result = m.simulate(start_t, end_t, steps)
    return

def pSim(r_loc):
    # Population simulation - many individual simulations
    nSims = 480
    executor = ProcessPoolExecutor()
    Results = []
    futures = (executor.submit(SimModel, r_loc) for n in range(nSims))
    for future in as_completed(futures):
        Results.append(future.result())
    future = []
    Results = []
    return

@profile
def run_sim(r_loc):
    # Represents optimization with many simulations of the model
    for i in range(10):
        pSim(r_loc)

def main():
    sbml = tmf.Brown2004().str()
    r = te.loadSBMLModel(sbml)
    r_loc = 'rrmodel_test.joblib'
    dump(r, r_loc)
    r = []
    t1 = time.perf_counter()
    run_sim(r_loc)
    elapsed_time = time.perf_counter() - t1
    print('Time:', elapsed_time, 'sec')

if __name__ == '__main__':
    main()
```
I'm running simulations in parallel on Hyak using ipyparallel. I'm able to load and simulate models on many engines, but eventually I run out of memory. After doing a couple of tests, I believe the memory leak is related to roadrunner and not ipyparallel.
Here is what I'm seeing on the initial load, and then after some iterations:
[screenshots: memory usage at initial load vs. after several iterations]
I'm doing something like this in a loop with different parameter sets:
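(The snippet itself didn't survive; below is a hypothetical sketch of that kind of loop, using the saveState/loadState calls discussed in this thread. The `parameter_sets` variable and the file paths are placeholders, not from the original post.)

```python
import roadrunner

rr = roadrunner.RoadRunner("model.xml")   # placeholder path
rr.saveState("model.state")               # snapshot the loaded model once

for params in parameter_sets:             # placeholder: list of {id: value} dicts
    rr.loadState("model.state")           # restore the snapshot each iteration
    for name, value in params.items():
        rr[name] = value                  # set model values by id
    rr.simulate(0, 250, 2500)
```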
And I have these config flags:
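(The original flags weren't captured. For illustration only: roadrunner exposes such settings through `roadrunner.Config`, and `LOADSBMLOPTIONS_RECOMPILE` below is just one example key, not necessarily one the original post used.)

```python
import roadrunner

# Illustrative only: config values are set through roadrunner.Config.
roadrunner.Config.setValue(roadrunner.Config.LOADSBMLOPTIONS_RECOMPILE, False)
```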