Issue opened by inducer 1 year ago (status: open)
Is the growth reflected in the memory pool's statistics? I.e., do those increase timestep-over-timestep?
If there is growth, can you identify which bins in the memory pool are affected? Can you identify which allocations? Python makes it straightforward to attach stack traces to allocations.
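For illustration, the stdlib `tracemalloc` module can do exactly that (a generic sketch, independent of pyopencl; the `bytearray` allocations here stand in for array buffers):

```python
import tracemalloc

# Record up to 10 frames of traceback per allocation site.
tracemalloc.start(10)

data = [bytearray(1_000_000) for _ in range(5)]  # stand-in for array allocations

snapshot = tracemalloc.take_snapshot()
# Group allocations by traceback and show the largest sites.
for stat in snapshot.statistics("traceback")[:3]:
    print(f"{stat.size} bytes in {stat.count} blocks, allocated at:")
    for line in stat.traceback.format():
        print(line)
```

Comparing snapshots taken at successive time steps (`snapshot.compare_to`) would show which allocation sites grow.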
Do we know if this growth is of "array" memory or "other" memory?
How do your findings change if you call `free_held` between steps?
What is the simplest driver that exhibits the growth? I gather from @lukeolson that, maybe, `examples/wave-lazy.py` may be affected. Could you please confirm? Is `grudge/examples/wave/wave-min-mpi.py` affected as well? Is, say, `vortex-mpi.py` affected? Grudge's Euler?
Also, it looks like `set_trace` is exposed, so you could get some additional information from it, including bin size data: https://github.com/inducer/pyopencl/blob/main/src/mempool.hpp#L164
> … that memory growth only occurs when using the memory pool
The growth happens both with and without the pool. Here is an example with `drivers_y2-prediction/smoke_test_ks` (lazy eval), 1 rank, Lassen CPU (y-axes are in MByte):
Both seem to "level off" after ~140 steps, but memory is likely to grow further. See e.g. this graph for a different Lassen run (SVM pool):
Btw, please keep vertical space in mind when writing issue text. State your claims, and hide supporting evidence under a `<details>` block. I've done that for your comment above.
Tracing the memory pool allocations with `set_trace` (and using https://github.com/illinois-ceesd/mirgecom/pull/840) with the same config as before (1 rank, `smoke_test_ks`, CPU) revealed some interesting information:

```
[pool] allocation of size 1511472 required new memory
```
I gather that you are using some (unspecified?) system/process-level metric of memory usage.
The memory usage I initially added here is the RSS high-water mark (= `max_rss`), measured with https://github.com/illinois-ceesd/logpyle/pull/79.
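For reference, the same high-water mark is available from the stdlib `resource` module (a sketch, separate from logpyle's implementation; note the units are platform-dependent):

```python
import resource

# RSS high-water mark of the current process since it started.
# Units: kilobytes on Linux, bytes on macOS.
maxrss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print(f"max RSS so far: {maxrss}")
```

Because this is a high-water mark, it can only ever increase; a plateau in `max_rss` does not by itself prove that current usage stopped growing.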
Thanks. This tally of pool-held memory means (to me) that the issue is very likely "above" the pool, i.e. in Python. I.e., replacing the memory allocation scheme used by the pool should not help, or at least not much.
My read of this is that some member of a group of objects that cyclically refer to each other holds a reference to our arrays. This follows because Python's refcounting frees objects without cyclic referents effectively instantaneously, i.e. as soon as a reference to them is no longer being held.
To validate the latter conclusion, you could try calling `gc.collect()` every $N$ time steps to see if that helps free those objects. (Of course, that won't do much if there is some cyclic behavior in what references are held.)
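A minimal stdlib demonstration of both effects (refcounting frees acyclic objects immediately; objects in a reference cycle wait for the cyclic collector):

```python
import gc
import weakref

class Node:
    pass

# Acyclic: freed as soon as the last reference goes away.
n = Node()
r = weakref.ref(n)
del n
assert r() is None  # refcounting freed it immediately

# Cyclic: two objects referring to each other survive `del`...
a, b = Node(), Node()
a.other, b.other = b, a
ra = weakref.ref(a)
del a, b
assert ra() is not None  # still alive, kept by the cycle

# ...until the cyclic garbage collector runs.
gc.collect()
assert ra() is None
```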
Assuming the above conclusion is correct, the way to address this would be to find the objects referring to the arrays and make it so they no longer hold those references.
The `gc` module can help: https://docs.python.org/3/library/gc.html#gc.get_referrers

> What is the simplest driver that exhibits the growth? I gather from @lukeolson that, maybe, `examples/wave-lazy.py` may be affected. Could you please confirm? Is `grudge/examples/wave/wave-min-mpi.py` affected as well? Is, say, `vortex-mpi.py` affected? Grudge's Euler?
I've seen the growth in all drivers I tried, including the simplest ones: `wave`, `wave-mpi`, `euler/vortex`, `wave/wave-op-mpi`.
The growth only happens in lazy mode, not eager. The specific memory pool used (SVM, CL buffer) or lazy actx class do not seem to matter.
Graph for mirgecom's wave:
> To validate the latter conclusion, you could try calling `gc.collect()` every N time steps to see if that helps free those objects. (Of course, that won't do much if there is some cyclic behavior in what references are held.)
It does seem that running `gc.collect` mitigates this issue for us. The following results are for `smoke_test_ks`, but it is similar for the simpler test cases.
It's important to note that `gc.collect` is not a solution, but a workaround. It's quite expensive (and should be unnecessary), and it only masks the problem.
> It's important to note that `gc.collect` is not a solution, but a workaround. It's quite expensive (and should be unnecessary), and it only masks the problem.

👍
I like your idea of running it every $N$ steps, though. This workaround can likely keep us running comfortably in the interim. Afaict, after injecting this fix into the prediction driver, the code infrastructure is now capable of production-scale prediction-like runs, and is at the very least in good shape for February trials (leaps and bounds over last year). Gigantic cool.
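The every-$N$-steps workaround might look like this in a generic time-stepping loop (a sketch; `advance` and the interval are hypothetical placeholders, not mirgecom's actual driver API):

```python
import gc

GC_INTERVAL = 100  # collect every N steps; tune against collection cost

def advance(state, istep):
    """Hypothetical stand-in for one lazy-evaluated time step."""
    return state + 1

state = 0
for istep in range(1000):
    state = advance(state, istep)
    # Periodically break reference cycles that refcounting alone can't free.
    if istep % GC_INTERVAL == GC_INTERVAL - 1:
        gc.collect()
```

Choosing a larger interval amortizes the (expensive) full collection over more steps, at the cost of a higher transient memory footprint between collections.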
A few more updates for mirgecom's wave (w/ lazy eval):

- `gc.garbage` is empty (which is expected, I think).
- With `gc.set_debug(gc.DEBUG_SAVEALL)`, `gc.garbage` contains ~62,000 objects after the first time step. Each subsequent time step adds about ~1,000 objects. Is my assumption correct that those objects are the ones we suspect of having circular references (and of holding references to arrays)?
- I was adapting this code https://code.activestate.com/recipes/523004-find-cyclical-references/ to check whether there are array references in the objects with circular references, but this appears to be extremely time-consuming.

Edit:
I understand that there is some type of memory growth occurring.
From the 2023-02-17 dev meeting notes, I gather that
Possibly related: #212.
cc @matthiasdiener