Closed: Maxwellfire closed this issue 1 year ago
Interestingly, this does not appear to happen on my other machine, which has the same version of fipy but a different version of petsc4py/petsc. The working machine is using 3.17.4 real_he17b7f0_100 conda-forge and the failing version is using 3.18.2 real_hce8fbc3_100 conda-forge.
> Interestingly, this does not appear to happen on my other machine, which has the same version of fipy but a different version of petsc4py/petsc. The working machine is using 3.17.4 real_he17b7f0_100 conda-forge and the failing version is using 3.18.2 real_hce8fbc3_100 conda-forge.
Given that, the leak is likely to be in that build of petsc and not something fipy has control over. What OS are you running? Some variant of Linux, I assume? What version of Python? FWIW, this does not leak on my MacBook with Python 3.10.
> scipy throws an error (which I can submit a bug report for separately.)
If the error is "RuntimeError: Factor is exactly singular", then this is because your equation admits an infinite number of solutions. Different solvers respond differently, but ultimately, the problem is not well-defined. Add a Dirichlet constraint or a TransientTerm.
With a Dirichlet condition, I do see a small monotonic increase in RAM usage with --scipy, from ~54 MiB to ~57 MiB. I hesitate to call this a leak without further diagnosis; it could well be that the Python garbage collector just hasn't triggered yet. With --petsc, I see a much smaller rise, from ~54 MiB to ~54.5 MiB.
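One way to rule out "the collector just hasn't run yet" is to force a collection and measure Python-level allocations directly. A sketch using only the standard library (the helper name traced_growth is mine; note that tracemalloc sees only Python-side allocations, so memory held by PETSc's C layer would not show up here):

```python
import gc
import tracemalloc

def traced_growth(fn, repeats=1000):
    """Return bytes of Python-level memory still live after
    calling fn() repeatedly and forcing a full GC pass."""
    gc.collect()
    tracemalloc.start()
    for _ in range(repeats):
        fn()
    gc.collect()  # reclaim anything merely awaiting collection
    current, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return current

# Throwaway allocations are reclaimed: growth stays near zero
transient = traced_growth(lambda: [0] * 10)

# Objects retained in a growing list are a genuine leak:
# growth reflects the accumulated bytearrays
store = []
leaking = traced_growth(lambda: store.append(bytearray(1000)))
```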
Either way, 5 GiB is an insane amount of RAM usage for this tiny problem. Do both petsc4py/petsc builds use this much RAM?
> Given that, the leak is likely to be in that build of petsc and not something fipy has control over. What OS are you running? Some variant of Linux, I assume? What version of Python? FWIW, this does not leak on my MacBook with Python 3.10.
It does seem likely that the issue is in petsc. Part of the reason that I submitted a bug here is that I don't know the internals of how FiPy calls petsc4py well enough to submit a detailed bug report to them.
I've attached two files below with my environments that do and don't work:
environment_leak.txt environment_no_leak.txt
Both are using Python 3.10.8, and were generated from the following environment.yml using conda, with different versions of petsc4py:
```yaml
name: ionic
channels:
  - conda-forge
  - defaults
dependencies:
  - python
  - matplotlib
  - pandas
  - jupyterlab
  - bokeh<3.0
  - fipy
  - pytables
  - spyder
  - dotmap
  - petsc4py=3.18.2
  # - petsc4py=3.17.4
```
The version that works only consumes a couple of KiB of RAM as it runs. The version that does not work rapidly consumes memory, at around 1 GiB per 30 seconds. I am running this on a Kubuntu 22.10 system.
For the scipy solver, the issue was that I was setting os.environ['FIPY_SOLVERS'] = 'scipy', but then trying to use a hardcoded term.sweep(dt=dt, solver=fipy.solvers.petsc.linearGMRESSolver.LinearGMRESSolver()) call in part of my code. I guess that once you import fipy with a particular solver, you can't then use a different solver. I assume this is expected behavior and not a bug? If I don't specify the solver explicitly when I import, then switching between solvers on the fly seems to work.
> I guess that once you import fipy with a particular solver, you can't then use a different solver. I assume this is expected behavior and not a bug?
I think under some circumstances, it could work, but mixing solver packages is not intended usage.
> It does seem likely that the issue is in petsc. Part of the reason that I submitted a bug here is that I don't know the internals of how FiPy calls petsc4py well enough to submit a detailed bug report to them.
>
> I've attached two files below with my environments that do and don't work:
OK, thanks for this. If I can reproduce, I'll submit a ticket against petsc(4py) or, if I can't, I'll try to tell you how to create a minimally reproducible example that you can submit.
I can, indeed, reproduce on both Debian and macOS. Something is very wrong with that petsc(4py) build. I'll work on filing a ticket.
Thank you! I also just checked, and the error reproduces with the latest conda-forge build of 3.18.3.
Thanks for filing this ticket, @Maxwellfire! I encountered a memory leak with petsc4py 3.18 as well, but got lost in other things instead of tracking it down. I see the same leak from your MWE with petsc4py 3.18.3 (conda-forge), but no leaks with 3.16 or 3.17. This helps a lot!
While not a proper fix, this issue can be mitigated by adding a call to PETSc.garbage_cleanup() after every iteration. E.g.,
```python
import fipy
from petsc4py import PETSc

mesh = fipy.Grid1D(nx=100, dx=1)
var1 = fipy.CellVariable(mesh=mesh, hasOld=True, value=1.0)
eq1 = fipy.DiffusionTerm(var=var1, coeff=1.) == 0

for n in range(1, 50000):
    eq1.justErrorVector()
    PETSc.garbage_cleanup()
```
I have identified some things that we should be cleaning up, but they don't completely resolve the leak (without manual garbage_cleanup()). I think this is not a bug in PETSc 3.18, although if it's an intentional change in behavior, I don't see anything obvious in the change log.
I realized I wasn't cleaning up the PETSc AO application-ordering object. Once I destroy that, too, I can get memory usage back to where it was with petsc4py 3.17.4, without calling PETSc.garbage_cleanup().
I have filed https://gitlab.com/petsc/petsc/-/issues/1309 against petsc4py, although I suspect they'll tell us to clean up after ourselves.
I'm trying to use a slightly unorthodox solving method that requires me to get the error vector from my set of equations repeatedly. Each time term.justErrorVector() is called, it consumes some memory that is never freed. Here is some simple code that reproduces the memory increase for me:
This takes a couple of minutes to run and consumes about 5 GiB on my system.
I was going to try the other solver implementations (pysparse and scipy), but pysparse won't run on Python 3 and scipy throws an error (which I can submit a bug report for separately.)