jfowkes / pycutest

Python interface to CUTEst
https://jfowkes.github.io/pycutest/
GNU General Public License v3.0
28 stars 11 forks source link

Fix memory leak #43

Closed jfowkes closed 1 year ago

jfowkes commented 1 year ago

Fix memory leak reported in #42 due to not calling CUTEST_uterminate or CUTEST_cterminate.

This fix adds the __del__ finaliser to the CUTEstProblem class and uses it to call CUTEST_uterminate or CUTEST_cterminate (as appropriate) via the C interface whenever a CUTEstProblem class instance is garbage collected.

I have tested this fix and valgrind no longer detects any memory leaks in CUTEst itself.

Resolves #42

jfowkes commented 1 year ago

While this has fixed the memory leak, the interface now breaks when a problem is re-imported:

>>> import pycutest
>>> p = pycutest.import_problem('MODBEALE')
>>> del p
>>> p = pycutest.import_problem('MODBEALE')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jfowkes/Code/pycutest/pycutest/build_interface.py", line 348, in import_problem
    return CUTEstProblem(__import__('%s.%s' % (CACHE_SUBFOLDER, problemDir), globals(), locals(), [str(problemDir)]),
  File "/home/jfowkes/Code/pycutest/pycutest/problem_class.py", line 171, in __init__
    self.init_stats = self._module.report()
Exception: Problem is not set up

since the problem init code in python_interface.py is not called on module re-import and thus the problem is not setup.

A quick and dirty fix is probably to force the problem module to reimport using importlib, something like:

import importlib
importlib.reload('%s.%s' % (CACHE_SUBFOLDER, problemDir))

however this only works for Python 3.4 onwards. @lindonroberts what are your thoughts on this?

lindonroberts commented 1 year ago

@jfowkes - I haven't tested yet, but I think your quick and dirty fix would probably work for version >=3.4 (on a related note, the official Python documentation suggests we should replace our __import__(...) calls in build_interface.import_problem with calls to importlib.import_module(...)).

For earlier Python versions apparently the imp package did the same thing as importlib.reload, so you could do import imp; imp.reload(...). It seems to be old enough, as it appears in my system's default Python 2.7 installation.

However for a more graceful fix, I think we need to call CUTEST_usetup or CUTEST_csetup again. These functions are called in each problem's __init__.py file, defined in python_interface.py. Perhaps we need to encapsulate the global variable definitions in that script in a function, then call that function to load the variables as part of the CUTEstProblem class constructor? I'm not sure how to do this properly though, the best I can think of is something like

# In PROBLEM_SUBDIRECTORY/__init__.py (defined in python_interface.py)

def load_globals():
    # Same as python_interface.py, lines 54-88
    # ...
    info=_pycutestitf._setup(efirst, lfirst, nvfirst)  # calls CUTEST_usetup or CUTEST_csetup
    # ...
    return info, n, m  # and other variables as needed

info, n, m = load_globals()

# from here, create functions as before, e.g. def getinfo()...

del os, fromDir, efirst, lfirst, nvfirst

so then we can call load_globals() in the class constructor to avoid the errors you are seeing.

I haven't tested yet, but do you think this approach might work?

jfowkes commented 1 year ago

Thanks @lindonroberts, your proposed graceful fix is the way to go! However, your proposed solution would result in CUTEST_usetup/CUTEST_csetup being called twice which seemed inefficient, so I have refactored the majority of functions from python_inteface.py into problem_class.py (where they probably should have been anyway). This results in a much neater codebase: all that python_interface.py now contains is the wrapper to CUTEST_usetup/CUTEST_csetup!

I don't think there's much point in supporting python versions <3.4 going forward so I would be keen to replace our __import__(...) calls in build_interface.import_problem with calls to importlib.import_module(...) as is recommended. Also, going forward I think we should drop Python 2.7 support so that we can merge the updates from #33 (I've already had to modify the C interface slightly to get this PR working so would be good to get it cleaned up before a new release).

lindonroberts commented 1 year ago

@jfowkes - thanks Jari, this is looking much nicer! This new format seems to work for me (and tests are passing).

For the __import__(...) calls, I agree that no longer supporting Python 2.7 makes sense. However, given this fix is for memory leaking problems, perhaps it makes sense to merge these improvements (which are still Python 2 compatible), and add a warning message that future versions will not support Python 2 (i.e. make a new version as the final version that will support Python 2)? Then, since #33 drops Python 2 support anyway, a new version associated with that release would be a good place to switch to importlib (and only support Python 3.4+ from then onwards).

What do you think? It's maybe too much work and we could just put it all together - it looks like nobody downloads the package under Python 2.7 (see these download stats).

jfowkes commented 1 year ago

@lindonroberts I agree it would be bad form to leave the last Python 2.7 release with a memory leak, but both this PR and #33 would require users to clear out their pycutest_cache, so it's probably best to only impose this on our users once? Having two breaking releases in a row is probably not the most user-friendly... (happy to leave the __import__(...) calls change to a future release as this is not a priority). Either way, I suppose this PR should probably be merged in first?

Background to the above: the terminate function added to the C interface in this PR and the renaming of the underscore functions in the C interface (so that we can access them directly from the problem class) will force users to recompile any existing problems. Similarly, for the changes in #33 e.g. returning floats instead of arrays, the NumPy include path, etc.

lindonroberts commented 1 year ago

@jfowkes - that makes sense, I don't like the idea of two breaking (and cache breaking) releases in a row, and ok given nobody uses the Python 2.7 version. We might as well fix the __import__ calls too while we're at it, make everything up-to-date.

In that case, I guess we should merge this and then #33 before a new release

jfowkes commented 1 year ago

@lindonroberts sounds good to me, shall I add the __import__ calls fix to #33?

jfowkes commented 1 year ago

@lindonroberts also realised this change has exposed getinfo() within the CUTEstProblem class -- it doesn't look like this function is actually used anywhere so I guess we should probably remove it? It looks like its presence has confused users in the past (e.g. see the last comments in #7).

yihang-gao commented 1 year ago

Hi, it is strange that I try prob = pycutest.import_problem("XXX") del prob prob = pycutest.import_problem("XXX") --- import it again

it does not release any RAM... It means the memory leak still exists...

yihang-gao commented 1 year ago

While this has fixed the memory leak, the interface now breaks when a problem is re-imported:

>>> import pycutest
>>> p = pycutest.import_problem('MODBEALE')
>>> del p
>>> p = pycutest.import_problem('MODBEALE')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jfowkes/Code/pycutest/pycutest/build_interface.py", line 348, in import_problem
    return CUTEstProblem(__import__('%s.%s' % (CACHE_SUBFOLDER, problemDir), globals(), locals(), [str(problemDir)]),
  File "/home/jfowkes/Code/pycutest/pycutest/problem_class.py", line 171, in __init__
    self.init_stats = self._module.report()
Exception: Problem is not set up

since the problem init code in python_interface.py is not called on module re-import and thus the problem is not setup.

A quick and dirty fix is probably to force the problem module to reimport using importlib, something like:

import importlib
importlib.reload('%s.%s' % (CACHE_SUBFOLDER, problemDir))

however this only works for Python 3.4 onwards. @lindonroberts what are your thoughts on this?

Hi, I tried

problems_name = ["HS7", "HS6", "HS56", "HS48", "HS60"]
tracemalloc.start()
for name in problems_name:
    prob = pycutest.import_problem(name, drop_fixed_variables=True)
    for i in range(1000):
        x = prob.x0
        g = prob.objcons(x)

    del prob
    gc.collect()

    snapshot = tracemalloc.take_snapshot()
    display_top(snapshot)

I found that the memory occupied is increasing!!!

jfowkes commented 1 year ago

Due to the way Python works there will still be some memory taken up, but this is a) much smaller than before, and b) due to memory leaks present in NumPy and the Python C API itself, and therefore not something we can do anything about. If you run valgrind on a Python debug instance you will see that we do not leak any memory.