python / cpython

Different memory/gc behavior depending on setting locals=globals #118426

Open leviska opened 2 weeks ago

leviska commented 2 weeks ago

Bug report

Bug description:

If you call PyEval_EvalCode repeatedly, the memory used by each call is released when locals != globals, but accumulates across calls when locals == globals (or locals == Py_None, which behaves the same).

The code sample:

```c++
#include <Python.h>

int main() {
    Py_Initialize();

    // Script run on every iteration: print the current RSS in MiB,
    // then define and call a function at module level.
    const char *python_code =
        "import psutil\n"
        "mem = psutil.Process().memory_info().rss / 1024 / 1024\n"
        "print(mem)\n"
        "def foo(arr):\n"
        " return arr * 2\n"
        "arr2 = foo(arr)\n";

    for (int i = 0; i < 10; ++i) {
        PyObject *compiled_code = Py_CompileString(python_code, "memleak", Py_file_input);
        if (compiled_code == NULL) {
            PyErr_Print();
            Py_Finalize();
            return 1;
        }

        // dict is used as globals; dict2 is the alternative locals.
        PyObject *dict = PyDict_New();
        if (dict == NULL) {
            PyErr_Print();
            Py_Finalize();
            return 1;
        }
        PyObject *dict2 = PyDict_New();
        if (dict2 == NULL) {
            PyErr_Print();
            Py_Finalize();
            return 1;
        }

        // PyEval_GetBuiltins returns a borrowed reference, so take our own.
        PyObject *builtins = PyEval_GetBuiltins();
        if (builtins == NULL) {
            PyErr_Print();
            Py_Finalize();
            return 1;
        }
        Py_INCREF(builtins);
        if (PyDict_SetItemString(dict, "__builtins__", builtins) != 0) {
            PyErr_Print();
            Py_Finalize();
            return 1;
        }

        // Build a large list of ints and expose it to the script as "arr".
        const int SIZE = 10 * 1024 * 1024 / 8;
        PyObject *list = PyList_New(SIZE);
        if (list == NULL) {
            PyErr_Print();
            Py_Finalize();
            return 1;
        }
        for (int j = 0; j < SIZE; j++) {
            PyObject *pyint = PyLong_FromLong(j);
            if (pyint == NULL) {
                PyErr_Print();
                Py_Finalize();
                return 1;
            }
            PyList_SetItem(list, j, pyint);  // steals the reference to pyint
        }
        if (PyDict_SetItemString(dict, "arr", list) != 0) {
            PyErr_Print();
            Py_Finalize();
            return 1;
        }

        // globals == locals here; pass dict2 as the third argument for the other case.
        PyObject *result = PyEval_EvalCode(compiled_code, dict, dict);
        if (result == NULL) {
            PyErr_Print();
            Py_Finalize();
            return 1;
        }

        Py_DECREF(dict);
        Py_DECREF(list);
        Py_DECREF(dict2);
        Py_DECREF(builtins);
        Py_DECREF(result);
        Py_DECREF(compiled_code);
    }

    Py_Finalize();
    return 0;
}
```

The output of the given code on my machine:

# globals=dict, locals=dict
66.421875
136.5625
206.72265625
276.87890625
347.03515625
136.54296875
206.68359375
276.83203125
346.98046875
417.12890625

However, if we call PyEval_EvalCode(compiled_code, dict, dict2) instead (passing dict2, not dict, as locals), the output becomes:

# globals=dict, locals=dict2
66.421875
66.3046875
86.34765625
86.36328125
86.36328125
86.3671875
86.3671875
86.3671875
86.3671875
86.3671875
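
The same contrast can be reproduced from pure Python with exec(), which takes the same globals/locals pair. The sketch below is an illustration under an assumption, not a confirmed diagnosis: it assumes the relevant factor is that a def executed with locals == globals stores the function in the very dict its __globals__ attribute points to, forming a reference cycle that reference counting alone cannot free. The names Payload and payload_freed_by_refcount are hypothetical helpers introduced only for this example, and the result described assumes a default (non-free-threaded) CPython build.

```python
import gc
import weakref

# Same shape as the script in the report, without the psutil measurement.
SOURCE = (
    "def foo(arr):\n"
    "    return arr * 2\n"
    "arr2 = foo(arr)\n"
)


class Payload(list):
    """list subclass, used only because plain lists cannot be weakly referenced."""


def payload_freed_by_refcount(same_dict):
    """Run SOURCE, drop every reference we hold, and report whether the
    payload was freed without any help from the cyclic garbage collector."""
    gc.collect()   # start from a clean state
    gc.disable()   # from here on, only reference counting can free objects
    try:
        payload = Payload([1, 2, 3])
        probe = weakref.ref(payload)
        globals_ns = {"arr": payload}
        locals_ns = globals_ns if same_dict else {}
        exec(SOURCE, globals_ns, locals_ns)
        del payload, globals_ns, locals_ns
        return probe() is None
    finally:
        gc.enable()
        gc.collect()  # clean up whatever cycle was left behind


if __name__ == "__main__":
    print("separate dicts:", payload_freed_by_refcount(same_dict=False))
    print("same dict:     ", payload_freed_by_refcount(same_dict=True))
```

With separate dicts, foo lives in the locals dict and only points at globals, so dropping both dicts frees everything immediately. With a single dict, the dict holds foo and foo holds the dict, so the payload stays alive until the cyclic collector runs.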

Adding

import gc
gc.collect()

at the end of the Python code seems to help even in the first case, but memory usage can still end up higher than in the second case:

# globals=dict, locals=dict, add gc
66.2109375
136.46875
136.375
136.37890625
156.3125
156.2734375
156.3515625
156.2421875
156.2734375
156.2421875
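
One possible reason the in-script gc.collect() helps only partially, offered as an assumption rather than a verified explanation: a collection triggered from inside the running script cannot free the namespace the script is currently executing in, so it can only reclaim what earlier runs left behind. The sketch below counts how many payloads are still alive after each run when locals == globals. The names Payload and alive_after_each_run are hypothetical helpers for this example, and it does not attempt to explain the later step from about 136 to about 156 in the numbers above.

```python
import gc
import weakref

BASE_SOURCE = (
    "def foo(arr):\n"
    "    return arr * 2\n"
    "arr2 = foo(arr)\n"
)
COLLECT_SOURCE = "import gc\ngc.collect()\n"


class Payload(list):
    """list subclass, used only because plain lists cannot be weakly referenced."""


def alive_after_each_run(runs, collect_in_script):
    """Execute the script `runs` times with locals == globals and return,
    for each run, how many payloads are still alive after its namespace
    has been dropped by the caller."""
    source = BASE_SOURCE + (COLLECT_SOURCE if collect_in_script else "")
    gc.collect()
    gc.disable()  # only the explicit gc.collect() in the script may collect
    probes = []
    alive_counts = []
    try:
        for _ in range(runs):
            payload = Payload([1, 2, 3])
            probes.append(weakref.ref(payload))
            namespace = {"arr": payload}
            exec(source, namespace, namespace)
            del payload, namespace
            alive_counts.append(sum(p() is not None for p in probes))
    finally:
        gc.enable()
        gc.collect()
    return alive_counts


if __name__ == "__main__":
    print("no collect in script:", alive_after_each_run(3, collect_in_script=False))
    print("collect in script:   ", alive_after_each_run(3, collect_in_script=True))
```

Without the in-script collection the number of live payloads grows with every run. With it, the count stays at one, because each run frees the previous run's namespace but not its own. That would be consistent with the steady state sitting roughly one array above the locals != globals case.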

I originally found this using the pyo3 library in Rust (I have an MRE in Rust too), so I'm pretty sure that it's not caused by my poor C code.

CPython versions tested on:

3.11

Operating systems tested on:

Linux

leviska commented 2 weeks ago

I'm not sure that this is a bug, but I have absolutely no clue why the behavior is different.

leviska commented 2 weeks ago

You can find a repo with a CMake file and the Rust variant here: https://github.com/leviska/cpython_mem_leak