py-pdf / fpdf2

Simple PDF generation for Python
https://py-pdf.github.io/fpdf2/
GNU Lesser General Public License v3.0

Memory usage keeps increasing in GitHub Actions CI pipeline #641

Open Lucas-C opened 1 year ago

Lucas-C commented 1 year ago

On September 29th of 2022, I introduced some checks in the unit tests suite to ensure memory usage does not increase: https://github.com/PyFPDF/fpdf2/commit/7fce7231547060555fadb7bbeb05ef97d6dc8f4b. Those checks use memunit, which itself uses memory_profiler, which in turn uses psutil to get the process's current memory usage.
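
For illustration, the kind of RSS cap check involved boils down to something like this (a minimal sketch using psutil directly; the actual tests go through memunit):

from psutil import Process

def assert_rss_below(max_in_mib):
    # Hypothetical helper, comparable to what memunit provides on top of memory_profiler & psutil:
    # fail if the current process RSS memory usage exceeds the given cap.
    rss_in_mib = Process().memory_info().rss / 1024 / 1024
    assert rss_in_mib < max_in_mib, f"RSS usage {rss_in_mib:.1f} MiB exceeds the {max_in_mib} MiB cap"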

Those were the initial memory caps:

2 months later, we had to progressively increase those thresholds, as we kept exceeding them during test execution in the GitHub Actions CI pipelines.

These are the current values at the time I'm writing this:

All 3 of those thresholds have increased by more than 10 MB, with 40 extra unit tests added between those commits.

While I expect some variance across GitHub Actions executions, as we have no control over the execution context (the runners' capabilities can change, and the load on GitHub systems can vary), I worry that this could point to a memory leak in fpdf2.

I'm opening this issue to keep track of this subject, and I'd be very happy to get some help from the fpdf2 users community about this 😊

Lucas-C commented 1 year ago

More findings: the memory usage seems roughly proportional to the number of unit tests run.

I was able to lower the limit to 85 MB on my computer when running pytest test/image/test_oversized.py test/image/test_load_image.py

But pytest test/image/test_*.py fails with the same memory cap.

andersonhc commented 1 year ago

I did some research about this before, and pytest's memory usage increases with the number of tests run.

https://stackoverflow.com/questions/20900380/py-test-excessive-memory-usage-with-large-number-of-tests

Lucas-C commented 1 year ago

I asked the question directly on pytest bug tracker: https://github.com/pytest-dev/pytest/issues/10605

I also tried pympler to investigate memory usage, using the following pytest_mem.py script as a wrapper for pytest:

#!/usr/bin/env python3
from psutil import Process
from pympler.muppy import get_objects, getsizeof
from pympler.summary import print_, summarize
from pympler.util.stringutils import pp
from pytest import console_main

try:
    # Run the full pytest tests suite through its console entry point
    console_main()
finally:
    # Report overall process memory, then sum & summarize all Python objects still in memory
    print("[psutil] process RSS memory:", pp(Process().memory_info().rss))
    all_objects = get_objects()
    objs_size_sum = sum(getsizeof(obj) for obj in all_objects)
    print("[pympler/muppy] sum of objects memory size:", pp(objs_size_sum))
    print("[pympler/muppy] biggest objects summary:")
    print_(summarize(all_objects))

This was the result of calling pytest_mem.py in fpdf2 source directory:

[psutil] process memory: 200.07 MB
[pympler/muppy] sum of objects memory size:   52.80 MB
[pympler/muppy] biggest objects summary:
                types |   # objects |   total size
===================== | =========== | ============
                 dict |       48005 |     15.05 MB
                  str |       96114 |     11.91 MB
                 code |       28423 |      4.91 MB
                 type |        4789 |      4.48 MB
                 list |       24091 |      2.27 MB
                tuple |       33370 |      1.87 MB
                  set |        1932 |      1.08 MB
                  int |       18916 |    541.84 KB
          abc.ABCMeta |         477 |    490.97 KB
              weakref |        6292 |    442.41 KB
            bytearray |           3 |    419.26 KB
     fpdf.syntax.Name |        3258 |    327.77 KB
    pathlib.PosixPath |        3140 |    318.91 KB
  function (__init__) |        2091 |    277.71 KB
    getset_descriptor |        3808 |    238.00 KB

As can be seen though, the total size of Python objects reported by pympler/muppy does not account for all of the process memory usage...

Another possibility could be a memory leak in Pillow.
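
If that lead is worth exploring, a simple check would be to open the same image repeatedly and watch the process RSS; a sketch (the image path below is just a placeholder):

import gc
from PIL import Image
from psutil import Process

def rss_in_mib():
    return Process().memory_info().rss / 1024 / 1024

print(f"Initial RSS: {rss_in_mib():.1f} MiB")
for _ in range(100):
    with Image.open("test/image/some_test_image.png") as img:  # placeholder path
        img.load()  # force decoding of the image data
gc.collect()
print(f"Final RSS: {rss_in_mib():.1f} MiB")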

Lucas-C commented 1 year ago

Still investigating... This time I used tracemalloc to spot the biggest memory allocations:

[psutil] process memory: 241.2 MiB
Top 3 lines
#1: <frozen importlib._bootstrap_external>:640: 23.1 MiB
#2: dom/minidom.py:918: 1.9 MiB
    element._attrsNS[(attr.namespaceURI, attr.localName)] = attr
#3: dom/minidom.py:363: 1.8 MiB
    self.childNodes.append(Text())
34725 other: 41.9 MiB
Total allocated size: 68.7 MiB
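
For reference, this kind of report can be produced with the standard tracemalloc snapshot API, along the lines of the recipe from the Python docs (the exact script used here is not shown in this thread):

import linecache, tracemalloc

def display_top(snapshot, limit=3):
    # Group allocations by source line and print the biggest ones
    top_stats = snapshot.statistics("lineno")
    print(f"Top {limit} lines")
    for index, stat in enumerate(top_stats[:limit], 1):
        frame = stat.traceback[0]
        print(f"#{index}: {frame.filename}:{frame.lineno}: {stat.size / 1024 / 1024:.1f} MiB")
        line = linecache.getline(frame.filename, frame.lineno).strip()
        if line:
            print(f"    {line}")
    other = top_stats[limit:]
    if other:
        print(f"{len(other)} other: {sum(stat.size for stat in other) / 1024 / 1024:.1f} MiB")
    print(f"Total allocated size: {sum(stat.size for stat in top_stats) / 1024 / 1024:.1f} MiB")

tracemalloc.start()
# ... run the code to profile, e.g. the tests suite ...
display_top(tracemalloc.take_snapshot())
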
Lucas-C commented 1 year ago

I also attempted to use valgrind:

$ valgrind --tool=memcheck --trace-children=yes --dsymutil=yes --leak-check=full --show-leak-kinds=all --suppressions=../cpython/Misc/valgrind-python.supp ../cpython/python.exe -m pytest
==23296== HEAP SUMMARY:
==23296==     in use at exit: 11,448,844 bytes in 10,108 blocks
==23296==   total heap usage: 1,007,531 allocs, 997,423 frees, 5,146,354,308 bytes allocated
==23296==    ...
==23296== 8,288 bytes in 2 blocks are possibly lost in loss record 472 of 499
==23296==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==23296==    by 0x30E7B8: _PyMem_RawMalloc (obmalloc.c:43)
==23296==    ...
==23296==    by 0x31087C: PyMem_Malloc (obmalloc.c:588)
==23296==    by 0x4AAA8D: _buffered_init (bufferedio.c:716)
==23296==    by 0x4AB392: _io_BufferedWriter___init___impl (bufferedio.c:1781)
==23296==
==23296== 8,784 bytes in 8 blocks are possibly lost in loss record 473 of 499
==23296==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==23296==    by 0x30E7B8: _PyMem_RawMalloc (obmalloc.c:43)
==23296==    ...
==23296==    by 0x310A9A: PyObject_Malloc (obmalloc.c:677)
==23296==    by 0x2F3F4F: clone_combined_dict_keys (dictobject.c:782)
==23296==    by 0x2F8CA7: PyDict_Copy (dictobject.c:3016)
==23296==
==23296== 12,288 bytes in 1 blocks are still reachable in loss record 474 of 499
==23296==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==23296==    by 0xAA404E2: xmlHashCreate (in /usr/lib/x86_64-linux-gnu/libxml2.so.2.9.10)
==23296==    by 0xAA8D93D: xmlXPathNewContext (in /usr/lib/x86_64-linux-gnu/libxml2.so.2.9.10)
==23296==    ...
==23296==    by 0xA87CD70: __pyx_pymod_exec_etree (etree.c:261026)
==23296==    by 0x308A7B: PyModule_ExecDef (moduleobject.c:424)
==23296==    by 0x418977: exec_builtin_or_dynamic (import.c:2395)
==23296==    by 0x4189D7: _imp_exec_dynamic_impl (import.c:2469)
==23296==    by 0x4189EB: _imp_exec_dynamic (import.c.h:503)
==23296==
==23296== 14,431 bytes in 14 blocks are possibly lost in loss record 475 of 499
==23296==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==23296==    by 0x30E7B8: _PyMem_RawMalloc (obmalloc.c:43)
==23296==    ...
==23296==    by 0x310A9A: PyObject_Malloc (obmalloc.c:677)
==23296==    by 0x324BDD: type_new_set_doc (typeobject.c:3045)
==23296==    by 0x3293A2: type_new_set_attrs (typeobject.c:3229)
==23296==
==23296== 14,832 bytes in 17 blocks are possibly lost in loss record 476 of 499
==23296==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==23296==    by 0x30E7B8: _PyMem_RawMalloc (obmalloc.c:43)
==23296==    ...
==23296==    by 0x31087C: PyMem_Malloc (obmalloc.c:588)
==23296==    by 0x6628BC4: prepare_s (_struct.c:1496)
==23296==    by 0x662C828: Struct_impl (_struct.c:1602)
==23296==
==23296== 17,976 bytes in 67 blocks are still reachable in loss record 477 of 499
==23296==    at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==23296==    by 0x401331C: _dl_check_map_versions (dl-version.c:274)
==23296==    by 0x40160FC: dl_open_worker (dl-open.c:577)
==23296==    by 0x4B35927: _dl_catch_exception (dl-error-skeleton.c:208)
==23296==    by 0x4015609: _dl_open (dl-open.c:837)
==23296==    by 0x487C34B: dlopen_doit (dlopen.c:66)
==23296==    by 0x4B35927: _dl_catch_exception (dl-error-skeleton.c:208)
==23296==    by 0x4B359F2: _dl_catch_error (dl-error-skeleton.c:227)
==23296==    by 0x487CB58: _dlerror_run (dlerror.c:170)
==23296==    by 0x487C3D9: dlopen@@GLIBC_2.2.5 (dlopen.c:87)
==23296==    by 0x46E6C5: _PyImport_FindSharedFuncptr (dynload_shlib.c:80)
==23296==    by 0x41E7F1: _PyImport_LoadDynamicModuleWithSpec (importdl.c:139)
==23296==
==23296== 27,557 bytes in 23 blocks are still reachable in loss record 478 of 499
==23296==    at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==23296==    by 0x400D283: _dl_new_object (dl-object.c:89)
==23296==    by 0x4006E96: _dl_map_object_from_fd (dl-load.c:997)
==23296==    by 0x400A61A: _dl_map_object (dl-load.c:2236)
==23296==    by 0x400F514: openaux (dl-deps.c:64)
==23296==    by 0x4B35927: _dl_catch_exception (dl-error-skeleton.c:208)
==23296==    by 0x400F962: _dl_map_object_deps (dl-deps.c:248)
==23296==    by 0x4015DAF: dl_open_worker (dl-open.c:571)
==23296==    by 0x4B35927: _dl_catch_exception (dl-error-skeleton.c:208)
==23296==    by 0x4015609: _dl_open (dl-open.c:837)
==23296==    by 0x487C34B: dlopen_doit (dlopen.c:66)
==23296==    by 0x4B35927: _dl_catch_exception (dl-error-skeleton.c:208)
==23296==
==23296== 47,712 bytes in 33 blocks are possibly lost in loss record 479 of 499
==23296==    at 0x483DFAF: realloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==23296==    by 0x30E806: _PyMem_RawRealloc (obmalloc.c:65)
==23296==    ...
==23296==    by 0x310B01: PyObject_Realloc (obmalloc.c:699)
==23296==    by 0x34D4AD: resize_compact (unicodeobject.c:957)
==23296==    by 0x36C9ED: _PyUnicodeWriter_Finish (unicodeobject.c:13117)
==23296==    by 0x36E14F: unicode_decode_utf8 (unicodeobject.c:4644)
==23296==    by 0x36E458: PyUnicode_DecodeUTF8Stateful (unicodeobject.c:4660)
==23296==
==23296== 51,064 bytes in 15 blocks are indirectly lost in loss record 480 of 499
==23296==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==23296==    by 0x30E7B8: _PyMem_RawMalloc (obmalloc.c:43)
==23296==    ...
==23296==    by 0x310A9A: PyObject_Malloc (obmalloc.c:677)
==23296==    by 0x34F586: PyUnicode_New (unicodeobject.c:1188)
==23296==    by 0x34F912: _PyUnicode_FromUCS1 (unicodeobject.c:1979)
==23296==
==23296== 60,728 bytes in 47 blocks are still reachable in loss record 481 of 499
==23296==    at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==23296==    by 0x400D283: _dl_new_object (dl-object.c:89)
==23296==    by 0x4006E96: _dl_map_object_from_fd (dl-load.c:997)
==23296==    by 0x400A61A: _dl_map_object (dl-load.c:2236)
==23296==    by 0x4015D46: dl_open_worker (dl-open.c:513)
==23296==    by 0x4B35927: _dl_catch_exception (dl-error-skeleton.c:208)
==23296==    by 0x4015609: _dl_open (dl-open.c:837)
==23296==    by 0x487C34B: dlopen_doit (dlopen.c:66)
==23296==    by 0x4B35927: _dl_catch_exception (dl-error-skeleton.c:208)
==23296==    by 0x4B359F2: _dl_catch_error (dl-error-skeleton.c:227)
==23296==    by 0x487CB58: _dlerror_run (dlerror.c:170)
==23296==    by 0x487C3D9: dlopen@@GLIBC_2.2.5 (dlopen.c:87)
==23296==
==23296== 69,906 bytes in 47 blocks are possibly lost in loss record 482 of 499
==23296==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==23296==    by 0x30E7B8: _PyMem_RawMalloc (obmalloc.c:43)
==23296==    ...
==23296==    by 0x310A9A: PyObject_Malloc (obmalloc.c:677)
==23296==    by 0x34F586: PyUnicode_New (unicodeobject.c:1188)
==23296==    by 0x36CB3B: unicode_decode_utf8 (unicodeobject.c:4527)
==23296==
==23296== 72,704 bytes in 1 blocks are still reachable in loss record 483 of 499
==23296==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==23296==    by 0xCA45A99: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
==23296==    by 0x4011B99: call_init.part.0 (dl-init.c:72)
==23296==    by 0x4011CA0: call_init (dl-init.c:30)
==23296==    by 0x4011CA0: _dl_init (dl-init.c:119)
==23296==    by 0x4B35984: _dl_catch_exception (dl-error-skeleton.c:182)
==23296==    by 0x40160CE: dl_open_worker (dl-open.c:758)
==23296==    by 0x4B35927: _dl_catch_exception (dl-error-skeleton.c:208)
==23296==    by 0x4015609: _dl_open (dl-open.c:837)
==23296==    by 0x487C34B: dlopen_doit (dlopen.c:66)
==23296==    by 0x4B35927: _dl_catch_exception (dl-error-skeleton.c:208)
==23296==    by 0x4B359F2: _dl_catch_error (dl-error-skeleton.c:227)
==23296==    by 0x487CB58: _dlerror_run (dlerror.c:170)
==23296==
==23296== 95,872 bytes in 98 blocks are possibly lost in loss record 484 of 499
==23296==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==23296==    by 0x30E7B8: _PyMem_RawMalloc (obmalloc.c:43)
==23296==    ...
==23296==    by 0x310A9A: PyObject_Malloc (obmalloc.c:677)
==23296==    by 0x4735D4: gc_alloc (gcmodule.c:2306)
==23296==    by 0x47373C: _PyObject_GC_NewVar (gcmodule.c:2340)
==23296==
==23296== 116,688 bytes in 143 blocks are possibly lost in loss record 485 of 499
==23296==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==23296==    ...
==23296==    by 0x310A9A: PyObject_Malloc (obmalloc.c:677)
==23296==    by 0x2EC29C: new_keys_object (dictobject.c:627)
==23296==    by 0x2F4461: _PyDict_NewKeysForClass (dictobject.c:5258)
==23296==
==23296== 131,096 bytes in 1 blocks are possibly lost in loss record 486 of 499
==23296==    at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==23296==    by 0x30E7E7: _PyMem_RawCalloc (obmalloc.c:57)
==23296==    by 0x30E957: _PyMem_DebugRawAlloc (obmalloc.c:1835)
==23296==    by 0x30EA48: _PyMem_DebugRawCalloc (obmalloc.c:1880)
==23296==    by 0x30FFFD: PyMem_RawCalloc (obmalloc.c:560)
==23296==    by 0x3100CB: arena_map_get (obmalloc.c:798)
==23296==    by 0x310116: arena_map_mark_used (obmalloc.c:841)
==23296==    by 0x3112E1: new_arena (obmalloc.c:972)
==23296==    by 0x311665: allocate_from_new_pool (obmalloc.c:1154)
==23296==    by 0x311BC5: pymalloc_alloc (obmalloc.c:1318)
==23296==    by 0x311BC5: _PyObject_Malloc (obmalloc.c:1328)
==23296==    by 0x30E9CF: _PyMem_DebugRawAlloc (obmalloc.c:1838)
==23296==    by 0x30EA0B: _PyMem_DebugRawMalloc (obmalloc.c:1871)
==23296==
==23296== 187,456 bytes in 140 blocks are possibly lost in loss record 487 of 499
==23296==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==23296==    by 0x30E7B8: _PyMem_RawMalloc (obmalloc.c:43)
==23296==    ...
==23296==    by 0x31087C: PyMem_Malloc (obmalloc.c:588)
==23296==    by 0x314EEE: set_table_resize (setobject.c:274)
==23296==    by 0x3160DB: set_add_entry (setobject.c:177)
==23296==
==23296== 262,168 bytes in 1 blocks are possibly lost in loss record 488 of 499
==23296==    at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==23296==    by 0x30E7E7: _PyMem_RawCalloc (obmalloc.c:57)
==23296==    by 0x30E957: _PyMem_DebugRawAlloc (obmalloc.c:1835)
==23296==    by 0x30EA48: _PyMem_DebugRawCalloc (obmalloc.c:1880)
==23296==    by 0x30FFFD: PyMem_RawCalloc (obmalloc.c:560)
==23296==    by 0x31008F: arena_map_get (obmalloc.c:786)
==23296==    by 0x310116: arena_map_mark_used (obmalloc.c:841)
==23296==    by 0x3112E1: new_arena (obmalloc.c:972)
==23296==    by 0x311665: allocate_from_new_pool (obmalloc.c:1154)
==23296==    by 0x311BC5: pymalloc_alloc (obmalloc.c:1318)
==23296==    by 0x311BC5: _PyObject_Malloc (obmalloc.c:1328)
==23296==    by 0x30E9CF: _PyMem_DebugRawAlloc (obmalloc.c:1838)
==23296==    by 0x30EA0B: _PyMem_DebugRawMalloc (obmalloc.c:1871)
==23296==
==23296== 270,928 bytes in 263 blocks are possibly lost in loss record 489 of 499
==23296==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==23296==    by 0x30E7B8: _PyMem_RawMalloc (obmalloc.c:43)
==23296==    ...
==23296==    by 0x310A9A: PyObject_Malloc (obmalloc.c:677)
==23296==    by 0x32BF03: _PyType_AllocNoTrack (typeobject.c:1296)
==23296==    by 0x32C042: PyType_GenericAlloc (typeobject.c:1320)
==23296==
==23296== 271,128 bytes in 133 blocks are possibly lost in loss record 490 of 499
==23296==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==23296==    by 0x30E7B8: _PyMem_RawMalloc (obmalloc.c:43)
==23296==    ...
==23296==    by 0x310A9A: PyObject_Malloc (obmalloc.c:677)
==23296==    by 0x2EC29C: new_keys_object (dictobject.c:627)
==23296==    by 0x2F0D63: dictresize (dictobject.c:1424)
==23296==
==23296== 414,472 bytes in 311 blocks are indirectly lost in loss record 491 of 499
==23296==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==23296==    by 0x30E7B8: _PyMem_RawMalloc (obmalloc.c:43)
==23296==    ...
==23296==    by 0x310A9A: PyObject_Malloc (obmalloc.c:677)
==23296==    by 0x4735D4: gc_alloc (gcmodule.c:2306)
==23296==    by 0x47373C: _PyObject_GC_NewVar (gcmodule.c:2340)
==23296==
==23296== 442,372 bytes in 262 blocks are indirectly lost in loss record 492 of 499
==23296==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==23296==    by 0x30E7B8: _PyMem_RawMalloc (obmalloc.c:43)
==23296==    ...
==23296==    by 0x310A9A: PyObject_Malloc (obmalloc.c:677)
==23296==    by 0x295669: _PyBytes_FromSize (bytesobject.c:102)
==23296==    by 0x295C5C: PyBytes_FromStringAndSize (bytesobject.c:134)
==23296==
==23296== 512,216 bytes in 504 blocks are indirectly lost in loss record 493 of 499
==23296==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==23296==    by 0x30E7B8: _PyMem_RawMalloc (obmalloc.c:43)
==23296==    ...
==23296==    by 0x310A9A: PyObject_Malloc (obmalloc.c:677)
==23296==    by 0x30AC3D: _PyObject_NewVar (object.c:188)
==23296==    by 0x2AC0F9: _PyCode_New (codeobject.c:559)
==23296==
==23296== 654,070 bytes in 451 blocks are indirectly lost in loss record 494 of 499
==23296==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==23296==    by 0x30E7B8: _PyMem_RawMalloc (obmalloc.c:43)
==23296==    ...
==23296==    by 0x31087C: PyMem_Malloc (obmalloc.c:588)
==23296==    by 0x2ADA20: _PyCode_CreateLineArray (codeobject.c:818)
==23296==    by 0x3C68D7: _PyCode_InitLineArray (pycore_code.h:449)
==23296==    by 0x3C68D7: call_trace (ceval.c:2502)
==23296==
==23296== 732,376 bytes in 554 blocks are possibly lost in loss record 495 of 499
==23296==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==23296==    by 0x30E7B8: _PyMem_RawMalloc (obmalloc.c:43)
==23296==    ...
==23296==    by 0x31087C: PyMem_Malloc (obmalloc.c:588)
==23296==    by 0x2ADA20: _PyCode_CreateLineArray (codeobject.c:818)
==23296==    by 0x3C68D7: _PyCode_InitLineArray (pycore_code.h:449)
==23296==    by 0x3C68D7: call_trace (ceval.c:2502)
==23296==
==23296== 1,060,015 bytes in 969 blocks are possibly lost in loss record 496 of 499
==23296==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==23296==    by 0x30E7B8: _PyMem_RawMalloc (obmalloc.c:43)
==23296==    ...
==23296==    by 0x310A9A: PyObject_Malloc (obmalloc.c:677)
==23296==    by 0x34F586: PyUnicode_New (unicodeobject.c:1188)
==23296==    by 0x34F912: _PyUnicode_FromUCS1 (unicodeobject.c:1979)
==23296==
==23296== 1,914,905 bytes in 1,583 blocks are possibly lost in loss record 497 of 499
==23296==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==23296==    by 0x30E7B8: _PyMem_RawMalloc (obmalloc.c:43)
==23296==    ...
==23296==    by 0x310A9A: PyObject_Malloc (obmalloc.c:677)
==23296==    by 0x295669: _PyBytes_FromSize (bytesobject.c:102)
==23296==    by 0x295C5C: PyBytes_FromStringAndSize (bytesobject.c:134)
==23296==
==23296== 2,853,998 (779,216 direct, 2,074,782 indirect) bytes in 640 blocks are definitely lost in loss record 498 of 499
==23296==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==23296==    by 0x30E7B8: _PyMem_RawMalloc (obmalloc.c:43)
==23296==    ...
==23296==    by 0x310A9A: PyObject_Malloc (obmalloc.c:677)
==23296==    by 0x30AC3D: _PyObject_NewVar (object.c:188)
==23296==    by 0x2AC0F9: _PyCode_New (codeobject.c:559)
==23296==
==23296== 3,121,288 bytes in 3,100 blocks are possibly lost in loss record 499 of 499
==23296==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==23296==    by 0x30E7B8: _PyMem_RawMalloc (obmalloc.c:43)
==23296==    ...
==23296==    by 0x310A9A: PyObject_Malloc (obmalloc.c:677)
==23296==    by 0x30AC3D: _PyObject_NewVar (object.c:188)
==23296==    by 0x2AC0F9: _PyCode_New (codeobject.c:559)
==23296==
==23296== LEAK SUMMARY:
==23296==    definitely lost: 780,385 bytes in 672 blocks
==23296==    indirectly lost: 2,080,777 bytes in 1,654 blocks
==23296==      possibly lost: 8,362,790 bytes in 7,132 blocks
==23296==    still reachable: 224,892 bytes in 650 blocks

Lucas-C commented 1 year ago

I tried using scalene, but it filled /tmp with 30+ GB of files in my WSL environment, so I dropped it as a profiler before getting very far.

I also tried combining pytest fixtures with objgraph diffing, by putting this in conftest.py:

import objgraph
import pytest

@pytest.fixture(scope="module", autouse=True)
def track_memory_growth():
    objgraph.show_growth(limit=10, shortnames=False)  # starts with a gc.collect()

Output:

test/test_perfs.py
builtins.list                  13301        +7
builtins.dict                  25204        +5
_pytest.reports.TestReport       108        +3
builtins.function              25990        +2
builtins.cell                   3389        +2
builtins.tuple                 17609        +1
builtins.method                 1061        +1
functools.partial                 47        +1
_pytest.fixtures.SubRequest       59        +1
...
test/encryption/test_encryption.py
builtins.list                          15565     +2141
_pytest.reports.TestReport              1227     +1068
builtins.dict                          26366     +1038
_pytest.fixtures.SubRequest              219      +138
builtins.weakref                        5687      +127
builtins.builtin_function_or_method     1696      +122
builtins.cell                           3503       +94
builtins.tuple                         17688       +49
builtins.function                      26144       +47
functools.partial                        104       +46
...
test/image/png_images/test_png_file.py
builtins.function                              26984      +309
builtins.dict                                  27838      +286
builtins.tuple                                 18224      +188
builtins.list                                  20750       +99
builtins.weakref                                5966       +66
builtins.type                                   4021       +58
builtins.module                                 1462       +42
_frozen_importlib.ModuleSpec                    1453       +42
_frozen_importlib_external.SourceFileLoader     1264       +40
_pytest.reports.TestReport                      1632       +39

The only non-builtin objects that regularly grow in number are _pytest.reports.TestReport & _pytest.fixtures.SubRequest. Apart from that, I haven't found much so far...

Lucas-C commented 1 year ago

Another interesting approach, using the following code in conftest.py:

import gc, linecache, tracemalloc
import pytest

# Enabling this check creates an increase in memory usage,
# so we require an opt-in through a CLI argument:
def pytest_addoption(parser):
    parser.addoption("--trace-malloc", action="store_true", help="Trace main memory allocations differences during the whole execution")

@pytest.fixture(scope="session", autouse=True)
def trace_malloc(request):
    if not request.config.getoption("trace_malloc"):
        yield
        return
    capmanager = request.config.pluginmanager.getplugin("capturemanager")
    gc.collect()
    tracemalloc.start()
    snapshot1 = tracemalloc.take_snapshot().filter_traces((
            tracemalloc.Filter(False, linecache.__file__),
            tracemalloc.Filter(False, tracemalloc.__file__),
        ))
    yield
    gc.collect()
    snapshot2 = tracemalloc.take_snapshot().filter_traces((
            tracemalloc.Filter(False, linecache.__file__),
            tracemalloc.Filter(False, tracemalloc.__file__),
        ))
    top_stats = snapshot2.compare_to(snapshot1, 'lineno')
    with capmanager.global_and_fixture_disabled():
        print("[ Top 10 differences ]")
        for stat in top_stats[:10]:
            print(stat)

Output:

[ Top 10 differences ]
<frozen importlib._bootstrap_external>:640: size=1623 KiB (+1623 KiB), count=16460 (+16460), average=101 B
/.../site-packages/_pytest/reports.py:321: size=686 KiB (+686 KiB), count=4802 (+4802), average=146 B
/.../site-packages/_pytest/fixtures.py:766: size=504 KiB (+504 KiB), count=1743 (+1743), average=296 B
/.../site-packages/fontTools/agl.py:5085: size=482 KiB (+482 KiB), count=12702 (+12702), average=39 B
/.../site-packages/_pytest/reports.py:270: size=406 KiB (+406 KiB), count=5707 (+5707), average=73 B
/.../site-packages/_pytest/python.py:195: size=315 KiB (+315 KiB), count=538 (+538), average=600 B
/.../site-packages/cryptography/hazmat/bindings/openssl/binding.py:112: size=295 KiB (+295 KiB), count=2027 (+2027), average=149 B
/.../site-packages/_pytest/fixtures.py:684: size=271 KiB (+271 KiB), count=2879 (+2879), average=96 B
/.../site-packages/fontTools/agl.py:5086: size=251 KiB (+251 KiB), count=4229 (+4229), average=61 B
/.../site-packages/_pytest/fixtures.py:759: size=173 KiB (+173 KiB), count=2268 (+2268), average=78 B

The lines related to fontTools.agl are due to LEGACY_AGL2UV, which seems to be the biggest dict allocated during execution.
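
For illustration, the in-memory size of that dict can be checked directly (a quick sketch, assuming pympler is installed):

from fontTools.agl import LEGACY_AGL2UV
from pympler.asizeof import asizeof

# Recursive in-memory size of this module-level glyph-name-to-unicode mapping:
print(f"LEGACY_AGL2UV: {len(LEGACY_AGL2UV)} entries, {asizeof(LEGACY_AGL2UV) / 1024:.0f} KiB")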

Again, it seems to point towards pytest, and notably its TestReport class. Summing up all the pytest-related memory increases from this report gives 2355 KiB for the whole session. Hence, those memory allocations cannot account for the global memory increase I initially noticed (around 1 MB per 4 unit tests)...

Lucas-C commented 1 year ago

For reference:

$ pytest --final-rss-usage test/test_add_page.py
collected 4 items -> [psutil] process RSS memory:   74.50 MB
$ pytest --final-rss-usage test/image/
collected 42 items -> [psutil] process RSS memory:   95.16 MB
$ pytest --final-rss-usage
collected 1055 items -> [psutil] process RSS memory:  174.29 MB
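
--final-rss-usage is a custom option; for context, here is a minimal sketch of how such an option can be wired up in conftest.py (not necessarily the exact implementation used in fpdf2):

from psutil import Process

def pytest_addoption(parser):
    parser.addoption("--final-rss-usage", action="store_true",
                     help="Print the process RSS memory usage at the end of the tests session")

def pytest_sessionfinish(session):
    if session.config.getoption("final_rss_usage"):
        rss_in_mib = Process().memory_info().rss / 1024 / 1024
        print(f"\n[psutil] process RSS memory: {rss_in_mib:.2f} MB")
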
Lucas-C commented 1 year ago

New lead, regarding fonttools:

import gc
from fontTools.ttLib import TTFont
from psutil import Process

def print_mem_usage(prefix):
    rss_in_mib = Process().memory_info().rss / 1024 / 1024
    print(f"[psutil] {prefix} process memory (RSS) usage: {rss_in_mib:.1f} MiB")

print_mem_usage("Initial")
# Parse the same font twice, without keeping any reference to the resulting TTFont objects:
TTFont("test/fonts/NotoColorEmoji.ttf")
TTFont("test/fonts/NotoColorEmoji.ttf")
gc.collect()
print_mem_usage("Final")

Output:

[psutil] Initial process memory (RSS) usage: 11.1 MiB
[psutil] Final process memory (RSS) usage: 21.1 MiB

This memory "leak" disappears if we pass lazy=True to the TTFont constructor.
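
For comparison, the same check with lazy loading enabled:

from fontTools.ttLib import TTFont

# With lazy=True, font tables are only parsed when accessed, so the RSS stays flat here:
TTFont("test/fonts/NotoColorEmoji.ttf", lazy=True)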

Lucas-C commented 1 year ago

I noticed that just importing the memunit module creates an increase in RSS memory usage of 15 MiB to 25 MiB, due to memory_profiler:

from psutil import Process

rss_in_mib = Process().memory_info().rss / 1024 / 1024
print(f"[psutil] Initial process RSS memory usage: {rss_in_mib:.1f} MiB")

import memory_profiler

rss_in_mib = Process().memory_info().rss / 1024 / 1024
print(f"[psutil] Final process RSS memory usage: {rss_in_mib:.1f} MiB")

Output:

[psutil] Initial process RSS memory usage: 9.2 MiB
[psutil] Final process RSS memory usage: 35.4 MiB

Sadly, given the number of unanswered issues on https://github.com/pythonprofilers/memory_profiler, I fear that this library is not actively maintained anymore 😢 A related issue about memory_profiler leaking memory itself has been reported there: https://github.com/pythonprofilers/memory_profiler/issues/332

Lucas-C commented 1 year ago

Based on the latest measurements on my laptop (with the memunit checks removed), the memory increase is around +1 MiB per 10 unit tests run:

$ pytest -vv --final-rss-usage test/test_add_page.py
[psutil] Final process RSS memory usage: 53.7 MiB
========================================================= 4 passed in 4.57s

$ pytest -vv --final-rss-usage test/image/
[psutil] Final process RSS memory usage: 72.9 MiB
================================================== 63 passed in 25.31s

$ pytest -vv --final-rss-usage test/
[psutil] Final process RSS memory usage: 166.9 MiB
================================= 1 failed, 1068 passed in 225.48s

This ratio is consistent with my observations from last month.

My current best lead: I suspect a problem related to the memory usage of inserted images: the test_share_images_cache unit test alone creates a memory increase of 7 MiB to 11 MiB when run in isolation.

_Note: I checked whether using an older version of Pillow would have an impact on this memory usage: going back as far as Pillow==7.2.0 (June 2020), the RSS memory usage when running test_share_images_cache stays the same_

_Note: while experimenting, I tested 2 LD_PRELOAD-based tools: libleak, which did not spot any memory leak, and jemalloc, which did not improve the overall memory usage_

Lucas-C commented 1 year ago

I introduced some more memory metrics in https://github.com/PyFPDF/fpdf2/pull/721

Based on this recent tests suite execution: https://github.com/PyFPDF/fpdf2/actions/runs/4532166384/jobs/7983211542?pr=703 ...where I added calls to print_mem_usage() in between the processing of all test files, there is a big RSS memory usage bump (>30 MiB) when executing test_charmap_first_999_chars[TwitterEmoji.ttf].

I dug a little deeper by inserting calls to print_mem_usage(), while calling pytest --trace-memory-usage -vv test/fonts/test_charmap.py -k TwitterEmoji --capture=no:

Pre-FPDF.add_font():                      | Process RSS: 64.2 MiB
Post-FPDF.add_font():                     | Process RSS: 65.4 MiB
Pre-OutputProducer._add_fonts():          | Process RSS: 65.5 MiB
  Pre-Subsetter.subset():                 | Process RSS: 65.6 MiB
    Pre-_prune_pre_subset():              | Process RSS: 65.7 MiB
    Pre-_closure_glyphs():                | Process RSS: 72.2 MiB
    Pre-_subset_glyphs():                 | Process RSS: 72.9 MiB
    Pre-_prune_post_subset():             | Process RSS: 90.7 MiB
  Post-Subsetter.subset():                | Process RSS: 90.7 MiB
  Pre-_tt_font_widths():                  | Process RSS: 90.9 MiB
  Post-_tt_font_widths():                 | Process RSS: 99.4 MiB

...but it is not that surprising after all, given that TwitterEmoji.ttf has a file size of 14 MiB. Allocating 2.5 times this amount in RAM to parse this font is not shocking. Still, I opened https://github.com/fonttools/fonttools/issues/3066 to ask the fonttools project about it.
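
For reference, this kind of measurement can be reproduced outside of the tests suite with a standalone script along these lines (a sketch; the font path is assumed to follow the same test/fonts/ layout as above):

from fontTools.subset import Options, Subsetter
from fontTools.ttLib import TTFont
from psutil import Process

def print_mem_usage(prefix):
    rss_in_mib = Process().memory_info().rss / 1024 / 1024
    print(f"[psutil] {prefix} process memory (RSS) usage: {rss_in_mib:.1f} MiB")

print_mem_usage("Pre-TTFont():           ")
font = TTFont("test/fonts/TwitterEmoji.ttf")  # path assumed, following the test/fonts/ layout
print_mem_usage("Pre-Subsetter.subset(): ")
subsetter = Subsetter(Options())
subsetter.populate(unicodes=range(0x20, 0x20 + 999))  # roughly the "first 999 chars" of the test
subsetter.subset(font)
print_mem_usage("Post-Subsetter.subset():")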

Lucas-C commented 1 year ago

Note: in PR #703, I changed the behaviour of the ensure_rss_memory_below(max_in_mib) test decorator: it used to check that the current process RSS memory did not go over the given limit after a given test had executed, but that was affected by memory increases in unrelated parts of the code / tests suite. Now, ensure_rss_memory_below(max_in_mib) computes the difference in RSS memory usage before and after executing the unit test, and checks that this difference does not exceed the given threshold.
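
For illustration, such a decorator boils down to something like this (a sketch; the actual implementation in the fpdf2 tests suite may differ):

import functools
from psutil import Process

def ensure_rss_memory_below(max_in_mib):
    "Check that the RSS memory increase caused by the decorated test stays under max_in_mib."
    def decorator(test_func):
        @functools.wraps(test_func)
        def wrapper(*args, **kwargs):
            rss_before = Process().memory_info().rss
            result = test_func(*args, **kwargs)
            delta_in_mib = (Process().memory_info().rss - rss_before) / 1024 / 1024
            assert delta_in_mib < max_in_mib, f"RSS memory increased by {delta_in_mib:.1f} MiB (max allowed: {max_in_mib} MiB)"
            return result
        return wrapper
    return decorator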

The RSS memory metric is slightly volatile, and definitely not deterministic, so this is far from perfect / stable, but the new behaviour is still better than the previous one 😊