pythonspeed / filprofiler

A Python memory profiler for data processing and scientific computing applications
https://pythonspeed.com/products/filmemoryprofiler/
Apache License 2.0

Fil missing massive chunks of memory usage in a Python program that creates mostly just Python objects #37

Open · itamarst opened this issue 4 years ago

itamarst commented 4 years ago

Hoping for reproducer, but may have to make my own:

  1. Python 3.6 (running on Debian) loads a 200MB CSV. The result uses 1GB of RAM, and this shows up fine in Fil.
  2. The CSV rows are then loaded into a whole pile of objects. Per htop this is using another 3GB of RAM, but that 3GB doesn't show up in Fil.

What could cause this?

  1. Fil isn't being told about all allocations.
    1. Somehow some allocations from Python aren't reported.
      1. The PYTHONMALLOC env variable doesn't work completely.
      2. The PYTHONMALLOC env variable only works for some Python APIs.
    2. It's one of the unsupported APIs, e.g. posix_memalign (see the sketch after this list).
  2. Fil is correctly tracking allocations, but memory is leaking.
    1. Due to #35.
    2. Memory fragmentation in the allocator due to many small allocations.
    3. free() isn't being called, or being called wrongly somehow.
  3. Fil is not tracking allocations correctly.
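
For the posix_memalign hypothesis above, a minimal sketch (not from the original issue) of one way to check it, allocating through posix_memalign via ctypes, assuming Linux/glibc:

import ctypes

# Resolve posix_memalign from the process's libc.
libc = ctypes.CDLL(None, use_errno=True)
libc.posix_memalign.argtypes = [ctypes.POINTER(ctypes.c_void_p),
                                ctypes.c_size_t, ctypes.c_size_t]

SIZE = 512 * 1024 * 1024  # 512MB, large enough to stand out in any report
ptr = ctypes.c_void_p()
rc = libc.posix_memalign(ctypes.byref(ptr), 64, SIZE)
assert rc == 0
ctypes.memset(ptr, 0, SIZE)  # touch the pages so htop sees the memory too

If htop shows the extra 512MB but Fil's report doesn't, that API would be the culprit.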
itamarst commented 4 years ago

Allocating a list of integers is successfully tracked by Fil on the Python 3.6 I have installed.
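
For reference, a check along those lines (the exact script isn't shown here) can be as simple as:

# Tens of millions of distinct int objects, allocated through pymalloc;
# run under fil-profile and compare the report against htop.
data = list(range(30_000_000))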

rfleschenberg commented 4 years ago

The test was done on Debian 10 with Python 3.6 installed via pyenv.

The Django ORM is also involved here, and the whole code path is wrapped in a database transaction. I think the untracked memory is already being allocated before the database is queried, but I am not 100% sure. I will try to come up with a minimal reproducible example that doesn't involve database access.

itamarst commented 4 years ago

Can't reproduce when creating a list of regular Python class instances; Fil happily tracks that too.
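
The kind of test meant here, sketched (not the actual script), with plain instances standing in for parsed CSV rows:

class Row:
    """A plain Python object standing in for one parsed CSV row."""
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c

# A few million instances; per the comment above, Fil tracks this kind of allocation fine.
rows = [Row(i, str(i), float(i)) for i in range(3_000_000)]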

itamarst commented 4 years ago

@rfleschenberg if you run ldd path/to/your/python, what does it say?

rfleschenberg commented 4 years ago

> @rfleschenberg if you run ldd path/to/your/python, what does it say?

ldd $(which python)
        linux-vdso.so.1 (0x00007ffd191bf000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007faf15858000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007faf15853000)
        libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00007faf1584e000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007faf156cb000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007faf1550a000)
        /lib64/ld-linux-x86-64.so.2 (0x00007faf15deb000)
itamarst commented 4 years ago

Probably not caused by #35, then, since libc precedes ld-linux.

itamarst commented 4 years ago

@rfleschenberg if you can't find the time for a reproducer, another useful thing would be just the list of Python packages that get imported (or even just installed) for the problem code, because I can then audit them for which allocation APIs they use.

itamarst commented 4 years ago

Another theory: in scenarios where there are lots of small objects allocated, Fil currently has a huge amount of memory overhead. Like, memory usage can be 10× due to Fil overhead.

This was not what Fil was designed for, to be fair, but lots of small objects seems like a legitimate use case. I'm playing around with solutions in the branch for #43.

rfleschenberg commented 4 years ago

> Fil currently has a huge amount of memory overhead. Like, memory usage can be 10× due to Fil overhead.

The code was allocating > 4 GB in production, where Fil was not involved, so I think that's probably not the issue.

> if you can't find the time for a reproducer, another useful thing would be just the list of Python packages that get imported (or even just installed) for the problem code, because I can then audit them for which allocation APIs they use

I still hope I can find the time to come up with an MRE, but due to my current workload I can't promise it by any particular date, sorry :-/

The issue occurred in a somewhat complex project, unfortunately. There are also bindings to some custom C++ code. However, as far as I can tell, most of the project's complexity and libraries should not be involved in the code path in question. I am attaching the full output of pip freeze just in case it is useful. The particular code path that hit the issue contains these import statements:

import logging
import re
import time

from django.conf import settings
from django.core.exceptions import ValidationError
from django.db.models.functions import Lower
from django.utils.functional import cached_property

from product.constants import TARGET_GROUPS_CHOICES
# These are Django model classes.
from product.models import (BasicColor, Brand, ProductCategoryMixin, ProductSize, ProductVariant,
                            ProductVariantTranslation, Store, StoreToProduct)
# The next two lines are pretty trivial helper functions
from product.validators import is_valid_mpn as _is_valid_mpn
from product.validators import validate_eu, validate_uk, validate_us

requirements.txt

itamarst commented 4 years ago

> The code was allocating > 4 GB in production, where Fil was not involved, so I think that's probably not the issue.

When you ran Fil and got a number that was much lower (1.2GB), did you run all the way to the end on the full data, or just on a subset of the data?

rfleschenberg commented 4 years ago

> The code was allocating > 4 GB in production, where Fil was not involved, so I think that's probably not the issue.

> When you ran Fil and got a number that was much lower (1.2GB), did you run all the way to the end on the full data, or just on a subset of the data?

I ran against the full set. The OOM killer killed the process at some point. I wonder whether that corrupted Fil's internal tracking data; in that case the "wrong" number could just be a side effect of the OOM kill?

Afterwards I ran it against a subset. It ran to completion then. The numbers still seemed off by a similar factor (Fil showing ~1.4 GB vs htop showing > 4 GB). I cannot completely vouch for this second case, however; it's certainly possible that I made some stupid mistake.

Thank you for all of your work on this! I will try my best to provide more useful debugging information, and ideally a reproducible example. Unfortunately I can't commit to a deadline :-/ What I can promise is to keep it on my radar and do it as soon as possible. If you prefer to close the issue until it's reproducible, please do so!

itamarst commented 4 years ago

"Fil showing ~1.4 GB vs htop showing > 4 GB" could very well be a symptom of #45. Fil has a lot of overhead.

To rule this out, you could run the following experiment:

  1. Create a subset of the data, small enough that the program finishes and doesn't use too much memory (let's say 1GB?).
  2. Run the program without Fil and note the maximum memory usage (one way to record this is sketched below).
  3. Run the program with Fil.

If the issue is just Fil overhead, Fil will report the same maximum memory usage as in step 2; the actual usage under Fil will be higher because of the overhead, but the reported number will match step 2.
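
One way to capture the step 2 number from inside the script itself, rather than watching htop, is a sketch like this (not from the thread); on Linux, ru_maxrss is reported in kilobytes:

import atexit
import resource

def report_peak_rss():
    # Peak resident set size of this process; ru_maxrss is in KiB on Linux.
    peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    print(f"peak RSS: {peak_kb / 1024:.0f} MiB")

atexit.register(report_peak_rss)

Leaving that in place for the Fil run as well would show both numbers: the actual process usage (including overhead) and the application allocations Fil reports.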

itamarst commented 4 years ago

I just released v0.9.0.

  1. It's 3-4× as fast when doing large numbers of allocations.
  2. Memory overhead is a lot smaller. E.g. I saw my test script go from ~800MB RAM to ~200MB RAM, for a script that Fil reported as using ~80MB of application allocations (this script does millions of tiny allocations).

If you could rerun your code with this version, it may be that you can actually run the script to the end in a reasonable amount of time without running out of memory, which would allow a more accurate assessment of whether there's a bug or it was just the overhead I've now shrunk.