pythonprofilers / memory_profiler

Monitor Memory usage of Python code
http://pypi.python.org/pypi/memory_profiler

Mem Usage and Increment odd behaviour #236

Open ianozsvald opened 5 years ago

ianozsvald commented 5 years ago

Hi @fabianp, it has been ages; I hope you're well. I'm seeing a bug related to #234.

I'm updating my High Performance Python book for the 2nd edition. Compared to the first edition's report of memory_profiler's behaviour, the Increment column is definitely looking odd.

I'm using a fresh install of memory_profiler, psutil and Python 3.7 in Anaconda on Linux.

I have two functions; I've profiled each separately and then both together. They have similar (but not identical) outputs, suggesting that the comment by @soundgnome in #234 about nested calls may not be so clear cut.

I'll lay out the 3 scenarios in case you have thoughts. This is a Julia set demo from my Profiling chapter. An outer function prepares the zs & cs lists of complex coordinates, which consumes RAM; an inner function then iterates over these, calculating the Julia set in a CPU-bound scenario.

Scenario 1 - just profiling the inner function calculate_z_serial_purepython:

$ python -m memory_profiler julia1_memoryprofiler.py 
Length of x: 1000
Total elements: 1000000
calculate_z_serial_purepython took 6689.149292945862 seconds
Filename: julia1_memoryprofiler.py

Line #    Mem usage    Increment   Line Contents
================================================
     9  126.363 MiB  126.363 MiB   @profile
    10                             def calculate_z_serial_purepython(maxiter, zs, cs):
    11                                 """Calculate output list using Julia update rule"""
    12  133.973 MiB    7.609 MiB       output = [0] * len(zs)
    13  136.988 MiB    0.000 MiB       for i in range(len(zs)):
    14  136.988 MiB    0.000 MiB           n = 0
    15  136.988 MiB    0.000 MiB           z = zs[i]
    16  136.988 MiB    0.000 MiB           c = cs[i]
    17  136.988 MiB    0.258 MiB           while n < maxiter and abs(z) < 2:
    18  136.988 MiB    0.000 MiB               z = z * z + c
    19  136.988 MiB    0.000 MiB               n += 1
    20  136.988 MiB    0.000 MiB           output[i] = n
    21  136.988 MiB    0.000 MiB       return output

In the earlier edition with Python 2.7, line 12 cost +7MB and line 13 cost +32MB (due, I expect, to range not being a generator in Python 2.7, so it built a full list, which we avoid now in Python 3.x). Line 17 costs +0.2MB; previously it cost 0MB (though that might have been masked by the process's overall growth from the range statement?).
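As a quick check of that range point (a sketch; note that sys.getsizeof on a list only counts its pointer array, not the int objects behind it, so the full cost is higher still):

import sys

r = range(10**6)
print(sys.getsizeof(r))        # ~48 bytes: range is a lazy, constant-size object in Python 3
print(sys.getsizeof(list(r)))  # ~8MB: the materialised list's pointer array alone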

Scenario 2 - profiling the outer function calc_pure_python:

$ python -m memory_profiler julia1_memoryprofiler.py 
Length of x: 1000
Total elements: 1000000
calculate_z_serial_purepython took 94.99498796463013 seconds
Filename: julia1_memoryprofiler.py

Line #    Mem usage    Increment   Line Contents
================================================
    24   48.113 MiB   48.113 MiB   @profile
    25                             def calc_pure_python(draw_output, desired_width, max_iterations):
    26                                 """Create a list of complex co-ordinates (zs) and complex parameters (cs), build Julia set and display"""
    27   48.113 MiB    0.000 MiB       x_step = (x2 - x1) / desired_width
    28   48.113 MiB    0.000 MiB       y_step = (y1 - y2) / desired_width
    29   48.113 MiB    0.000 MiB       x = []
    30   48.113 MiB    0.000 MiB       y = []
    31   48.113 MiB    0.000 MiB       ycoord = y2
    32   48.113 MiB    0.000 MiB       while ycoord > y1:
    33   48.113 MiB    0.000 MiB           y.append(ycoord)
    34   48.113 MiB    0.000 MiB           ycoord += y_step
    35   48.113 MiB    0.000 MiB       xcoord = x1
    36   48.113 MiB    0.000 MiB       while xcoord < x2:
    37   48.113 MiB    0.000 MiB           x.append(xcoord)
    38   48.113 MiB    0.000 MiB           xcoord += x_step
    39                                 # set width and height to the generated pixel counts, rather than the
    40                                 # pre-rounding desired width and height
    41                                 # build a list of co-ordinates and the initial condition for each cell.
    42                                 # Note that our initial condition is a constant and could easily be removed,
    43                                 # we use it to simulate a real-world scenario with several inputs to our function
    44   48.113 MiB    0.000 MiB       zs = []
    45   48.113 MiB    0.000 MiB       cs = []
    46  125.961 MiB    0.000 MiB       for ycoord in y:
    47  125.961 MiB    0.258 MiB           for xcoord in x:
    48  125.961 MiB    0.512 MiB               zs.append(complex(xcoord, ycoord))
    49  125.961 MiB    0.512 MiB               cs.append(complex(c_real, c_imag))
    50                             
    51  125.961 MiB    0.000 MiB       print("Length of x:", len(x))
    52  125.961 MiB    0.000 MiB       print("Total elements:", len(zs))
    53  125.961 MiB    0.000 MiB       start_time = time.time()
    54  136.609 MiB   10.648 MiB       output = calculate_z_serial_purepython(max_iterations, zs, cs)
    55  136.609 MiB    0.000 MiB       end_time = time.time()
    56  136.609 MiB    0.000 MiB       secs = end_time - start_time
    57  136.609 MiB    0.000 MiB       print(calculate_z_serial_purepython.__name__ + " took", secs, "seconds")
    58                             
    59  136.609 MiB    0.000 MiB       assert sum(output) == 33219980  # this sum is expected for 1000^2 grid with 300 iterations

In the earlier edition with Python 2.7, line 46 above cost +79MB; here the Increment is 0MB and the few following lines cost +0.2 + 0.5 + 0.5 == +1.2MB. The Mem Usage column at line 46 suggests a +77MB change.

The generation of output on line 54 here costs +10MB; before, it cost +32MB.

Scenario 3 - Combined profiling of both functions:

calculate_z_serial_purepython took 6658.487464427948 seconds
Filename: julia1_memoryprofiler.py

Line #    Mem usage    Increment   Line Contents
================================================
     9  126.168 MiB  126.168 MiB   @profile
    10                             def calculate_z_serial_purepython(maxiter, zs, cs):
    11                                 """Calculate output list using Julia update rule"""
    12  133.625 MiB    7.457 MiB       output = [0] * len(zs)
    13  136.816 MiB    0.000 MiB       for i in range(len(zs)):
    14  136.816 MiB    0.000 MiB           n = 0
    15  136.816 MiB    0.000 MiB           z = zs[i]
    16  136.816 MiB    0.000 MiB           c = cs[i]
    17  136.816 MiB    0.258 MiB           while n < maxiter and abs(z) < 2:
    18  136.816 MiB    0.000 MiB               z = z * z + c
    19  136.816 MiB    0.000 MiB               n += 1
    20  136.816 MiB    0.000 MiB           output[i] = n
    21  136.816 MiB    0.000 MiB       return output

Filename: julia1_memoryprofiler.py

Line #    Mem usage    Increment   Line Contents
================================================
    24   48.273 MiB   48.273 MiB   @profile
    25                             def calc_pure_python(draw_output, desired_width, max_iterations):
    26                                 """Create a list of complex co-ordinates (zs) and complex parameters (cs), build Julia set and display"""
    27   48.273 MiB    0.000 MiB       x_step = (x2 - x1) / desired_width
    28   48.273 MiB    0.000 MiB       y_step = (y1 - y2) / desired_width
    29   48.273 MiB    0.000 MiB       x = []
    30   48.273 MiB    0.000 MiB       y = []
    31   48.273 MiB    0.000 MiB       ycoord = y2
    32   48.273 MiB    0.000 MiB       while ycoord > y1:
    33   48.273 MiB    0.000 MiB           y.append(ycoord)
    34   48.273 MiB    0.000 MiB           ycoord += y_step
    35   48.273 MiB    0.000 MiB       xcoord = x1
    36   48.273 MiB    0.000 MiB       while xcoord < x2:
    37   48.273 MiB    0.000 MiB           x.append(xcoord)
    38   48.273 MiB    0.000 MiB           xcoord += x_step
    39                                 # set width and height to the generated pixel counts, rather than the
    40                                 # pre-rounding desired width and height
    41                                 # build a list of co-ordinates and the initial condition for each cell.
    42                                 # Note that our initial condition is a constant and could easily be removed,
    43                                 # we use it to simulate a real-world scenario with several inputs to our function
    44   48.273 MiB    0.000 MiB       zs = []
    45   48.273 MiB    0.000 MiB       cs = []
    46  126.168 MiB    0.000 MiB       for ycoord in y:
    47  126.168 MiB    0.258 MiB           for xcoord in x:
    48  126.168 MiB    0.512 MiB               zs.append(complex(xcoord, ycoord))
    49  126.168 MiB    0.512 MiB               cs.append(complex(c_real, c_imag))
    50                             
    51  126.168 MiB    0.000 MiB       print("Length of x:", len(x))
    52  126.168 MiB    0.000 MiB       print("Total elements:", len(zs))
    53  126.168 MiB    0.000 MiB       start_time = time.time()
    54  136.816 MiB  136.816 MiB       output = calculate_z_serial_purepython(max_iterations, zs, cs)
    55  136.816 MiB    0.000 MiB       end_time = time.time()
    56  136.816 MiB    0.000 MiB       secs = end_time - start_time
    57  136.816 MiB    0.000 MiB       print(calculate_z_serial_purepython.__name__ + " took", secs, "seconds")
    58                             
    59  136.816 MiB    0.000 MiB       assert sum(output) == 33219980  # this sum is expected for 1000^2 grid with 300 iterations

In the above we see a +136MB Increment on line 54 (in Scenario 2 above it grew by +10MB; in the Python 2.7 version it grew by +32MB), yet the Mem Usage change at line 54 is only +10MB.

In all 3 scenarios we see evidence that the Increment column does not match the Mem Usage column, so the Increment column is looking pretty unhelpful.

The Mem Usage column still looks correct and will have obvious value in a memory-diagnosis exercise, so I can still use it in the book, but I'm wondering if we have an explanation for the odd Increment behaviour?

Has Python 3.x changed its default memory allocation behaviour for lists? I'm pretty sure there's no clever unboxing or other such techniques in CPython (unlike in PyPy).
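As a sanity check that a CPython list really is just an array of pointers (no unboxing), every slot of [1] * n points at the same cached int object, at one machine word per slot:

import sys

n = 10**6
ones = [1] * n
print(all(x is ones[0] for x in ones))  # True: one shared cached int object
print(sys.getsizeof(ones) / n)          # ~8.0 bytes per slot on a 64-bit build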

Cheers (and I hope you're well!), Ian.

ianozsvald commented 5 years ago

There's an example snippet of code in the README; the current behaviour is a little different from what's documented there:

Line #    Mem usage    Increment   Line Contents
================================================
     1   49.043 MiB   49.043 MiB   @profile
     2                             def my_func():
     3   56.453 MiB    7.410 MiB       a = [1] * (10 ** 6)
     4  209.078 MiB  152.625 MiB       b = [2] * (2 * 10 ** 7)
     5   56.676 MiB    0.000 MiB       del b
     6   56.676 MiB    0.000 MiB       return a

In the above, lines 3 & 4 match the README, but line 5 with del does not track the decrease in RAM used (in the README this decrease is tracked).
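Out of interest, sampling the process RSS directly with psutil (which memory_profiler uses under the hood) shows what the OS reports around the del, independently of memory_profiler's own reporting. A minimal sketch:

import os
import psutil

proc = psutil.Process(os.getpid())
rss_mib = lambda: proc.memory_info().rss / 2**20

print("baseline:   ", rss_mib())
b = [2] * (2 * 10**7)
print("after alloc:", rss_mib())
del b
print("after del:  ", rss_mib())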

ianozsvald commented 5 years ago

I'll tentatively propose that there's a problem tracking pure-Python code vs numpy code. I've written a pair of functions which each create a 10,000,000-element list or array and then create a second one (the first is all ones, the second adds 1 to each element). We'd expect both the Python version and the numpy version to allocate the same RAM for the first and the second operation.

From myfn_np received back (10000000,)
From myfn_py received back 10000000
Filename: bug_memory_profiler.py

Line #    Mem usage    Increment   Line Contents
================================================
    12  148.418 MiB  148.418 MiB   @profile
    13                             def myfn_py():
    14  224.457 MiB   76.039 MiB       ones = [1] * NBR_ITEMS
    15  301.574 MiB    0.773 MiB       twos = [n+1 for n in ones]
    16  301.574 MiB    0.000 MiB       return twos

Filename: bug_memory_profiler.py

Line #    Mem usage    Increment   Line Contents
================================================
    18   70.938 MiB   70.938 MiB   @profile
    19                             def myfn_np():
    20                                 # both lines should generate the same memory increase
    21                                 # as twos is a new object, the same size as ones
    22  147.168 MiB   76.230 MiB       ones = np.ones(NBR_ITEMS)
    23  224.570 MiB   77.402 MiB       twos = ones+1
    24  224.570 MiB    0.000 MiB       return twos

What I'm seeing is that the numpy version (the second example) has correct tracking, while the Python version has an incorrect Increment column. The above was run twice; both times the behaviour was the same.

Maybe in Python 3 the AST tracking code (this is a guess) needs to be updated due to changes in how Python runs?

When I ran the above with a smaller length (1,000,000 elements, 10x smaller), both the Python and numpy versions tracked similar +8MB allocations on their respective lines.
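For anyone who wants to reproduce this, here's a sketch of my bug_memory_profiler.py; the function bodies are read straight off the profiler output above, while NBR_ITEMS and the driver lines are reconstructed from the printed output:

import numpy as np
from memory_profiler import profile

NBR_ITEMS = 10000000

@profile
def myfn_py():
    ones = [1] * NBR_ITEMS
    twos = [n+1 for n in ones]
    return twos

@profile
def myfn_np():
    # both lines should generate the same memory increase
    # as twos is a new object, the same size as ones
    ones = np.ones(NBR_ITEMS)
    twos = ones+1
    return twos

print("From myfn_np received back", myfn_np().shape)
print("From myfn_py received back", len(myfn_py()))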

brupelo commented 5 years ago

@ianozsvald This is the first time I've used this package, and I've also noticed some strange behaviour.

With version 0.55.0 I wouldn't get negative increments (as shown in the examples on the PyPI page), while with 0.54.0 the behaviour would match... maybe it's a bug introduced in 0.55.0?

Maybe one of the fixes for #195 is somewhat related to this?
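To compare the two releases I just pinned the older one:

$ pip install memory_profiler==0.54.0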

fabianp commented 5 years ago

Hi @ianozsvald ! Thanks a lot for reporting this.

My guess is that in the Python version, Python is silently executing the line multiple times as if it were a loop, which has always resulted in problematic reporting. Pull requests are highly appreciated :-)
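To make that guess concrete: memory_profiler takes its per-line samples from Python's trace machinery, and a 'line' event fires every time execution reaches a line, so a loop header (or a comprehension) is sampled on every iteration. A minimal sketch of that mechanism:

import sys

def tracer(frame, event, arg):
    if event == "line":
        print("line event at line", frame.f_lineno)
    return tracer

def demo():
    total = 0
    for i in range(3):  # this line fires a 'line' event on every iteration
        total += i
    return total

sys.settrace(tracer)
demo()
sys.settrace(None)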

ianozsvald commented 4 years ago

Hey @fabianp. A colleague of mine, @dxe4, is trying to diagnose this at a hackathon today, so I'm going to post the original Python file here for him (and for others, if you want to try!).

Note that this runs under Python 3.7; it ought to work with Python 2.7 if you add:

from __future__ import division  # 1/2 == 0.5, as in Py3
from __future__ import print_function  # force use of print("hello")
$ more julia1_memoryprofiler.py
"""Julia set generator without optional PIL-based image drawing"""
import time

# area of complex space to investigate
x1, x2, y1, y2 = -1.8, 1.8, -1.8, 1.8
c_real, c_imag = -0.62772, -.42193

@profile
def calculate_z_serial_purepython(maxiter, zs, cs):
    """Calculate output list using Julia update rule"""
    output = [0] * len(zs)
    for i in range(len(zs)):
        n = 0
        z = zs[i]
        c = cs[i]
        while n < maxiter and abs(z) < 2:
            z = z * z + c
            n += 1
        output[i] = n
    return output

@profile
def calc_pure_python(draw_output, desired_width, max_iterations):
    """Create a list of complex co-ordinates (zs) and complex parameters (cs), build Julia set and display"""
    x_step = (x2 - x1) / desired_width
    y_step = (y1 - y2) / desired_width
    x = []
    y = []
    ycoord = y2
    while ycoord > y1:
        y.append(ycoord)
        ycoord += y_step
    xcoord = x1
    while xcoord < x2:
        x.append(xcoord)
        xcoord += x_step
    # set width and height to the generated pixel counts, rather than the
    # pre-rounding desired width and height
    # build a list of co-ordinates and the initial condition for each cell.
    # Note that our initial condition is a constant and could easily be removed,
    # we use it to simulate a real-world scenario with several inputs to our function
    zs = []
    cs = []
    for ycoord in y:
        for xcoord in x:
            zs.append(complex(xcoord, ycoord))
            cs.append(complex(c_real, c_imag))

    print("Length of x:", len(x))
    print("Total elements:", len(zs))
    start_time = time.time()
    output = calculate_z_serial_purepython(max_iterations, zs, cs)
    end_time = time.time()
    secs = end_time - start_time
    print(calculate_z_serial_purepython.__name__ + " took", secs, "seconds")

    assert sum(output) == 33219980  # this sum is expected for 1000^2 grid with 300 iterations

# Calculate the Julia set using a pure Python solution with
# reasonable defaults for a laptop
# set draw_output to True to use PIL to draw an image
#calc_pure_python(draw_output=False, desired_width=1000, max_iterations=300)
calc_pure_python(draw_output=False, desired_width=1000, max_iterations=300)
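Run it the same way as the scenarios above:

$ python -m memory_profiler julia1_memoryprofiler.py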

dxe4 commented 4 years ago

code:


@profile
def myfn_py():
    ones = [1] * NBR_ITEMS
    return ones

@profile
def myfn_np():
    ones = np.ones(NBR_ITEMS)
    twos = ones+1
    return twos

run 1:

myfn_np()
myfn_py()

output 1:

Line #    Mem usage    Increment   Line Contents
================================================
     7   54.656 MiB   54.656 MiB   @profile
     8                             def myfn_py():
     9   54.684 MiB    0.027 MiB       ones = [1] * NBR_ITEMS
    10   54.684 MiB    0.000 MiB       return ones

Filename: prof.py

Line #    Mem usage    Increment   Line Contents
================================================
    12   47.012 MiB   47.012 MiB   @profile
    13                             def myfn_np():
    14   50.836 MiB    3.824 MiB       ones = np.ones(NBR_ITEMS)
    15   54.656 MiB    3.820 MiB       twos = ones+1
    16   54.656 MiB    0.000 MiB       return twos

run 2:

myfn_np()
time.sleep(5)
myfn_py()

output 2:

Line #    Mem usage    Increment   Line Contents
================================================
     7   54.645 MiB   54.645 MiB   @profile
     8                             def myfn_py():
     9   58.488 MiB    3.844 MiB       ones = [1] * NBR_ITEMS
    10   58.488 MiB    0.000 MiB       return ones

Filename: prof.py

Line #    Mem usage    Increment   Line Contents
================================================
    12   46.996 MiB   46.996 MiB   @profile
    13                             def myfn_np():
    14   50.820 MiB    3.824 MiB       ones = np.ones(NBR_ITEMS)
    15   54.641 MiB    3.820 MiB       twos = ones+1
    16   54.641 MiB    0.000 MiB       return twos

run 3 (reversed order):

myfn_py()
myfn_np()

output 3:

Line #    Mem usage    Increment   Line Contents
================================================
     7   46.977 MiB   46.977 MiB   @profile
     8                             def myfn_py():
     9   50.828 MiB    3.852 MiB       ones = [1] * NBR_ITEMS
    10   50.828 MiB    0.000 MiB       return ones

Filename: prof.py

Line #    Mem usage    Increment   Line Contents
================================================
    12   50.828 MiB   50.828 MiB   @profile
    13                             def myfn_np():
    14   50.828 MiB    0.000 MiB       ones = np.ones(NBR_ITEMS)
    15   54.648 MiB    3.820 MiB       twos = ones+1
    16   54.648 MiB    0.000 MiB       return twos

print(sys.getsizeof(ones) / (1024 * 1024)) in myfn_py gives 3.81475830078125

So running myfn_np before myfn_py causes ones in myfn_py to report 0.027 as opposed to 3.8, but a sleep in between prevents this. Vice versa, calling myfn_py before myfn_np causes ones in myfn_np to report 0.

dxe4 commented 4 years ago

I may be wrong, but I think what happens here is that even if some objects are garbage collected, the OS may not have reclaimed the memory yet, so measuring memory from the OS's perspective may produce the results above.
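One way to check that hypothesis (a sketch, assuming psutil is installed; memory_profiler already uses it): compare Python's own allocation accounting from tracemalloc with the OS-level RSS that memory_profiler samples. The traced figure drops as soon as the list dies; the RSS may not.

import os
import tracemalloc
import psutil

proc = psutil.Process(os.getpid())
tracemalloc.start()

ones = [1] * 10000000
print("with list: rss=%.1f MiB traced=%.1f MiB" % (
    proc.memory_info().rss / 2**20,
    tracemalloc.get_traced_memory()[0] / 2**20))
del ones
print("after del: rss=%.1f MiB traced=%.1f MiB" % (
    proc.memory_info().rss / 2**20,
    tracemalloc.get_traced_memory()[0] / 2**20))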

brianzhang01 commented 3 years ago

@dxe4 you may want to force gc.collect() in your examples. Also, Python's memory management has interesting allocation/free/reuse strategies which make predicting memory usage difficult. It involves blocks, pools and arenas. I think the arenas are what get allocated, but an arena's memory can only be freed if all the pools inside it are unused. https://rushter.com/blog/python-memory-managment/
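A sketch of the arena effect (exact RSS numbers will vary by platform, and this assumes psutil): freeing every other small object leaves each pool partly in use, so the arenas can't be returned to the OS, while freeing everything lets them go.

import gc
import os
import psutil

proc = psutil.Process(os.getpid())
mib = lambda: proc.memory_info().rss / 2**20

small = [{} for _ in range(10**6)]  # ~1M small objects served by pymalloc
print("allocated: %.0f MiB" % mib())
del small[::2]                      # drop every other dict
gc.collect()
print("half gone: %.0f MiB" % mib())  # often barely lower: arenas still pinned
del small
gc.collect()
print("all gone:  %.0f MiB" % mib())  # now whole arenas can be freed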

@ianozsvald in your second comment on this thread you mentioned the example in the README not correctly showing the effects of the del statement. I noticed this as well, running on two separate machines (Mac OS X and Linux). I tracked the introduction of the issue to between releases 0.54 and 0.55, and after looking at the commits in that period, I suspect it's due to PR #202 . This also matches with what @brupelo observed. However, when I try to profile some of my code under 0.54, I'm getting a different set of strange results.

brianzhang01 commented 3 years ago

It looks like #280 tried to address many of these factors. You may want to rebuild and try again to see if that solves your problem.
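Rebuilding just means installing the development version straight from the repo, e.g.:

$ pip install --upgrade git+https://github.com/pythonprofilers/memory_profiler.git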

fabianp commented 3 years ago

Thanks Brian for looking into this!

RexYuan commented 3 years ago

Just reporting that I tried the README code on macOS Big Sur 11.2 with a pip-installed memory_profiler 0.58.0 and didn't see the intended decrement either:

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
     1   37.434 MiB   37.434 MiB           1   @profile
     2                                         def my_func():
     3   45.066 MiB    7.633 MiB           1       a = [1] * (10 ** 6)
     4  197.660 MiB  152.594 MiB           1       b = [2] * (2 * 10 ** 7)
     5  197.660 MiB    0.000 MiB           1       del b
     6  197.660 MiB    0.000 MiB           1       return a