pythonprofilers / memory_profiler

Monitor Memory usage of Python code
http://pypi.python.org/pypi/memory_profiler

Gross memory overestimation when profiling multiprocessing based on `joblib` #312

Open mgbckr opened 3 years ago

mgbckr commented 3 years ago

Hi, I ran into an issue where profiling a multiprocessing scenario based on joblib with memory_profiler resulted in a gross overestimation of memory usage. I am not 100% sure why this happens, but I assume it has something to do with copy-on-write, or rather that joblib seems to use shared memory for "read-only" data, and that RSS does not account for this sharing: every worker's RSS includes the shared pages, so summing RSS across processes counts the same data once per worker. Let me know if you have further insights on this.
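To illustrate the sharing effect outside of joblib, here is a minimal sketch (not the actual repro; it assumes Linux with the default fork start method and psutil installed, and uses multiprocessing directly):

import multiprocessing as mp
import time

import numpy as np
import psutil

# Hypothetical example data, roughly 69 MiB; inherited by the forked workers.
data = np.random.random((3000, 3000))

def worker():
    _ = data[1, 1]  # read-only access: the pages stay shared with the parent
    time.sleep(5)

if __name__ == "__main__":
    procs = [mp.Process(target=worker) for _ in range(8)]
    for p in procs:
        p.start()
    time.sleep(1)  # let the workers start up

    # RSS counts shared pages in full for every process that maps them;
    # PSS divides each shared page among the processes mapping it.
    rss = sum(psutil.Process(p.pid).memory_info().rss for p in procs)
    pss = sum(psutil.Process(p.pid).memory_full_info().pss for p in procs)
    print(f"sum of worker RSS: {rss / 1024**2:.0f} MiB")
    print(f"sum of worker PSS: {pss / 1024**2:.0f} MiB")

    for p in procs:
        p.join()

If the overestimation really comes from shared pages, the RSS sum should scale with the number of workers, while the PSS sum should stay close to a single copy of the array plus per-process interpreter overhead.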

Here is some code reproducing this issue:

from memory_profiler import memory_usage
import numpy as np
import joblib
import time
import datetime

def test_multiprocessing_showcase():

    def func():

        n_jobs = 8
        size = 3000

        print(f"Creating data: {size}x{size} ... ", end="")
        a = np.random.random((size, size))
        print(f"done ({a.size * a.itemsize / 1024**3:.02f} Gb). ", end="")

        def subprocess(i):
            # Copy the shared array (twice) so each worker actually allocates
            # its own memory instead of only reading the shared data.
            aa = a.copy()
            r = aa[1, 1]
            aa = a.copy()
            time.sleep(10)
            return r

            # Read-only variant for comparison (workers never copy the array):
            # r = a[1, 1]
            # time.sleep(10)
            # return r

        start = datetime.datetime.now()
        print(f"Starting processing: n_jobs={n_jobs} ... ", end="")
        results = joblib.Parallel(n_jobs=n_jobs)(
            joblib.delayed(subprocess)(i) 
            for i in range(n_jobs))
        print(f"done ({datetime.datetime.now() - start}). ", end="")

        return results

    # memory_usage returns the peak in MiB; run the same workload once per backend.
    rss = memory_usage(proc=func, max_usage=True, backend="psutil", include_children=True, multiprocess=True)
    print(f"RSS: {rss:.02f}")
    uss = memory_usage(proc=func, max_usage=True, backend="psutil_uss", include_children=True, multiprocess=True)
    print(f"USS: {uss:.02f}")
    pss = memory_usage(proc=func, max_usage=True, backend="psutil_pss", include_children=True, multiprocess=True)
    print(f"PSS: {pss:.02f}")
    print(f"RSS: {rss:.02f}, USS: {uss:.02f}, PSS: {pss:.02f}")

if __name__ == "__main__":
    test_multiprocessing_showcase()

# Example output (memory values are in MiB):

# n_jobs = 32
# size = 25000
# Creating data: 25000x25000 ... done (4.66 Gb). Starting processing: n_jobs=32 ... done (0:00:37.581291). RSS: 353024.01
# Creating data: 25000x25000 ... done (4.66 Gb). Starting processing: n_jobs=32 ... done (0:00:38.867385). USS: 148608.62
# Creating data: 25000x25000 ... done (4.66 Gb). Starting processing: n_jobs=32 ... done (0:00:29.049754). PSS: 169253.91

# n_jobs = 64
# size = 10000
# Creating data: 10000x10000 ... done (0.75 Gb). Starting processing: n_jobs=64 ... done (0:00:14.701243). RSS: 111362.79
# Creating data: 10000x10000 ... done (0.75 Gb). Starting processing: n_jobs=64 ... done (0:00:15.020202). USS: 56108.69
# Creating data: 10000x10000 ... done (0.75 Gb). Starting processing: n_jobs=64 ... done (0:00:15.072918). PSS: 54826.61

# Conclusion:
# * RSS is overestimating like crazy (I monitored the actual memory usage using htop)
python=3.7.8
memory_profiler=0.57.0
joblib=1.0.1
mgbckr commented 3 years ago

#311

mgbckr commented 3 years ago

I guess #311 can be considered to somewhat resolve this issue. However, I am not 100% sure I grasped the whole picture, so input from someone with a bit more background knowledge would be helpful. If there is nothing to add, we can close this issue.
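For reference, here is what that would look like in practice: a minimal sketch (not from the original report) that assumes the psutil_pss backend is available in the installed memory_profiler and that joblib shares the array between the workers.

from memory_profiler import memory_usage
import joblib
import numpy as np

def work():
    a = np.random.random((3000, 3000))

    def read_only(i):
        # Workers only read the array, so they should not need their own copy.
        return a[i, i]

    return joblib.Parallel(n_jobs=4)(
        joblib.delayed(read_only)(i) for i in range(4))

if __name__ == "__main__":
    # PSS splits pages shared between processes instead of counting them once
    # per process, so the shared data does not inflate the reported peak.
    peak = memory_usage(work, max_usage=True, backend="psutil_pss",
                        include_children=True, multiprocess=True)
    print(f"peak PSS: {peak:.02f} MiB")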