plasma-umass / scalene

Scalene: a high-performance, high-precision CPU, GPU, and memory profiler for Python with AI-powered optimization proposals
Apache License 2.0

Scalene does not seem to work with multiprocessing #809

Closed makmanalp closed 4 months ago

makmanalp commented 7 months ago

Describe the bug

Scalene does not seem to collect profiling data from subprocesses forked by multiprocessing.

To Reproduce

Run this:

import multiprocessing as mp

def fun(args):
    return args

if __name__ == "__main__":
    p = mp.Pool(mp.cpu_count())
    r = p.map(fun, range(1000000))
    print(sum(x for x in r))
    print(((1000000-1) * 1000000) / 2)

Save it as scalenetest.py and run it under Scalene via python3 -m scalene scalenetest.py

Expected behavior

I expected the line profiler to show stats from inside the function fun(), gathered from the subprocesses. Instead, I get a profile showing that the bulk of the time is spent in the main process waiting for the subprocesses to complete.

Screenshots

image


ogencoglu commented 4 months ago

Same issue here

emeryberger commented 4 months ago

Thanks for the note. I dug in and there are basically two problems with this particular example:

  1. The function does essentially no work, so almost no time is spent inside it. It's not a huge surprise that sampling doesn't see execution there.
  2. The pool is never properly closed, so its statistics don't get propagated to the main Scalene profiler process. When the pool is used correctly (e.g., in a context manager), the results look fine.

I've modified your code to spend some compute time in fun and also to use a context manager for Pool so it cleans up after itself.

import multiprocessing as mp

def fun(args):
    x = 0
    for i in range(100):
        x += 1
    return args

if __name__ == "__main__":
    with mp.Pool(mp.cpu_count()) as p:
        r = p.map(fun, range(1000000))
    print(sum(x for x in r))
    print(((1000000-1) * 1000000) / 2)

This is the result using the repository version of Scalene (on a Mac running macOS Sonoma, though I don't think the platform should matter). As you can see, most of the time is now correctly attributed to code in fun.

Screenshot 2024-07-06 at 5 51 24 PM
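
For code where a context manager doesn't fit, calling close() and then join() explicitly is the other standard way to shut a Pool down. The sketch below shows that pattern. It has not been run under Scalene in this thread, so whether the worker statistics propagate the same way is an assumption. The run helper, its workers parameter, and the smaller input size are made up for illustration.

```python
import multiprocessing as mp


def fun(arg):
    # Do a little arithmetic so a sampling profiler has something to observe.
    x = 0
    for _ in range(100):
        x += 1
    return arg


def run(n, workers=None):
    # Hypothetical helper (not from the thread): shut the pool down explicitly
    # instead of relying on a context manager.
    pool = mp.Pool(workers or mp.cpu_count())
    try:
        results = pool.map(fun, range(n))
    finally:
        pool.close()  # no more tasks will be submitted
        pool.join()   # wait for the worker processes to exit
    return sum(results)


if __name__ == "__main__":
    n = 100_000  # smaller than the original example, to keep the run short
    print(run(n))
    print((n - 1) * n // 2)
```

Note that close() lets the workers finish and exit normally, whereas leaving a with block calls terminate() on the pool.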