joerick / pyinstrument

🚴 Call stack profiler for Python. Shows you why your code is slow!
https://pyinstrument.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Profiling with multiprocessing #31

Open sYnfo opened 6 years ago

sYnfo commented 6 years ago

Is there any support for profiling multiple processes? Right now, profiling code that uses multiprocessing gives me output like this:

40.909 update_basketball_events_db  db_updater/events.py:38
└─ 40.855 map  multiprocessing/pool.py:261
   └─ 40.855 get  multiprocessing/pool.py:637
      └─ 40.855 wait  multiprocessing/pool.py:634
         └─ 40.855 wait  threading.py:533
            └─ 40.855 wait  threading.py:263
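
For context, a minimal sketch (not sYnfo's actual code) of the pattern that produces a trace like this: the parent process blocks in Pool.map while the child processes do the work, so a sampler running in the parent only ever sees the wait.

import math
from multiprocessing import Pool

def work(n):
    # CPU-bound body; this runs in the child processes, invisible to a
    # sampler that only observes the parent process.
    return sum(math.sqrt(i) for i in range(n))

if __name__ == '__main__':
    with Pool(4) as pool:
        # The parent blocks here in map -> get -> wait, which is exactly
        # the chain shown in the trace above.
        results = pool.map(work, [2_000_000] * 8)
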
ihnorton commented 6 years ago

FWIW, joblib allows switching between multiprocessing and threading with a keyword argument. I changed a single call to joblib.Parallel(backend='threading', ...) to get a usable pyinstrument trace in some code that normally uses multiprocessing (of course, I ran pyinstrument on a smaller problem, but that was more than sufficient for profiling purposes).

diegobill commented 5 years ago

@ihnorton, could you give an example of how to use joblib to profile multi-threaded Python code with pyinstrument?

ihnorton commented 5 years ago

I don't have a specific example handy, but the idea is that backend='threading' prevents joblib from spawning new subprocesses, which would not be visible to pyinstrument. Instead, all execution stays in the main process, where pyinstrument's sampling can still happen. (The GIL still applies to Python code run with the threading module, so pyinstrument retains observability there. If I remember right, pyinstrument doesn't look at the C stack, so if you are running C code that releases the GIL you have to use something else anyway.)
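
A minimal sketch of that idea (not from this thread; slow_task is a hypothetical stand-in, and whether worker threads actually show up in the trace depends on the pyinstrument version, as the next comment reports):

import math
from joblib import Parallel, delayed
from pyinstrument import Profiler

def slow_task(n):
    # Hypothetical CPU-bound stand-in for the real work.
    return sum(math.sqrt(i) for i in range(n))

profiler = Profiler()
profiler.start()

# backend='threading' keeps the workers in this process; with the default
# process-based backend, the work would be invisible to the sampler.
results = Parallel(n_jobs=4, backend='threading')(
    delayed(slow_task)(2_000_000) for _ in range(8)
)

profiler.stop()
print(profiler.output_text())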

chiragjn commented 4 years ago

I have the same problem; using joblib also does not give any visibility into the time spent in individual functions in separate threads. @joerick, is this not supported?

The following section from the README led me to believe it would work:

Pyinstrument uses a new profiling mode. Rather than using signals, pyinstrument uses a new statistical profiler built on PyEval_SetProfile. This means no more main thread restriction, no more IO errors when using pyinstrument, and no need for a separate 'setprofile' mode!

Now I think it just means that code using threads would not crash under the profiler on 2.x and above.

Is there any way to make this work? This is the trace I get:

                  │        ├─ 0.401 retrieve  joblib/parallel.py:893
                  │        │  └─ 0.401 get  multiprocessing/pool.py:637
                  │        │     └─ 0.401 wait  multiprocessing/pool.py:634
                  │        │        └─ 0.401 wait  threading.py:533
                  │        │           └─ 0.401 wait  threading.py:263
                  │        │              └─ 0.400 [self]  
                  │        └─ 0.005 _terminate_backend  joblib/parallel.py:731
                  │           └─ 0.005 terminate  joblib/_parallel_backends.py:232
                  │              └─ 0.005 terminate  multiprocessing/pool.py:537
                  │                 └─ 0.005 __call__  multiprocessing/util.py:167
                  │                    └─ 0.005 _terminate_pool  multiprocessing/pool.py:561
                  │                       └─ 0.005 join  threading.py:1024
                  │                          └─ 0.005 _wait_for_tstate_lock  threading.py:1062
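
One possible workaround, offered here as a hedged sketch rather than anything proposed in this thread: start a pyinstrument Profiler inside each worker process, so sampling happens where the work actually runs. profiled_work and its printed per-task report are illustrative names, not part of pyinstrument's API.

from multiprocessing import Pool
from pyinstrument import Profiler

def work(n):
    return sum(i * i for i in range(n))

def profiled_work(n):
    # This runs in the child process, so the profiler sees the real call stack.
    profiler = Profiler()
    profiler.start()
    result = work(n)
    profiler.stop()
    print(profiler.output_text())  # one report per task, printed by the worker
    return result

if __name__ == '__main__':
    with Pool(2) as pool:
        print(pool.map(profiled_work, [2_000_000] * 4))
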
madig commented 3 years ago

FWIW, https://pypi.org/project/py-spy/ can help if you really need multiprocessing and want to look into subprocesses right now.

danvk commented 3 years ago

You can use the ProcessPoolExecutor example from the Python docs as a simple repro for this issue (I've made the prime numbers bigger to make is_prime show up more clearly in the profile):

import concurrent.futures
import math

PRIMES = [
    1_000_000_000_100_011,
    1_000_112_354_597_667,
    1_003_229_774_283_941,
    1_011_001_110_001_111,
    1_084_051_191_974_761
]

def is_prime(n):
    if n < 2:
        return False
    if n == 2:
        return True
    if n % 2 == 0:
        return False

    sqrt_n = int(math.floor(math.sqrt(n)))
    for i in range(3, sqrt_n + 1, 2):
        if n % i == 0:
            return False
    return True

def main():
    with concurrent.futures.ProcessPoolExecutor(max_workers=2) as executor:
        for number, prime in zip(PRIMES, executor.map(is_prime, PRIMES)):
            print('%d is prime: %s' % (number, prime))

if __name__ == '__main__':
    main()

Running pyinstrument doesn't show any time spent in is_prime:

$ pyinstrument processpool_example.py
1000000000100011 is prime: True
1000112354597667 is prime: False
1003229774283941 is prime: True
1011001110001111 is prime: True
1084051191974761 is prime: True

  _     ._   __/__   _ _  _  _ _/_   Recorded: 18:16:42  Samples:  29
 /_//_/// /_\ / //_// / //_'/ //     Duration: 3.418     CPU time: 0.030
/   _/                      v3.4.2

Program: processpool_example.py

3.417 <module>  <string>:1
   [4 frames hidden]  <string>, runpy
      3.417 _run_code  runpy.py:64
      └─ 3.417 <module>  processpool_example.py:3
         └─ 3.412 main  processpool_example.py:28
            └─ 3.382 _chain_from_iterable_of_lists  concurrent/futures/process.py:478
                  [6 frames hidden]  concurrent, threading, <built-in>
                     3.382 lock.acquire  <built-in>:0

As @madig mentioned, running the same code through py-spy does surface the time spent in is_prime:

$ py-spy record --subprocesses -o profile.svg -- python processpool_example.py

[image: py-spy flame graph (profile.svg) showing time spent in is_prime]