How do you profile applications that use multiprocessing?

vmprof / vmprof-python

vmprof - a statistical program profiler

https://vmprof.readthedocs.io

How do you profile applications that use multiprocessing? #179

Open: fake-name opened this issue 6 years ago

fake-name commented 6 years ago

I have a very heavily parallelized project that uses Python multiprocessing extensively.

Is there a canonical way to use vmprof for profiling applications that have more than one process? Right now, you only get profile data from the main process.

I wonder if there's a way to monkey-patch multiprocessing to install the profiler in every process it creates.

Also, is there a way to join multiple profile output files? Ideally there would be one file-writer process that aggregates the samples from all processes at runtime, but I can also see having each process dump its own profile log file, which then gets aggregated once the application has completed.
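For the "aggregate once the application has completed" variant, per-process dumps can be combined after the fact. Merging vmprof's own log format is not shown here; the sketch assumes cProfile-style dump files (as written by `cProfile.Profile.dump_stats`) and merges them with `pstats`, which sums call counts and timings for functions that appear in more than one file. The `merge_profiles` helper and the `profile-*.prof` pattern are hypothetical.

```python
import glob
import os
import pstats


def merge_profiles(profile_dir, pattern="profile-*.prof"):
    """Combine every per-process cProfile dump in profile_dir into one Stats object."""
    paths = sorted(glob.glob(os.path.join(profile_dir, pattern)))
    if not paths:
        raise FileNotFoundError(f"no files matching {pattern!r} in {profile_dir!r}")
    # pstats.Stats accepts several files; entries for the same function
    # (same file, line and name) are coalesced across processes.
    return pstats.Stats(*paths)


if __name__ == "__main__":
    import cProfile
    import tempfile

    out_dir = tempfile.mkdtemp()
    for i in range(2):
        # Stand-in for two worker processes, each writing its own dump.
        prof = cProfile.Profile()
        prof.runcall(sum, range(10_000))
        prof.dump_stats(os.path.join(out_dir, f"profile-{i}.prof"))

    merged = merge_profiles(out_dir)
    merged.sort_stats("cumulative").print_stats(5)
    merged.dump_stats(os.path.join(out_dir, "merged.prof"))  # one combined file
```

Writing the merged result back out with `dump_stats` gives a single file that any viewer for `.prof` files can open.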

fake-name commented 6 years ago

Oh, additionally, I think that if you attach vmprof to a process, fork using multiprocessing, and then try to attach vmprof to the newly forked process (which is now outside the profiler, I believe), the child process's vmprof.enable() call fails with _vmprof.VMProfError: vmprof is already enabled.

This is easy enough to work around (fork, then attach the profiler, rather than attach and then fork), but it might be a good idea for the multiple-enable check to also look at whether the value of multiprocessing.current_process() has changed.
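The proposed check could look roughly like this: remember which process enabled the profiler, and only treat a second enable() as an error when it comes from that same process. This is a hypothetical wrapper (`PidAwareProfiler`), not vmprof's code, and it uses cProfile as the underlying profiler. It compares `os.getpid()` rather than `multiprocessing.current_process()`, since the PID also changes after a plain `os.fork()` that bypasses multiprocessing.

```python
import cProfile
import os


class PidAwareProfiler:
    """Hypothetical wrapper: enable() is rejected only if the *same* process
    already enabled it; a forked child may start its own fresh profile."""

    def __init__(self):
        self._profile = None     # underlying cProfile.Profile while enabled
        self._owner_pid = None   # PID of the process that enabled it

    def enable(self):
        pid = os.getpid()
        if self._owner_pid == pid:
            raise RuntimeError("profiler is already enabled in this process")
        if self._profile is not None:
            # Enabled in a parent before fork(): this process holds a copy of
            # that profiler, so stop the copy here (the parent is unaffected).
            self._profile.disable()
        self._profile = cProfile.Profile()
        self._profile.enable()
        self._owner_pid = pid

    def disable(self):
        if self._owner_pid != os.getpid():
            raise RuntimeError("profiler is not enabled in this process")
        profile, self._profile, self._owner_pid = self._profile, None, None
        profile.disable()
        return profile
```

In a forked child, calling enable() on the inherited wrapper starts a new profile for that child, and disable() hands back its `cProfile.Profile`, so the child can write its own file with `dump_stats()`.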

(I'm guessing a bit here, since I don't have a stand-alone test case to validate my assumptions, and how vmprof actually behaves under multiprocessing is undocumented.)

PierreRustOrange commented 5 years ago

I'd be interested if you have any further code on this; I'm facing the same issue when using multiprocessing. From my tests, new processes created by multiprocessing are indeed not monitored.

fake-name commented 5 years ago

I wound up finding https://github.com/benfred/py-spy pretty handy, although it only works for CPython (I mostly use PyPy3 these days). You can attach it to any running Python process, so I'd just attach it to the Python process using the most CPU and have a look.

It's not ideal, but it's enough to get a rough idea of where the time is going.

Writing an extension that hooks atfork() or something similar is another thing I'd like to do if I can ever find the time.
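In pure Python, the atfork() idea maps onto `os.register_at_fork()` (Python 3.7+, Unix only), whose `after_in_child` callback runs in each child right after `os.fork()`. The sketch below starts a fresh profiler in every forked child, with cProfile standing in for vmprof; `install_fork_profiling`, `dump_child_profile` and the `profile-<pid>.prof` files are hypothetical names. Two caveats: it only covers children created by `os.fork()` (including multiprocessing's "fork" start method), since spawn-started children run a fresh interpreter where the hook was never registered; and multiprocessing's fork-started children exit via `os._exit()`, which skips atexit handlers, so the target has to call `dump_child_profile()` itself.

```python
import atexit
import cProfile
import os

# Module state, copied into each child by fork().
_profile = None       # profiler started by the fork hook, if any
_profile_pid = None   # PID of the process that started it
_out_dir = None
_installed = False


def _start_in_child():
    """Runs in every child right after os.fork(): start a fresh profiler."""
    global _profile, _profile_pid
    if _profile is not None:
        # Inherited from a profiled parent (a grandchild fork): stop that copy.
        _profile.disable()
    _profile = cProfile.Profile()
    _profile.enable()
    _profile_pid = os.getpid()
    # Covers children that shut down normally; os._exit() skips atexit.
    atexit.register(dump_child_profile)


def dump_child_profile():
    """Stop this process's fork-started profiler and write profile-<pid>.prof."""
    global _profile, _profile_pid
    if _profile is None or _profile_pid != os.getpid():
        return None  # nothing was started in this process
    profile, _profile, _profile_pid = _profile, None, None
    profile.disable()
    path = os.path.join(_out_dir, f"profile-{os.getpid()}.prof")
    profile.dump_stats(path)
    return path


def install_fork_profiling(out_dir):
    """Hypothetical helper: profile every process forked after this call."""
    global _out_dir, _installed
    _out_dir = out_dir
    if not _installed:  # os.register_at_fork() hooks cannot be unregistered
        os.register_at_fork(after_in_child=_start_in_child)
        _installed = True


def work():
    sum(i * i for i in range(100_000))
    # multiprocessing's fork children leave via os._exit(), so atexit handlers
    # never run there; dump explicitly at the end of the target instead.
    dump_child_profile()


if __name__ == "__main__":
    import multiprocessing

    os.makedirs("profiles", exist_ok=True)
    install_fork_profiling("profiles")
    ctx = multiprocessing.get_context("fork")  # the hook only fires on fork()
    procs = [ctx.Process(target=work) for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```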

Actually, https://github.com/uber/pyflame claims it supports multithreading, so that's another avenue to investigate.

PierreRust commented 5 years ago

Thanks for the feedback. I've also been using py-spy, and I hit the same limitation with PyPy ;-)

I've tried pyflame, and although it supports multithreading (like vmprof and py-spy), it does not help with multiprocessing: forked processes are completely invisible to it.

mushan09 commented 5 years ago

@PierreRust So how did you end up solving this? I have no idea how to approach it.