Open jlfwong opened 6 years ago
Would be useful for https://github.com/nico/ninjatracing.
This would be a very helpful feature for understanding multiprocess / multithreaded programs, especially if tracks can be nested: a set of tracks for each subprocess and nested subtracks for each thread in each subprocess.
It's possible to use viztracer to profile multiprocess/multithreaded Python code with the perfetto trace viewer as shown below:
https://github.com/gaogaotiantian/viztracer#multi-process-support
Unfortunately viztracer does not support native code profiling (and is a tracing profiler instead of sampling profiler). While py-spy supports native profiling, its speedscope export does not allow the multitrack visualization that the viztracer/perfetto combo allows.
If nesting is too complex to implement, just one track per thread would be enough, provided the thread names can be prefixed with the pid of their process. Threads from the same process would then naturally be displayed next to one another, assuming the thread tracks are ordered lexicographically by name.
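To make the naming idea concrete, here is a minimal sketch of pid-prefixed track labels. The `Track` type, the label layout, and the example pids/tids are all hypothetical, not anything speedscope or py-spy defines. One detail worth noting: the ids need zero-padding, otherwise a plain string sort puts pid 1000 before pid 999.

```python
from typing import NamedTuple


class Track(NamedTuple):
    # Hypothetical description of one thread track.
    pid: int
    tid: int
    thread_name: str


def track_label(track: Track, width: int = 7) -> str:
    # Zero-pad the ids so that string comparison agrees with numeric order.
    return (
        f"pid {track.pid:0{width}d} / tid {track.tid:0{width}d} "
        f"{track.thread_name}"
    )


def ordered_labels(tracks: list[Track]) -> list[str]:
    # Plain lexicographic sort: threads of one process end up adjacent.
    return sorted(track_label(t) for t in tracks)


tracks = [
    Track(1000, 1002, "worker"),
    Track(999, 999, "MainThread"),
    Track(1000, 1000, "MainThread"),
    Track(999, 1001, "worker"),
]
labels = ordered_labels(tracks)
for label in labels:
    print(label)
```

With this layout both threads of pid 999 come first, followed by both threads of pid 1000, without any nesting support in the viewer.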
@ogrisel please note that py-spy is a statistical profiler, so it's impossible by nature to have a "synchronized timeline" between threads. Many people are kind of confused (and it's completely normal) between the trace view and the flamegraph. VizTracer is a tracer which logs every function entry/exit, so it can display the events in a chronological view, at the cost of a larger overhead than a sampling profiler (if you sample slowly enough). py-spy, while being an amazing Python profiler, simply can't provide the kind of information that would make a multi-track display useful.
@gaogaotiantian I don't know anything about Python but I do know about profiling in other languages. In general I prefer sampling profilers because they don't spoil the timings and anything that doesn't show up in any sample is usually quick enough to be negligible. Tracing profilers usually put false emphasis on areas with a lot of small function calls. Why would a sampling profile be unable to provide a synchronized timeline? Is that a Python thing? For example in Go, when you acquire a sample, you would stop the world (it's literally called stopTheWorld) and get stack traces from all threads. Those stack traces show the call stack in all of those threads at the exact same time.
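For what it's worth, the same "all threads at one instant" idea can be sketched in-process in Python. This is only an illustration and not a description of how py-spy works (py-spy samples from outside the process). It relies on `sys._current_frames()`, which is CPython-specific; the helper names and the `worker_loop` demo thread are made up for the example.

```python
import sys
import threading
import time


def stack_names(frame):
    """Walk a frame's f_back chain; return function names, outermost first."""
    names = []
    while frame is not None:
        names.append(frame.f_code.co_name)
        frame = frame.f_back
    names.reverse()
    return names


def snapshot_all_threads():
    """Capture the Python-level stack of every thread under one timestamp."""
    ts = time.perf_counter()
    frames = sys._current_frames()  # CPython-specific: {thread id: top frame}
    return {"ts": ts, "stacks": {tid: stack_names(f) for tid, f in frames.items()}}


def worker_loop(started, stop):
    started.set()
    stop.wait()  # stand-in for real work


started, stop = threading.Event(), threading.Event()
worker = threading.Thread(target=worker_loop, args=(started, stop), name="worker")
worker.start()
started.wait()  # make sure the worker is inside worker_loop

snap = snapshot_all_threads()
stop.set()
worker.join()

for tid, names in snap["stacks"].items():
    print(f"{snap['ts']:.6f}", tid, " > ".join(names))
```

Every stack in `snap` shares the same timestamp, which is the property a synchronized multi-track view would rely on.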
Even taking the sampling limitations into account, I think that a time ordered multitrack view would still have significant value, at least for function calls that last ~10x the sampling resolution or more.
@AndreKR Every advantage you mentioned about sampling profilers is true. And yes, you can get the full stack of every thread at the same time, and you can even put them on a timeline based on the time you sampled. But that's not useful in general.
Let's clear things up first - how fast are you sampling? This is the critical question for everything that follows. When we say "a sampling profiler has a much lower overhead/skew than a tracer", that means you are sampling much slower than the FEE (function entry/exit) frequency. And it means the data you collect is valuable and meaningful if and only if you put the samples together and average them out. A single sample's meaning is very limited.
@ogrisel is correct about the resolution. If a function's duration is larger than 10x the sampling interval, then the samples give you some idea about timing (and almost timing only, which is the most important and maybe the only important thing in profiling).
Another related concern for displaying multi-track profiles of multi-threaded (or multi-process) programs - not all sampling profilers keep the call stack for each sample, as the memory/disk usage would grow fast if you sample for a long time. Instead, they keep the total sample count for each node of the call stack tree to estimate the timing (flamegraph). I don't understand py-spy's mechanics well enough to know what it does, but I'm sure this could be a concern.
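A small sketch of why the aggregated form is a problem for a timeline view. The timestamps and stacks below are invented for illustration and do not come from any real profiler output. Once samples are collapsed into per-stack counts, two different orderings become indistinguishable.

```python
from collections import Counter

# Hypothetical raw samples for one thread: (timestamp in ms, call stack).
samples = [
    (0, ("main", "render", "template_a")),
    (20, ("main", "render", "template_b")),
    (40, ("main", "render", "template_a")),
    (60, ("main", "render", "template_a")),
]


def aggregate(raw_samples):
    # Flamegraph-style storage: one counter per distinct stack, no timestamps.
    return Counter(stack for _, stack in raw_samples)


# The same stacks in a different chronological order.
reordered = [(ts, stack) for (ts, _), (_, stack) in zip(samples, reversed(samples))]

aggregated = aggregate(samples)
print(aggregated)
print(aggregate(reordered) == aggregated)
```

The counts are enough to draw a flamegraph, but a chronological track needs the raw `(timestamp, stack)` pairs to be kept.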
I usually find that you don't need that high of a sampling frequency. For example, when I profile an HTTP request for a website and I see that it spent several samples rendering a template, then I know I have to cache that template. I don't really care if substring() or replace() was called 300 times during that time, I'm not gonna be able to optimize that anyway. What I do care about is whether it was the second or the third template that eats the lion's share, which is why I use a flamechart and not some aggregation by function name.
Yes. A sampling profiler can provide timing information about functions that run significantly longer than the sampling period. In your example, if you can "cache" the template, then a sampling profiler is definitely the way to go. More generally, if you want to find the functions that take longer than a certain period of time (let's say 200 ms), then sampling at 20 ms or even 40 ms will provide exactly the information you need.
However, sometimes you want more information about why the function is slow. Is it because it executed replace() 300 times, or because a single replace() took too long? That's something a sampling profiler may not be able to tell you (the samples will probably be the same in both cases).
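Here is an idealized simulation of that point. The 300 ms duration, the 20 ms period, and the assumption that the short calls run back to back with no time spent in the caller between them are all made up for the sketch. Under those assumptions both executions produce identical samples.

```python
def sample(timeline, period_ms, offset_ms, end_ms):
    """Return the stack active at each sample time.

    `timeline` is a list of (start_ms, end_ms, stack) with half-open intervals.
    """
    samples = []
    for t in range(offset_ms, end_ms, period_ms):
        stack = next((s for start, end, s in timeline if start <= t < end), None)
        samples.append((t, stack))
    return samples


STACK = ("render_template", "replace")

# Case 1: a single replace() call that runs for 300 ms.
one_long_call = [(0, 300, STACK)]
# Case 2: 300 replace() calls of 1 ms each, back to back.
many_short_calls = [(i, i + 1, STACK) for i in range(300)]

a = sample(one_long_call, period_ms=20, offset_ms=10, end_ms=300)
b = sample(many_short_calls, period_ms=20, offset_ms=10, end_ms=300)
print(len(a), a == b)  # prints "15 True"
```

A tracer would record 1 entry/exit pair in the first case and 300 in the second, which is the extra information the samples cannot carry.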
Also, distinguishing between the templates requires the gap between the functions to be larger than the sampling period; otherwise you won't be able to tell them apart (sometimes you can, based on the code).
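To illustrate the gap requirement, here is a rough sketch of how a chronological view could turn samples into spans by merging runs of identical consecutive samples. The function name and the sample data are hypothetical, and this is not a claim about how any particular viewer does it.

```python
from itertools import groupby


def samples_to_spans(stacks, period_ms):
    """Merge runs of identical consecutive samples into (stack, start, end) spans."""
    spans = []
    t = 0
    for stack, run in groupby(stacks):
        duration = len(list(run)) * period_ms
        spans.append((stack, t, t + duration))
        t += duration
    return spans


# Suppose template_a was rendered twice with a 5 ms pause in between,
# sampled every 20 ms. No sample lands in the pause:
sampled = ["template_a", "template_a", "template_a", "template_b"]
spans = samples_to_spans(sampled, period_ms=20)
print(spans)
```

The two `template_a` renders collapse into one 60 ms span, because nothing in the samples marks where the first call ended and the second began.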
I'm not saying sampling profilers are bad. I actually think they're great in many, many cases. I personally use a sampling profiler a lot for my development. However, it's not strictly better than a tracer in every situation. Like I said, it is super useful for locating the most time-consuming function, which is probably the most significant mission of a profiler, but sometimes we just need more than that.
We have gone completely off topic by the way. :) This issue is about visualizing multiple parallel timelines.
That's right. Let's steer back to the topic (on which other people might have better ideas).
While #82 refers to the ability to import profiles with multiple tracks at all, this issue refers to the ability to view multiple profiles on the screen at the same time w/ a synchronized timeline in the chronological view. This will be useful for e.g. viewing how events correlate between threads or processes.
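For reference, a minimal sketch of what multi-track data on one shared clock can look like, using the Trace Event JSON format that the Perfetto UI can load. It only illustrates per-process and per-thread tracks sharing timestamps and is not a proposal for speedscope's own file format. The pids, tids, names, and durations are invented for the example.

```python
import json


def complete_event(name, pid, tid, start_us, dur_us):
    # "X" is a complete event; ts and dur are in microseconds.
    return {"name": name, "ph": "X", "pid": pid, "tid": tid,
            "ts": start_us, "dur": dur_us}


def thread_name_event(pid, tid, thread_name):
    # "M" is a metadata event; this one labels the (pid, tid) track.
    return {"name": "thread_name", "ph": "M", "pid": pid, "tid": tid,
            "args": {"name": thread_name}}


trace = {
    "traceEvents": [
        thread_name_event(100, 1, "MainThread"),
        thread_name_event(100, 2, "worker"),
        thread_name_event(200, 1, "MainThread"),
        # All timestamps share one clock, so tracks line up in the viewer.
        complete_event("handle_request", 100, 1, 0, 5000),
        complete_event("render_template", 100, 2, 1000, 3000),
        complete_event("query_db", 200, 1, 1500, 2000),
    ]
}

text = json.dumps(trace, indent=2)
print(text)
```

Each `(pid, tid)` pair becomes its own track, grouped by process, which is the nested layout discussed above.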