llogiq / flame

An intrusive flamegraph profiling tool for rust.
Apache License 2.0

spans inside frames inside spans #6

Open daniel-vainsencher opened 8 years ago

daniel-vainsencher commented 8 years ago

I am not writing a game but numerical code, where the algorithms have loops with some interesting parts inside the loop and some outside.

So I want a span before the loop starts, a span on the whole loop, inside the loop each iteration is a frame, and more spans inside the loop, partitioning each frame.

Currently this results in errors like:

thread 'logisticregression3' panicked at 'flame::end("SublinearAveragingSolver") called without a currently running span!', /home/danielv/.cargo/registry/src/github.com-88ac128001ac3a9a/flame-0.1.5/src/lib.rs:257

This is presumably because my next_frame inside the loop is hiding the loop scope's flame::start("Sublinear...");.

Does this make sense?

TyOverby commented 8 years ago

Yeah... Sadly frames are intended only to be used at the top level of performance collecting.

What is the pattern that you are trying to look at in your code, and what do you want frames to help you do?
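The panic reported above can be reproduced in a toy model. The sketch below is not flame's implementation: `ToyProfiler` and all of its methods are hypothetical, and it assumes (as the report guesses) that starting a new frame discards whatever spans are still open.

```rust
/// Toy model of a per-thread span stack. NOT flame's code; it only
/// illustrates the failure mode guessed at in the report.
#[derive(Default)]
struct ToyProfiler {
    /// Names of the currently running spans, innermost last.
    open: Vec<String>,
}

impl ToyProfiler {
    /// Opens a span by pushing its name onto the stack.
    fn start(&mut self, name: &str) {
        self.open.push(name.to_string());
    }

    /// Closes the innermost span. Fails if no span is running or if the
    /// innermost span has a different name.
    fn end(&mut self, name: &str) -> Result<(), String> {
        match self.open.pop() {
            Some(top) if top == name => Ok(()),
            Some(top) => Err(format!(
                "end({:?}) called while {:?} is the running span",
                name, top
            )),
            None => Err(format!(
                "end({:?}) called without a currently running span",
                name
            )),
        }
    }

    /// Assumption: a new frame forgets every span that is still open.
    fn next_frame(&mut self) {
        self.open.clear();
    }
}
```

In this model, opening a span around the loop, calling `next_frame` inside the loop, and then ending the outer span fails, because the outer span is no longer on the stack by the time `end` runs.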

daniel-vainsencher commented 8 years ago

My optimization algorithms are typically some initialization and then a loop. The body of the loop has several actions. My main goal is to understand the cost of the in-loop actions, but I want to know (not guess) that the initialization isn't too expensive either.

I removed the outer span, so that I now have a next_frame at the beginning of each loop iteration and a bunch of span_of calls inside it. I find the resulting flame graphs surprising: the spans are sorted by name, but I expected the times for a particular named span to be summed over the frames. Instead I have 10 copies of each (one per loop iteration). Is this the expected behavior?

daniel-vainsencher commented 8 years ago

On second thought, what I should probably do is give up on wrapping the whole loop, but instead wrap the initialization code by its own span. Ok, that's reasonable.
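A minimal sketch of that layout, using only the standard library rather than flame itself. `Recorder`, its `span_of` method, and the placeholder arithmetic in `run` are all hypothetical names made up for illustration; the point is only that initialization gets its own span and nothing wraps the loop as a whole.

```rust
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a profiler: records (name, elapsed) pairs
/// in the order the spans finish.
#[derive(Default)]
struct Recorder {
    spans: Vec<(String, Duration)>,
}

impl Recorder {
    /// Times the closure `f` under `name` and returns its result.
    fn span_of<R>(&mut self, name: &str, f: impl FnOnce() -> R) -> R {
        let started = Instant::now();
        let result = f();
        self.spans.push((name.to_string(), started.elapsed()));
        result
    }
}

/// Placeholder "optimization": one span for initialization, then each
/// iteration is partitioned into named spans. No span covers the loop.
fn run(rec: &mut Recorder, iterations: usize) -> f64 {
    let mut x = rec.span_of("init", || 1.0_f64);
    for _ in 0..iterations {
        let grad = rec.span_of("gradient", move || 2.0 * x);
        x = rec.span_of("update", move || x - 0.1 * grad);
    }
    x
}
```

After `run(&mut rec, 2)`, the recorder holds one "init" entry followed by a "gradient" and an "update" entry per iteration, so the initialization cost is measured rather than guessed.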

TyOverby commented 8 years ago

The by-name sorting thing surprised me too. I've pushed 0.1.6 which fixes this.

Also, right now, none of the exporters support drawing frames. I'm writing my own viewer though, so when that's done, it'll support frames and viewing multi-threaded computation!

daniel-vainsencher commented 8 years ago

It seems each user has different expectations about what frames mean... To me it was obvious that we sum over frames: since frames are likely to be similar, I want to average over them to measure the different spans in them more precisely. Looking at hprof, they only show the spans in the current frame.

It is not even obvious what sort order is best: by size and by order of occurrence both make sense.

daniel-vainsencher commented 8 years ago

How do you plan to treat spans over a sequence of frames? Summing, as I'd prefer, or do you have something else in mind?

TyOverby commented 8 years ago

For the flamegraph, I've opted to order by occurrence.

Honestly, the only reason I included the concept of a frame was to make visualizing perf for a game possible from inside the game itself. This is why I haven't built frames support into the dump visualizations yet.

Frames will probably always be the outermost unit of measurement in FLAME and they'll happen on a per-thread basis. What would it mean to have nested frames? Or a frame that doesn't

If you want to collapse multiple frames into each other, you can use flame::end_collapse(..) or my_guard.end_collapse(). This method only really works for leaf nodes that share the same name, but it can be handy.

daniel-vainsencher commented 8 years ago

Of course you will keep the design fit for your own purposes, but I will explain my point of view, maybe you'll find it useful.

Frames are a special case of loop bodies. Loops are sometimes nested. What should a profile of a program with (maybe nested) loops look like? (The visualization-inside-the-game scenario is kind of orthogonal; I will get back to it.)

If your loop runs only a small number of times, and each iteration is completely different because of differing input etc., you really want to ignore the loop entirely and display the profiles of the loop bodies one after the other for comparison.

But it is pretty common that loops have many iterations (so the display proposed above is impractical), that different iterations are pretty similar to one another, and that what we really care about is the relative costs of the different parts of the loop body. In this case, you want to treat passes through the loop body (I will call them frames for convenience) as just samples from a process that you are trying to study. What can we do then? The most natural thing to do is average over them: construct a single profile that has the union of all spans that occurred over the different frames, and divide the total measured time for each by the number of frames. But you can show other statistics as well: show how many frames we are averaging over, show the +/- 25% percentiles in addition to the mean, pinpoint outliers, etc.
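The averaging proposed above could look roughly like the following sketch. It is not part of flame; the `Frame` alias and `average_over_frames` are hypothetical names, and each frame is assumed to be available as a plain list of (span name, seconds) pairs.

```rust
use std::collections::BTreeMap;

/// One frame: the spans seen in a single loop iteration, as (name, seconds).
type Frame = Vec<(String, f64)>;

/// Mean time per frame for every span name seen in any frame.
/// A span that is absent from a frame contributes zero for that frame,
/// so the result covers the union of all span names.
fn average_over_frames(frames: &[Frame]) -> BTreeMap<String, f64> {
    let mut totals: BTreeMap<String, f64> = BTreeMap::new();
    for frame in frames {
        for (name, secs) in frame {
            // Sum the time of each named span across all frames.
            *totals.entry(name.clone()).or_insert(0.0) += *secs;
        }
    }
    // With no frames the map is empty, so the division below never runs.
    let frame_count = frames.len() as f64;
    for total in totals.values_mut() {
        *total /= frame_count;
    }
    totals
}
```

For two frames where "gradient" takes 2 s and 4 s, the averaged profile reports 3 s for it; a span that appears in only one of the two frames is reported at half its measured time.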

What about nested loops? Typically, you'll want to average over each of them separately. The inner loop will be its own part in the outer one.

Of course, this applies to recurrences of different types, like iterators, not just loops.

In fact, a very natural way to implement all of this is to always "summarize" the contents of any span like this. When its children occur only once, you don't see the difference. When a span has children repeating many times, the summary is valuable.

To visualize inside the game: you can treat a particular span as the frame, and then for that span, either use only the last one as in hprof, or average over the last 30 for a somewhat more stable display, or whatever is useful. But for any in-frame loops, you probably still want the kind of summary I suggested above.

Sorry for the wall of text.

TyOverby commented 8 years ago

> Sorry for the wall of text.

Thanks for the wall of text; I understand your use case much more now.

I think that there are two (equally important) parts to FLAME: the API, and the viewer. With the alpha release, I'm proud of how small the API is, but the visualizer was thrown together at the last moment. I think in code with high amounts of repetition, there is certainly a need for the viewer to be intelligent; able to detect repetition, and have options for collapsing and summarizing.

Detection of the pattern produced by code like this

for _ in 0 .. 30 {
    ::flame::start("foo");
    do_something();
    ::flame::end("foo");
}

would be trivial, and could be very user-friendly without any API changes to account for it.
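One way such detection might work is to merge adjacent sibling spans that share a name. The sketch below is an assumption about what a viewer could do, not flame's or any existing visualizer's code; `Run` and `collapse_repeats` are hypothetical names, and spans are assumed to arrive as (name, nanoseconds) pairs in order of occurrence.

```rust
/// A run of adjacent sibling spans that share one name.
#[derive(Debug, PartialEq)]
struct Run {
    name: String,
    count: usize,
    total_ns: u64,
}

/// Merges adjacent spans with equal names into a single run,
/// counting the repetitions and summing their durations.
fn collapse_repeats(spans: &[(&str, u64)]) -> Vec<Run> {
    let mut runs: Vec<Run> = Vec::new();
    for &(name, ns) in spans {
        // Does this span repeat the one immediately before it?
        let same_as_last = runs.last().map_or(false, |last| last.name == name);
        if same_as_last {
            if let Some(last) = runs.last_mut() {
                last.count += 1;
                last.total_ns += ns;
            }
        } else {
            runs.push(Run {
                name: name.to_string(),
                count: 1,
                total_ns: ns,
            });
        }
    }
    runs
}
```

Fed the 30 consecutive "foo" spans produced by the loop above, this yields a single run with a count of 30. It only merges direct neighbours, so a loop body made of several different spans would need a more general repeat detector.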

I'm going to start writing my own visualizer this weekend (the current one is 3rd party), and when I do, I'll keep loop detection in mind. If the experiment works out well, I might also deprecate the "frame" API, as it would be unnecessary.

daniel-vainsencher commented 8 years ago

Cool, glad to clarify my use case, and glad to hear it is (IIUC) within scope :)
