llogiq / flame

An intrusive flamegraph profiling tool for rust.
Apache License 2.0
694 stars 30 forks source link

Combine identical stack traces #33

Open mfr-itr opened 6 years ago

mfr-itr commented 6 years ago

I use flame for a ray tracer, which roughly correspond to

for i in 0..x {
  for j in 0..y {
    let _guard = flame::start_guard("compute color");
    out[i][j] = compute_color(i, j);
  }
}

The trouble is that I get x*y "compute color" frames, each taking 0.001% of running time. Is there a way to combine these in a single frame, aside from doing everything manually with flame::frames()?

mfr-itr commented 6 years ago

A quick way to do that would be to generate an output compatible with https://github.com/brendangregg/FlameGraph#2-fold-stacks, and flamegraph.pl could be used to generate the graph. A native way to do that would be nice, though.

mfr-itr commented 6 years ago

This works for me:

   fn print_trace(s: flame::Span, prefix: &str) {
        let my_prefix = format!("{};{}", prefix, s.name);
        println!("{} {}", my_prefix, s.delta);
        s.children.into_iter().for_each(|c| print_trace(c, &my_prefix));
    }

    for span in flame::threads().into_iter().flat_map(|t| t.spans.into_iter()) {
        print_trace(span, "all");
    }
TyOverby commented 6 years ago

I definitely plan on making a crate for these kind of utilities.

Also, I don't understand how your example code does what you want? How is it combining similar stacks?

mfr-itr commented 6 years ago

It produces the same output as stackcollapse-*, so I run cargo run | flamegraph > out.svg. flamegraph combines the frames if they have the same path. It takes a long time to run though, so combining the frames before sending them to flamegraph would be worth it. Maybe by recursively merging Span::children if Span::name are identical?

ebarnard commented 6 years ago

@mfr-itr This is what I have been using for the same purpose (except using flame to generate html).

use std::fs::File;
use std::mem;

fn write_flame() {
    let mut spans = flame::threads().into_iter().next().unwrap().spans;
    merge_spans(&mut spans);
    flame::dump_html_custom(&mut File::create("flame-graph.html").unwrap(), &spans)
        .unwrap();
}

fn merge_spans(spans: &mut Vec<flame::Span>) {
    if spans.is_empty() {
        return;
    }

    // Sort so spans to be merged are adjacent and spans with the most children are
    // merged into to minimise allocations.
    spans.sort_unstable_by(|s1, s2| {
        let a = (&s1.name, s1.depth, usize::max_value() - s1.children.len());
        let b = (&s2.name, s2.depth, usize::max_value() - s2.children.len());
        a.cmp(&b)
    });

    // Copy children and sum delta from spans to be merged
    let mut merge_targets = vec![0];
    {
        let mut spans_iter = spans.iter_mut().enumerate();
        let (_, mut current) = spans_iter.next().unwrap();
        for (i, span) in spans_iter {
            if current.name == span.name && current.depth == span.depth {
                current.delta += span.delta;
                let mut children = mem::replace(&mut span.children, Vec::new());
                current.children.extend(children.into_iter());
            } else {
                current = span;
                merge_targets.push(i);
            }
        }
    }

    // Move merged spans to the front of the spans vector
    for (target_i, &current_i) in merge_targets.iter().enumerate() {
        spans.swap(target_i, current_i);
    }

    // Remove duplicate spans
    spans.truncate(merge_targets.len());

    // Merge children of the newly collapsed spans
    for span in spans {
        merge_spans(&mut span.children);
    }
}
dlaehnemann commented 6 years ago

@ebarnard, thanks for working this out and for posting this here -- I was looking for exactly this!

One note on getting this to work for me: In the latest flame releaser (0.2.2), dump_html_custom() is not yet made public in src/lib.rs (i.e. the last line reads pub use html::dump_html;). So I used the latest GitHub version on the master branch, which does make it public:

https://github.com/TyOverby/flame/blob/3d07e2115d6ac5018fe7010afb288144b407e13b/src/lib.rs#L557

I.e. my Cargo.toml now lists the flame dependency as:

flame = { git = "https://github.com/TyOverby/flame.git", branch = "master", optional = true }
tomtung commented 5 years ago

This is essential for profiling probably any non-trivial program!

Thanks to @ebarnard for the script and @dlaehnemann for the fix; they work like a charm :) Can't wait for this to be implemented on the main branch.

dlaehnemann commented 5 years ago

Hi @tomtung,

the flame crate approach is very useful when flamegraphing specific parts of code. But having to explicitly mark functions for tracking can be prohibitive. Especially when working with crate dependencies that would require local checkouts of those crates, local modifications and than pointing you Cargo.toml to those local checkouts for the flamegraphing to include functions from those crates. Thus, I have since established a different approach for my rust code, using more general-purpose tools that allow to flamegraph everything: perf + flamegraph.pl

I wrote up a small gist about it, maybe it's useful: https://gist.github.com/dlaehnemann/df31787c41bd50c0fe223df07cf6eb89

cheers, david

tomtung commented 5 years ago

@dlaehnemann Thanks for the tips! I happened to have already starred that gist just a few hours ago :) Btw, for my case rust-unmangle didn't seem to be necessary when c++filt was present, and I also needed stackcollapse-recursive.pl to unclutter recursive calls.

I was using this crate (flame) to profile some algorithm implementation. With the stack merging script, I still found this crate very handy, since it gives me the freedom to logically annotate sections of my code that aren't necessarily individual frames on the stack, which helps me get the overall big picture more easily.

dlaehnemann commented 5 years ago

Jep, definitely. There are use cases for both. I did expect rust-unmangle to become unnecesaary at some point, as Rust name mangling evolves, just still saw some function calls that needed it. So thanks for pointing out that you've seen a case with no effect.

And the stackcollapse-recursive.pl sounds like a great addition.

Happy optimising! ;)