timholy / FlameGraphs.jl

Analysis of profiling data using trees
MIT License
51 stars 12 forks source link

Differential flamegraphs #7

Open timholy opened 4 years ago

timholy commented 4 years ago

Some other tools allow one to compare performance before & after a code change: http://www.brendangregg.com/blog/2014-11-09/differential-flame-graphs.html. This might be a worthy addition. The "logic" should go in this package, with visualizations scattered to other packages.

Perhaps not so simple as to be deserving of a "good first issue" label, but presumably this would not be terribly complicated.

Arkoniak commented 4 years ago

Hello, I am in need of advice. I get to this issue from https://github.com/davidanthoff/ProfileVega.jl/issues/5 and really liked the idea. Unfortunately I was struggling with this issue with no luck at all.

My setup was the following

function f1(n)
    res = 0
    for _ in 1:n
        res += sum(rand(10_000))
    end

    for _ in 1:n
        res -= sum(rand(10_000))
    end
    res
end

function f2(n)
    res = 0
    for _ in 1:n
        res += sum(rand(20_000))
    end

    for _ in 1:n
        res -= sum(rand(5_000))
    end
    res
end

f1(1)
f2(1)
Profile.clear()
@profile f1(10000)
baseline = Profile.fetch()

Profile.clear()
@profile f2(10000)
target = Profile.fetch()

Initially I thought, that I can just build dictionary from baseline of the kind sf => length(span), but this idea turns out to be bad, because there are multiple entries with the same sf.

Secondly, I've tried to match nodes between target and baseline on the same level using largest common sequence approach. That was almost good, but then I run into the problem, that different runs may generate different root nodes, for example f1(::Int64) at REPL[20]:4 and f1(::Int64) at REPL[21]:4, so this algorithm breaks at the start.

I was thinking about anonymizing this kind of nodes "f1(::Int64) at REPL[20]:4" => ":4", but it has drawbacks

  1. Even slightest changes to the function (like inserting print in the beginning) will destroy comparison base.
  2. Other tools generate different versions of this nodes, for example jupyter do something like f1(::Int64) at In[20]:4, so all versions should be anonymized and it looks complicated.

In order to solve first problem, I had an idea of look forward, i.e. if REPL node is met, than it should be skipped for one turn, all childs of all REPL nodes of the next level should be concatenated together and LCS procedure should be applied. After all nodes is matched, procedure will go one step backward and match REPL nodes between base and target according to the match of their children.

Problem is, it looks like overly complicated solution and I am afraid that I am missing something simple. Thank you in advance!