Closed NHDaly closed 3 years ago
Early pass at @snoopi per-method-instance deep inference profile:
This PR is still in draft stage, but I'm trying to clean it up now.
TODO:

- Aggregate timings across specializations of the same method (foo(::Integer), foo(::Int), etc.).
- Fix pprof itself to (at least have an option to) not modify the function names for Julia functions. Also, the current function regexes are unable to handle parentheses around the function name (e.g. (::MyStruct)(::Int)), so we definitely need a PR to fix that.
- Add the helper functions to_flamegraph() and flatten_times().
I would like to add deep snooping functionality to CompileBot.jl. I assume that all the configurations/inputs are the same as normal @snoopi, right?
> I would like to add deep snooping functionality to CompileBot.jl. I assume that all the configurations/inputs are the same as normal @snoopi, right?
I would say they're similar. The commands aspect should be the same, yeah:
https://github.com/timholy/SnoopCompile.jl/blob/1b6314962e3bb8faced5c15703e1a6fc2e1d2cf6/SnoopCompileCore/src/snoopi.jl#L123
And the tmin param can be passed to SnoopCompile.flatten_times() and SnoopCompile.to_flamegraph() instead of directly to @snoopi_deep.
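A rough sketch of what that usage might look like (the workload script is a placeholder, and the exact keyword spelling should be checked against the SnoopCompile docs; only the function names and the tmin parameter come from the discussion above):

```julia
using SnoopCompileCore   # provides @snoopi_deep
using SnoopCompile       # provides the analysis helpers

# Hypothetical workload to snoop on
tinf = @snoopi_deep include("my_workload.jl")

# Filter out frames below the time cutoff at analysis time,
# rather than passing tmin to @snoopi_deep itself
times = SnoopCompile.flatten_times(tinf; tmin = 0.001)
fg    = SnoopCompile.to_flamegraph(tinf; tmin = 0.001)
```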
I'm supportive of your TODO list. I like the idea of fixing pprof and keeping the nicer printing. The last item (aggregation) sounds like it could also be added in a second PR, if you prefer (I'm fine with either).
> I'm supportive of your TODO list. I like the idea of fixing pprof and keeping the nicer printing.
Okay! Good news, the naming weirdness is settled in pprof-land: I've asked about fixing this in google/pprof, and in the process I realized how we can work around it in PProf.jl: https://github.com/JuliaPerf/PProf.jl/pull/31
So I've updated this PR to use the MethodInstance name, and also filled out the StackFrame with more info. :)
> The last item (aggregation) sounds like it could also be added in a second PR, if you prefer (I'm fine with either).
Yeah, I like the idea of doing this in a follow-up! Good idea.
Okay so I think this is looking good to go! :) I think this is ready to review.
I've updated to "Ready to Review". Thanks again for your help!
(And I've renamed the PR now that it's ready to go!)
You could add fail-fast: false to the CI configuration so that one failing job doesn't cancel the rest of the matrix.
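For GitHub Actions, that suggestion would look roughly like this (a sketch; the job name and matrix entries are placeholders, not this repo's actual workflow):

```yaml
jobs:
  test:
    strategy:
      fail-fast: false   # keep the other matrix jobs running when one fails
      matrix:
        julia-version: ['1.5', 'nightly']
        os: [ubuntu-latest, macos-latest]
```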
Excited to see your progress, as you know I'm already benefiting from your work! I will try to review in the morning.
One thing I know I'm going to want, but not sure yet the best way to implement it, is to discover "who is calling the head of the next inference?" (head = root.children[i]). The reason I'm interested is that a break in the tree structure (starting a new head) indicates either toplevel or runtime dispatch. Sometimes there seem to be backedges but often not, so it's not entirely trivial short of inserting @show stacktrace(backtrace()) into the callee. Currently, the best strategy I've devised is to collect all the leaves of the previous children and then run findcallers on them. Any better ideas? I'm happy to implement this part myself; I don't mean to suggest you need to do it for this PR.
> I'm already benefiting from your work!
🎉
> "who is calling the head of the next inference?"
Yeah, interesting!
Starting a new top-level caller would happen (from what I understand) at runtime, after we've finished compiling and are in the middle of executing the code from (one of) the previous inferences. So yeah, they're not connected at all.
And yeah, I expect there wouldn't be backedges set during dynamic dispatch, since you don't need to recompile the caller if the callee is updated (it will still be a dynamic dispatch).
But hrmm, yeah, interesting. I guess the best way to implement something like this would indeed be to walk the stacktrace at the first call to typeinf (if the parent is ROOT), and walk up until you hit something that's not type inference, I guess? And then record that as the caller? Interesting idea.
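A toy sketch of that walk, operating on a plain list of frame names rather than real StackTraces frames (the set of "inference machinery" names here is an illustrative assumption, not Julia's actual internals):

```julia
# Names assumed to belong to the inference machinery; purely illustrative.
const INFERENCE_NAMES = Set([:typeinf, :typeinf_ext, :typeinf_edge])

# Walk up a stack (innermost frame first) and return the first frame
# that is not part of type inference -- the presumed runtime caller
# that triggered this fresh inference "head".
function first_noninference_caller(stack::Vector{Symbol})
    for name in stack
        name in INFERENCE_NAMES || return name
    end
    return nothing  # the whole stack was inference machinery
end

# Example: a stacktrace captured at the first typeinf call for a ROOT child
stack = [:typeinf, :typeinf_ext, :runtime_dispatch_site, :main]
first_noninference_caller(stack)  # -> :runtime_dispatch_site
```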
> then run findcallers on them
Oooh this is cool! I hadn't seen this before. Neat! :)
Could also be something like @snoopi deep=true (similar to @code_warntype optimize=true), but maybe they are different enough that it doesn't make sense to have them behind the same macro.
> Could also be something like @snoopi deep=true (similar to @code_warntype optimize=true), but maybe they are different enough that it doesn't make sense to have them behind the same macro.
@KristofferC - sorry I never responded to this. I had the same idea at first, but since this returns an entirely different data structure and is used with entirely different helper functions, it doesn't seem to make sense to me to put them in the same macro.
Alright, tests are passing, and I'm finished applying changes! :)
Please let me know what you think about the to_flamegraph / flamegraph naming, and then otherwise I think this is good to go! 🎸
This looks basically ready to go. The one thing I'd still recommend (but not insist on, if you disagree) is making the "public" method flamegraph rather than to_flamegraph. Here are the reasons:

- flamegraph acts almost as a constructor (there is no FlameGraph type in FlameGraphs, because it's just the root node of a tree), and we have constructors that take very diverse inputs and return the same type of object. For example, Array{Int}(undef, 3) and Array{Int}(1:3) are very different beasts indeed, yet we call them the same thing because they produce the same output type.

If you do agree that it can safely be another method of flamegraph, perhaps we should even consider having SnoopCompile re-export that name. Alternatively, we could just expect users to say using FlameGraphs.
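The constructor analogy above can be checked directly in the REPL: the two calls take very different inputs but produce the same concrete type.

```julia
a = Array{Int}(undef, 3)  # uninitialized 3-element vector
b = Array{Int}(1:3)       # vector built from a range: [1, 2, 3]

typeof(a)  # Vector{Int64} (alias for Array{Int64, 1})
typeof(b)  # also Vector{Int64} -- same output type from diverse inputs
```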
Also, PRs like this are a testament to both your own excellent work and Julia's amazing composability: if you deleted all the documentation (no, please don't!) this is a remarkably small PR given the huge gain in functionality. Pretty incredible.
Thanks @timholy for the kind words! ❤️
Agreed regarding how amazing it is that we could add such interesting functionality in such small PRs. 😊
Per your suggestion, I've changed to implement flamegraph(), and re-exported it. Thanks!!
I believe this is ready for final review!
😁 Thanks! :) I don't have merge permissions, so...... Merge at will! 😁
> I don't have merge permissions
You should have gotten an email about this earlier today... let me know if you can't find it and I'll try to resend.
Hehe oh, you're right! 😁 Thanks! :) Amazing. Really appreciate it! ✔️ 🥳
Do you have any preference regarding Merge vs Squash? I'm inclined to Squash because there were a lot of commits here.
Oh, also, I guess I should do a version bump here.
Is this a minor or patch change? (Or major?) I'm inclined to say minor. And do you think it's okay to release both @snoopl (#135) and @snoopi_deep (this PR) in the same new version?
Yes to squash, yes to minor bump, yes to bringing both new pieces of functionality out in a single release. Before registering, we might want to work on the Documenter docs a bit to advertise this functionality.
I was going to let you do the honors, @NHDaly, but I'm eager to use this. Thanks again!
Hehe, eep! I'm sorry -- I was excited to do the honors, too, but I felt like it probably needs more documentation, and then I got carried away doing some analyses now that we have these tools instead of writing the docs! 🙈
Thanks for the merge, @timholy! :) I'll try to follow up with docs soon! <3
I may be able to help too. Lots going on so it will be a bit back-burner but it's already helping, so let's at least work with it in master.
This builds on https://github.com/JuliaLang/julia/pull/37749 to expose the information through SnoopCompile.
This adds a macro to SnoopCompileCore, @snoopi_deep, which collects deep timings from Julia's type inference, and adds functions to display the resulting information as a profile.

TODO:

- InferenceFrameInfo in julia

This is a second attempt at https://github.com/timholy/SnoopCompile.jl/pull/138.