timholy / SnoopCompile.jl

Provide insights about latency (TTFX) for Julia packages
https://timholy.github.io/SnoopCompile.jl/dev/
Other
309 stars 48 forks source link

Too many coffee breaks while building the flamegraph #150

Closed timholy closed 3 years ago

timholy commented 3 years ago

I am loving @snoopi_deep but building the flamegraph can take minutes. All the time appears to be in this call: https://github.com/timholy/SnoopCompile.jl/blob/bf52c6bc3494905039f90ad6ea8f9fdf1c24202f/src/parcel_snoopi_deep.jl#L114

Can this be sped up? Or do we have to consider developing a lazy-string that will be shown only when the user hovers over it in ProfileView?

NHDaly commented 3 years ago

Yeah, I've noticed this too. Apparently this is super duper slow.

Or do we have to consider developing a lazy-string that will be shown only when the user hovers over it in ProfileView?

I'd rather speed it up, because with PProf the whole thing is exported to disk at once, and then all the names are displayed in the profile.

In this branch (i haven't opened a PR yet, but i think it's probably ready to), I've added other display modes, to aggregate by Method or Function, and also to allow displaying if the method instance is being specialized with Consts: https://github.com/timholy/SnoopCompile.jl/compare/nhd-snoopi_deep-flamegraph-printing

Basically all of the other display modes print much faster, including slottypes (which displays the consts), so I think it certainly could be sped up. I think it's very likely that just no one has taken the time to optimize this, because until now no one was printing tens of thousands of MethodInstances in a loop 😅

So I think either we could make our own printing mechanism, or we could try to improve that one? I'm in favor of trying to improve the printing in Base, if possible?

NHDaly commented 3 years ago

(BTW, in case it's helpful: for now, I've been always just setting tmin_secs = 0.01 to prevent the flamegraph building from taking too long.)

NHDaly commented 3 years ago

😅 haha oh wow: I ran timing_fg = @snoopi_deep @eval flamegraph(timing); to find out where all that time is going, and 😅 half the time is in inferring frame_name - I think probably I should just change the type tuple from a compile-time argument to a runtime argument! Lemme give that a shot

Screen Shot 2020-11-27 at 12 06 41 PM

display_type=SnoopCompile.method

Screen Shot 2020-11-27 at 12 04 20 PM
NHDaly commented 3 years ago

😁 I'm glad you asked about this! I've been wanting to look into it for a long time, but now you prompted me, and it turns out it was user-error on our end, not a problem with julia's printing! 🙌 https://github.com/timholy/SnoopCompile.jl/pull/158

🤝