timholy / SnoopCompile.jl

Provide insights about latency (TTFX) for Julia packages
https://timholy.github.io/SnoopCompile.jl/dev/
Other
309 stars 48 forks source link

Use changes to Julia to have `@snoopi` report _all MethodInstances_ that are inferred during a single inference call #138

Closed NHDaly closed 3 years ago

NHDaly commented 4 years ago

Builds on https://github.com/JuliaLang/julia/pull/37535 to expose this information through SnoopCompileCore

NHDaly commented 4 years ago

I'm quite sure that I haven't gone about this in the best way. I'd love any suggestions or help if you are interested!!!! :) âĪïļ !

Hopefully this will prove useful for others as well!

timholy commented 4 years ago

Awesome that you're doing this. If I don't notice when you turn this into a real PR, just ping me.

NHDaly commented 4 years ago

Thank you @timholy!!


After some more thinking, together with @sacha0, i think we think that the approach we'd taken is for is probably not the best way to go about measuring this.

Our next idea is to try a bit more like the other approaches, and just find a better place to inject the timings. The reason i didn't think timings would work when i first looked at this, is that each function encapsulates the time for each subsequent function that it measures, so the timings aren't useful.

But then we realized if we were just able to report the timings in a nested structure, like a profile, they'd make sense! And then i remembered this is exactly what https://github.com/KristofferC/TimerOutputs.jl already does! 😁 So i think either duplicating that, or literally just using that, would get us what we want! As a first quick test, I just slapped TimerOutputs inside the compiler, and in this small test it seems indeed reasonable:

julia> module M
           i(x) = x+5
           i2(x) = x+2
           h(a::Array) = i2(a[1]::Integer) + i(a[1]::Integer) + 2
           g(y::Integer, x) = h(Any[y]) + Int(x)
       end;
WARNING: replacing module M.

julia> _to_ = TimerOutputs.TimerOutput();

julia> @code_typed M.g(2,3);   # Measure inference timee

julia> _to_ 
 ─────────────────────────────────────────────────────────────────────────────────────────────────
                                                          Time                   Allocations      
                                                  ──────────────────────   ───────────────────────
                 Tot / % measured:                     9.65s / 0.02%            887KiB / 30.7%    

 Section                                  ncalls     time   %tot     avg     alloc   %tot      avg
 ─────────────────────────────────────────────────────────────────────────────────────────────────
 MethodInstance for g(::Int64, ::Int64)        1   2.16ms   100%  2.16ms    273KiB  100%    273KiB
   MethodInstance for h(::Vector{Any})         1    766Ξs  35.5%   766Ξs    187KiB  68.7%   187KiB
     MethodInstance for i2(::Integer)          1    236Ξs  10.9%   236Ξs   42.6KiB  15.6%  42.6KiB
     MethodInstance for i(::Integer)           1    178Ξs  8.23%   178Ξs   42.6KiB  15.6%  42.6KiB
 ─────────────────────────────────────────────────────────────────────────────────────────────────

So i'll give an approach like that a go, next!! 😁

NHDaly commented 4 years ago

(I've filed an issue in TimerOutputs.jl to see about using it to generate essentially profiles, like i want above 😊: https://github.com/KristofferC/TimerOutputs.jl/issues/90)

NHDaly commented 4 years ago

(fwiw, here's the git diff for what I was demoing, above. I'm trying it now on our codebase to see if it reports anything useful!):

diff --git a/base/compiler/typeinfer.jl b/base/compiler/typeinfer.jl
index a0abd97276..a8bec821b5 100644
--- a/base/compiler/typeinfer.jl
+++ b/base/compiler/typeinfer.jl
@@ -9,6 +9,11 @@ function typeinf(interp::AbstractInterpreter, result::InferenceResult, cached::B
 end

 function typeinf(interp::AbstractInterpreter, frame::InferenceState)
+    Main.TimerOutputs.@timeit Main._to_ Main.string(frame.linfo) begin
+        _typeinf(interp, frame)
+    end
+end
+function _typeinf(interp::AbstractInterpreter, frame::InferenceState)
     typeinf_nocycle(interp, frame) || return false # frame is now part of a higher cycle
     # with no active ip's, frame is done
     frames = frame.callers_in_cycle

(The above of course wouldn't compile, but i'm eval'ing it in after into a running session. If this approach works, i'll likely copy what you did for @snoopi, and add a ccall to swap out the infer function to one that's timed. 😊)

timholy commented 4 years ago

Cool. Not sure if you know this but Revise.track(Core.Compiler) may make this easier to test.

NHDaly commented 4 years ago

Thanks @timholy -- I didn't know that, and I appreciate the tip! 😁