Closed: tlienart closed this issue 3 years ago.
Sorry to be slow to respond, #225 has been taking a lot of time.
You generally want to run `@snoopi_deep` in a fresh session. Once you infer a method, you generally don't need to reinfer it. (Redefining the method of course forces it to be reinferred.)
As far as filtering goes, for `@snoopr`-measured invalidations, you can use `filtermod`. That only filters on the root; we might want something that "hits" anything in a given module?

For `@snoopi_deep` inference triggers, you should be able to do something like

```julia
filter(itrig -> itrig.callerframes[end].linfo.def.module === Franklin, itrigs)
```

That should probably become `filtermod` on a list of `itrigs`.
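A `filtermod`-style helper over a trigger list might be sketched as follows. This is illustrative only: `Trigger` and `callermodule` are stand-ins invented here, not SnoopCompile types; the real `InferenceTrigger` exposes the caller's module via `itrig.callerframes[end].linfo.def.module` as in the `filter` call above.

```julia
# Hypothetical sketch: filter trigger-like records by the module they come from.
# `Trigger` is a stand-in for SnoopCompile's InferenceTrigger.
struct Trigger
    mod::Module
end

# In SnoopCompile this would read itrig.callerframes[end].linfo.def.module.
callermodule(t::Trigger) = t.mod

# Keep only the triggers whose caller lives in `mod`.
filtermod(mod::Module, triggers) = filter(t -> callermodule(t) === mod, triggers)

itrigs = [Trigger(Base), Trigger(Core), Trigger(Base)]
filtermod(Base, itrigs)  # keeps the two Base triggers
```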
Ah crap then I have 160 of those :/ (this is for another, much smaller module, which will eventually replace Franklin's parser: FranklinParser).
I think once I get the hang of reducing those, I might try to make a PR to extend the docs :)
Thanks again Tim!
It's pretty in flux right now. `suggest` can triage them to help eliminate ones you don't need to address, or that might need to be addressed elsewhere (e.g., Julia itself). For example, my investigations have led me to tackle the concatenation pipeline and led to these PRs:
and the net effect is an improvement in both runtime performance and precompilability.
I will play with FranklinParser at some point and see if I can help provide some context and/or additional tools. I recognize the complexity of these tools and am working to make them accessible to a wide audience, and you're very much in that target audience. So complain if it's unclear or too much work and I'll see what I can do to decrease the effort.
For now, the invalidations should be a much narrower set than all inference failures. In the long run, it may be easier to identify problems from the inference failures, but as you say right now the list of inference failures can be disturbingly long.
The main thing I'd recommend is to check whether you've tried the profile-guided despecialization GUI yet. https://timholy.github.io/SnoopCompile.jl/dev/pgdsgui/ I find it's a nice way to get a sense for the latency cost of design decisions.
Thanks a lot Tim! One very clear benefit of all this is that it made me think "wait, does the compiler know what that thing is, or is it only known at runtime?", and while this is painful in some cases, I think overall it might push me and others to just do things better (e.g. I just stumbled upon the section in the Julia docs warning against abusing the dispatch system, something I was certainly doing).
I'll keep trying and will provide specific feedback once I'm a bit more comfortable with these tools!
Very good. You don't have to optimize everything; if "abusing dispatch" makes your code cleaner, and you don't really care about the runtime or latency hit, there's no problem with it. But if you are trying to make lean, low-latency, performant code, then yes, these things matter.
Yes, well, ultimately I'd like the time to first serve in Franklin to be as low as possible, so if there's low-hanging fruit to that end (and if it makes me a better Julia programmer along the way) I'll do it!
:+1: I see you still have a lot of invalidations. Fixing those is probably the highest payoff. To make your life easier I just pushed #226. I'll post a demo in https://github.com/tlienart/Franklin.jl/issues/752.
Thanks so much!
I went from 160 to 9 inference triggers in FranklinParser 😅; the remaining ones are a bit hard to track down, but the situation is likely significantly better than it was before! Thanks again, this is an amazing package.
Woot! That's awesome news. Great work!
SnoopCompile 2.5.0 just came out a couple of moments ago. It has quite a few changes to the inference triggers analysis code. In particular, for https://github.com/timholy/SnoopCompile.jl/issues/222#issuecomment-762891137 the new aggregation of triggers by method, plus the ability to filter, may help reduce the daunting number.
It's now down to zero 😅 and I think the code is all the better for it.
One small note: to "remove" the final one, I had to inline a function that I used to create a const dictionary used elsewhere, something like

```julia
const RULES = LittleDict{...}(
    key1 => method(params1),
    key2 => method(params2),
    ...
)
```
The suggestion from SnoopCompile was to try precompiling the function. It's not immediately clear to me why the inlining worked but maybe that could be an extra example in the docs?
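To make the pattern concrete, here is a minimal, self-contained sketch of a const rule dictionary built by an inlined helper. All names here (`Rule`, `make_rule`, `matches`, the patterns) are made up for illustration, and a plain `Dict` stands in for `LittleDict` so the sketch runs without extra packages:

```julia
# Hypothetical sketch of "inline the helper that builds a const dict of rules".
# `Rule` stands in for the concrete rule objects; `make_rule` plays the role of
# a helper like forward_match that is only called while constructing RULES.
struct Rule
    pattern::String
end

# Marking the constructor-helper @inline lets the const initializer be folded
# into the module's top-level code rather than requiring a separate inferred method.
@inline make_rule(pattern::String) = Rule(pattern)

matches(r::Rule, s::AbstractString) = startswith(s, r.pattern)

# Plain Dict instead of LittleDict to keep this self-contained.
const RULES = Dict(
    :div  => make_rule("@@"),
    :math => make_rule("\$\$"),
)

matches(RULES[:div], "@@card")  # true
```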
That sounds good. Can you point me somewhere so I can understand the changes in greater detail?
https://github.com/tlienart/FranklinParser.jl/tree/validator
Some of the changes are pretty major (the parsing before was running with "oracle functions" that were encapsulating a condition for a section of string to meet, now this is done with concrete objects).
This is the inlined function: https://github.com/tlienart/FranklinParser.jl/blob/07d2ecd8de9ebc0b135d586fbf8d2da03c5695a7/src/tokens/utils.jl#L102-L117
The function `forward_match` is only used as a way to define rules like here: https://github.com/tlienart/FranklinParser.jl/blob/07d2ecd8de9ebc0b135d586fbf8d2da03c5695a7/src/tokens/markdown_tokens.jl#L8-L18
This is the function I ended up significantly refactoring based on the output from SnoopCompile: https://github.com/tlienart/FranklinParser.jl/blob/validator/src/tokens/find_tokens.jl
if you compare it with https://github.com/tlienart/FranklinParser.jl/blob/main/src/tokens/find_tokens.jl you'll see some, probably obvious, elements:
`TokenFinder` is now a proper object. I also got rid of a few `NTuple{N, Char} where N`, replacing them with `Vector{Char}`; it looks like inference-wise this is better?
Thanks!
So I presume you're referring to this:

```julia
julia> itrig = ftrigs[1]
Inference triggered to call MethodInstance for forward_match(::String, ::Vector{Char}, ::Bool) from forward_match (/home/tim/.julia/dev/FranklinParser/src/tokens/utils.jl:108) inlined into MethodInstance for forward_match(::String) (/home/tim/.julia/dev/FranklinParser/src/tokens/utils.jl:108)

julia> suggest(itrig)
/home/tim/.julia/dev/FranklinParser/src/tokens/utils.jl:108: invoked callee (forward_match(refstring::String) at utils.jl:108 may fail to precompile)
```
(I removed the `@inline`.) The bottom line is that I think `invoked callee` triggers probably reflect a bug in Julia. I've noticed that methods that supply default arguments sometimes fail to precompile. xref https://github.com/JuliaLang/julia/issues/38983.
It's quite interesting that inlining circumvents this problem, but I'm not sure I want to recommend that in general---forced inlining can also increase latency since it prevents you from just exploiting already-compiled code. Some methods that don't precompile are so huge that forced inlining would be a bad idea.
On balance, I think we just need to fix this bug. Fortunately, in your case the cost is cheap:
```julia
julia> itrig.node
InferenceTimingNode: 0.001454/0.002313 on InferenceFrameInfo for FranklinParser.forward_match(::String, ::Vector{Char}, ::Bool) with 5 direct children
```
You probably won't really notice that 2.3ms very much. There are circumstances where this is more problematic than that (e.g., https://github.com/JuliaLang/julia/issues/38983#issuecomment-757143042, as well as that `JuliaInterpreter.step_expr!` method).
> the function I ended up significantly refactoring
Yes, that looks like a good move from an inferrability/latency perspective. There are cases where it's valid to do it the way you were doing it previously, but if that λ was getting called many times on short jobs, I suspect your change may also improve runtime performance. If you find that's not true (regex parsing can be slow), and if runtime performance matters more than latency, then of course you could go back to the λ approach. If FranklinParser defines all the functions that will ever be called from `find_tokens`, you can add `precompile` directives to at least make sure they're all ready to go. (You only get an inference trigger if it's not already inferred.) You'd still have to pay the cost of runtime dispatch, but that sometimes may be smaller than running a regex parser. (And sometimes, not.)
> I also got rid of a few `NTuple{N, Char} where N` to replace by `Vector{Char}`, looks like inference wise this is better?
If `N` is in a small list the former would be ok, e.g., making it `Union{Tuple{Char}, Tuple{Char,Char}}` rather than `NTuple{N,Char}` would probably give you better runtime performance than `Vector{Char}` and no extra cost in latency. But if `N` is not bounded, or even if it ranges from, say, 1-10, you're probably better off with the `Vector{Char}`. If you really have a performance bottleneck and you want to use the tuple, a "manual union splitting" block like
```julia
tf = if set isa NTuple{1,Char}
    c ∈ set
elseif set isa NTuple{2,Char}
    c ∈ set
elseif set isa NTuple{3,Char}
    c ∈ set
elseif set isa NTuple{4,Char}
    c ∈ set
...
end
```
can circumvent the inference problem from using `NTuple{N,Char}`. As long as the types listed after the `isa` are concrete types, the type-check is just a pointer comparison (i.e., very fast), and then the `c ∈ set` gets specialized for each length of tuple. That block is the same thing that would be created by Julia's automatic union-splitting, but that only happens for unions with 3 or fewer types, whereas with the manual approach you can choose how many cases to write out.
Again, these are ugly and should only be reserved for something that is very "hot" performance-wise. But it's nice that Julia makes this kind of code possible when it's really important.
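A self-contained version of the pattern just described, wrapped in a function (the name `inset` and the fallback branch are mine, not from the thread):

```julia
# Hypothetical sketch of manual union splitting over tuple lengths.
# `set` arrives as the abstract NTuple{N,Char} where N; each `isa` branch pins
# a concrete type, so the `c ∈ set` inside it compiles to specialized code.
function inset(c::Char, set::NTuple{N,Char} where N)
    if set isa NTuple{1,Char}
        c ∈ set
    elseif set isa NTuple{2,Char}
        c ∈ set
    elseif set isa NTuple{3,Char}
        c ∈ set
    else
        c ∈ set   # fallback: dynamic dispatch for longer tuples
    end
end

inset('b', ('a', 'b'))       # true (hits the NTuple{2,Char} branch)
inset('z', ('x', 'y', 'w'))  # false (hits the NTuple{3,Char} branch)
```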
Thanks a lot Tim! This all makes sense to me. In terms of FranklinParser I probably won't try to optimise it much more, as there's a bunch of other low-hanging fruit to fix in Franklin itself first, and runtime performance right now is already good enough (standard pages build in ~5-10ms). But I've enjoyed learning about all this a lot and will definitely keep an eye on SnoopCompile and its evolution, as it really helps make one aware of some peculiarities that are good to keep in mind 😄

Again, thanks a lot for all your advice and recommendations, I learned a lot!
Yeah, from looking at FranklinParser briefly I concluded it was good to go. Hopefully this adventure will bring you many long-term benefits!
Hello,

I was trying out a few things in a project to reduce the number of inference triggers when I realised that if I ran `@snoopi_deep` after restarting Julia, I got a huge number of triggers, whereas after that I only got 4, all from Base. Which result should I consider? Also, is it possible to filter the inference triggers by module?

Thanks!