Open asl opened 2 weeks ago
That seems like a crazy statistic in general. Either the inliner calls ResolveReferences hundreds of times or the pass is just slow.
That seems like a crazy statistic in general. Either the inliner calls ResolveReferences hundreds of times or the pass is just slow.
It former. Here are some stats: overall ResolveReferences
was called 175 times. Each time is approx 120-150 ms. This sums to 22 seconds total time, yes.
One of factors that contribute is quite severe, I'd say. Essentially, portion of all the time is spent in ISimpleNamespace::getDeclByName()
. Which in turn is just IndexedVector<T>::getDeclaration
. IndexedVector<T>
stores declaration name in ordered_map<cstring, const IDeclaration *>
. The latter is essentially the pair of:
std::list<std::pair<cstring, IDeclaration*>>
std::map<cstring*, std::list::iterator, Comparator>
Comparator
here essentially dereferences cstring*
and calls std::less<cstring>(a, b)
. The latter is:
bool operator<(cstring a) const { return *this < a.str; }
bool operator<(const char *a) const { return str ? a && strcmp(str, a) < 0 : !!a; }
Do you see this strcmp
call? So, yes, you are reading it correctly: in all ordered_map<cstring, foo>
corresponding keys are ordered lexicographically.
This corresponds to maybe 10% of ResolveReferences
runtime. Interesting enough, everything else is just overhead from topdown IR visiting :)
It seems that ActionsInliner
does some updates of ReferenceMap
(using SubstituteParameters
), but it seems it is not enough.
@mihaibudiu @ChrisDodd Do you remember any details here?
I am not sure what "is not enough" in the above statement means.
@ChrisDodd has expressed interest to remove the ResolveReferences pass for a long time, and he did some work in this direction, but it was never completed. But I am not sure that his proposed changes will improve the runtime.
This is just how the compiler was architected: the types and references are recomputed every time the IR changes. We never imagined that this would become a bottleneck.
I am not sure what "is not enough" in the above statement means.
Well, here is the comment in substituteParameters.h
:
// When a PathExpression is cloned, it is added to the RefMap.
// It is set to point to the same declaration as the original path.
// But running this pass may change some declaration nodes - so
// in general the refMap won't be up-to-date at the end.
And indeed, not recalculating refMap leads to some declaration not found
assertions. I have not checked yet what they are and what they correspond to.
You can now look up references in the current context of any visitor, by having tbe visitor inherit from P4::ResolutionContext and then calling any of the various resolve methods in that class to look up the declaration for a name. This makes the refMap unnecessary.
The memoization that now happens in these resolve methods should make repeatedly resolving the same names quite fast,
You can now look up references in the current context of any visitor, by having tbe visitor inherit from P4::ResolutionContext and then calling any of the various resolve methods in that class to look up the declaration for a name. This makes the refMap unnecessary.
The memoization that now happens in these resolve methods should make repeatedly resolving the same names quite fast,
That's very useful. Any particular requirements for the P4 program that P4::ResolutionContext
has?
You can now look up references in the current context of any visitor, by having tbe visitor inherit from P4::ResolutionContext and then calling any of the various resolve methods in that class to look up the declaration for a name. This makes the refMap unnecessary.
Sadly, everything is entangled in quite messy mass. refMap
is also used for unique name generation. So, for example, MethodInstance::resolve
does not require full refMap
, but it may resort to TypeInference
which requires "full" refMap
(so, both DeclarationLookup
and NameGenerator
). It just happens that mid-end use-def only calls MethodInstance::resolve
with null typeMap
, so things are working fine for it :) TypeInference
could certainly be made a ResolutionContext
, but NameGenerator
would still be required.
So, looks like we'd need in some cases to still use refMap
for name generation. At least for the time being.
So, looks like we'd need in some cases to still use refMap for name generation.
For name generation, you can use the MinimalNameGenerator pass/object instead of the refmap -- tracking unique names is really independent of contextual name lookup, so a separate class seems better. That would requirie refactoring TypeInference to take both a DeclarationLookup and a NameGenerator (when using the RefMap they could be the same). I have an old refactoring of type inferencing sitting a a branch I had been working on to remove the refmap from type inferencing, but I never got it completely working. (https://github.com/ChrisDodd/p4c/tree/typecheck-remove-refmap)
So, looks like we'd need in some cases to still use refMap for name generation.
For name generation, you can use the MinimalNameGenerator pass/object instead of the refmap -- tracking unique names is really independent of contextual name lookup, so a separate class seems better. That would requirie refactoring TypeInference to take both a DeclarationLookup and a NameGenerator (when using the RefMap they could be the same).
Yes, this is exactly what I'm playing with now :)
but I never got it completely working.
One issue that I already found is that ResolveReferences
has implicit and very subtle assumption to be mixed into Inspector
. TypeInference
is a Transform
that even in read-only mode does useless cloning of the nodes (this certainly needs to be fixed eventually). As a result, we are calling getDeclaration
on cloned nodes and shallow node comparisons inside lookup
fail. The solution is to check in the original context, not the cloned one.
Currently looking over remaining issues here and there.
Some very preliminary results from downstream test. Before:
________________________________________________________
Executed in 87.54 secs fish external
usr time 134.23 secs 0.26 millis 134.23 secs
sys time 2.08 secs 3.87 millis 2.08 secs
After:
________________________________________________________
Executed in 69.84 secs fish external
usr time 101.63 secs 0.06 millis 101.63 secs
sys time 2.64 secs 2.17 millis 2.63 secs
Some very preliminary results from downstream test. Before:
________________________________________________________ Executed in 87.54 secs fish external usr time 134.23 secs 0.26 millis 134.23 secs sys time 2.08 secs 3.87 millis 2.08 secs
After:
________________________________________________________ Executed in 69.84 secs fish external usr time 101.63 secs 0.06 millis 101.63 secs sys time 2.64 secs 2.17 millis 2.63 secs
Are these gains specifically because of inliner optimizations or changes to the refMap/getDeclByName
?
Are these gains specifically because of inliner optimizations or changes to the refMap/
getDeclByName
?
The whole idea is to sunset refMap entirely. The speedup above is obtained by switching to ResolutionContext
inside inliner, so declarations are looked up when necessary (without recalculation of path => declaration map for all paths in the program which is certainly huge overkill even for small programs). This removed ~175 recalculation of refMap via ResolveReferences
.
For the record, this particular app used to take ~450 seconds to compile back in December 2023.
https://github.com/p4lang/p4c/pull/4757 is a working PR accumulating required changes
We are seeing
ResolveReferences
taking 25% out of 90 second compile time of large p4 app. The issue here is inliner as it clears and recalculates refmap of the whole program after each inlining step and here we are having lots of inlining happening.Certainly, properly updating refmap might be a bit tricky, but still something should be done :)