Closed ayazhafiz closed 2 years ago
Kicking this off today. I'm going to try to track work and various investigations for this in https://github.com/orgs/roc-lang/projects/1. Please let me know if you're interested in helping out with the investigation and resolution of this!
I've archived https://github.com/orgs/roc-lang/projects/1/views/1. After https://github.com/roc-lang/roc/pull/3981 lands, I would like to mark this project complete.
Over the past couple weeks we've increasingly observed pathological compiler performance in the presence of deeply nested, capturing lambda sets. See #3449 for one report, and this Zulip thread for a longer discussion.
The current pathological performance occurs when there is a large-enough chain of lambdas called in sequence, and called in such a sequence that each lambda captures the next lambda in sequence (and by extension, its transitive closure). This issue is opened in order to form a technical plan for addressing this problem, and for us to coordinate work on tackling it.
As an example, take the lazy definition of Effect.after:
Notice that `Effect.after` returns an arrow that captures both its parameters, `effect` and `toEffect`. Thus the following program has the elaboration
Notice that `e2` contains everything `e1` contains, and the top-level lambda contains everything `e2` contains. So, the total size of the lambda set types (across the whole program) is quadratic in the depth of the largest chain of captures that capture other capturing lambda sets. That's too many "capture"s in a row, so let's just call this behavior a "lambda capture-chain".
You can see that the total size of the capture-chains will grow to be exponential in the presence of branches. One branch would make the total size `4^d`, two branches would make the total size `8^d`, etc., where `d` is the longest depth of a capture-chain.
The performance of this really is quite poor. As of #2226, the False interpreter now takes 2 seconds to compile on my 2021 M1. @Qqwy has observed much worse performance in #3449, where a parser combinator takes hours to compile, if it compiles at all.
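To make the quadratic growth of a linear capture-chain concrete, here is a minimal Python sketch. It is a toy model, not the Roc compiler's representation: the `Lambda` class, `build_chain`, and `total_type_size` are hypothetical names, and the model assumes each lambda's type stores a full, unshared copy of the type of whatever it captures.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Lambda:
    """Toy lambda whose lambda-set type embeds the type of its capture."""
    name: str
    captures: Optional["Lambda"] = None  # next lambda in the capture-chain

    def type_size(self) -> int:
        # One node for this lambda, plus the whole type of what it captures.
        inner = self.captures.type_size() if self.captures else 0
        return 1 + inner


def build_chain(depth: int) -> List[Lambda]:
    """Build e1, e2, ..., e<depth>, where each lambda captures the previous one."""
    chain: List[Lambda] = []
    prev: Optional[Lambda] = None
    for i in range(1, depth + 1):
        prev = Lambda(f"e{i}", prev)
        chain.append(prev)
    return chain


def total_type_size(depth: int) -> int:
    """Sum of the type sizes of every lambda in a chain of the given depth."""
    return sum(lam.type_size() for lam in build_chain(depth))


if __name__ == "__main__":
    for d in (1, 2, 4, 8, 16):
        print(d, total_type_size(d))
```

In this model a chain of depth `d` has total size `1 + 2 + ... + d = d(d+1)/2`, so doubling the depth roughly quadruples the amount of type structure that has to be walked. Branching is deliberately left out of the sketch.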
Observations
I'll enumerate some observations I've previously made in investigating this problem in the context of #2226.
`App` module. In order to specialize the `Effect.after` calls, their specialization must be done in the context of all of the `Effect` module's specializations. This is because (today) only the `Effect` module can see all of its types, definitions, etc. This is done for parallelization reasons. The consequence of this is that in order to specialize the nested `Effect.after` calls in our `App` module, the `App` module must export those calls' types for specialization in the `Effect` module. In the example above, we ask for two `Effect.after` external specializations, and we independently export the wanted specialization types. In particular this means that no type variables are reused between the wanted specializations, even though many type variables between the two calls are exactly the same in the `App` module! And so, that means that the `Effect` module will see them as totally separate types as well. We've now blown up the forest of types quadratically in the number of trees, when before it may very well have been just a single tree.
Moreover, when the `Effect` module goes to import the exported types, it does not preserve the sharing of type variables from the source domain across separate imports. For example, suppose the `Effect` module imports `a : t1` and `b : t1 -> t2` from the external specializations store. The `Effect` module will import them as `a : t11`, `b : t31 -> t32` - note the relationship between `t11` and `t31` is lost in the image, though it was present in the domain.
Solutions
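As a concrete reference point for the options that follow, the loss of type-variable sharing on import noted in the observations can be modeled in a few lines of Python. This is only an illustrative sketch: the tuple encoding of function types, the `u<n>` naming of fresh variables, and the functions `import_type`, `import_independently`, and `import_with_shared_mapping` are all assumptions made for the example, not the compiler's actual data structures or API.

```python
import itertools


def import_type(ty, fresh, mapping):
    """Copy `ty` into the importing module, renaming variables via `mapping`.

    A type is either a variable name (str) or a function type (arg, ret).
    """
    if isinstance(ty, str):  # a type variable such as "t1"
        if ty not in mapping:
            mapping[ty] = f"u{next(fresh)}"  # allocate a fresh variable
        return mapping[ty]
    arg, ret = ty  # a function type, encoded as the pair (arg, ret)
    return (import_type(arg, fresh, mapping), import_type(ret, fresh, mapping))


def import_independently(types):
    """Each import gets its own renaming, so sharing between imports is lost."""
    fresh = itertools.count(1)
    return [import_type(t, fresh, {}) for t in types]


def import_with_shared_mapping(types):
    """Hypothetical contrast: one renaming for the whole batch keeps sharing."""
    fresh = itertools.count(1)
    mapping = {}
    return [import_type(t, fresh, mapping) for t in types]


if __name__ == "__main__":
    exported = ["t1", ("t1", "t2")]  # a : t1, b : t1 -> t2
    print(import_independently(exported))
    print(import_with_shared_mapping(exported))
```

With independent renamings, `a` and the argument of `b` end up as unrelated variables, mirroring the `t11` versus `t31` example; with a single shared renaming they stay the same variable. Whether a shared renaming is practical in the real compiler is not something this sketch addresses.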
Here are some options I think will mitigate or eliminate this problem, in what I consider to be decreasing order of effectiveness.
`Effect` module to specialize the outermost `Effect.after` call, then the inner `Effect.after`, then the last `Effect.always` call in sequence, without any layout cache invalidation, in order to maximize reuse of cached layouts and avoid rewalking type trees. This is tricky to get right, but the fact that the type of `Effect.after` is nested in the type of the inner `Effect.always` which is nested in the type of the outermost `Effect.after` call gives us a starting point - at the very least you can walk type trees to determine the desired specialization order. How to do this more efficiently remains unclear in my head, but there is probably a good way, especially since the syntax tells us how we should expect the types to be nested.
Finally, I'd like to note two things: