Open DanielRosenwasser opened 6 months ago
A relatively easyish experiment to start out with would be to remove the transformFlags field from every node, remove uses of TransformFlags from outside the emit pipeline, and make those walks unconditional.
Or possibly just make transformFlags a getter that returns all possible TransformFlags as set. Then we could see what this does for memory and speed.
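The getter variant of that experiment could look roughly like the sketch below. Everything in it is hypothetical: `Flags`, `ALL_FLAGS`, and `SketchNode` are stand-ins invented for illustration, not the compiler's real `TransformFlags` enum or node classes. The point is only that a prototype getter stores nothing per node and makes every "does this subtree contain X?" check succeed, so no walk is ever skipped.

```typescript
// Hypothetical, simplified flag bits; not the real TransformFlags values.
const Flags = {
    None: 0,
    ContainsTypeScript: 1 << 0,
    ContainsES2020: 1 << 1,
} as const;
type FlagSet = number;

// Every known bit set at once.
const ALL_FLAGS: FlagSet = Flags.ContainsTypeScript | Flags.ContainsES2020;

class SketchNode {
    kind: string;
    constructor(kind: string) {
        this.kind = kind;
    }
    // Lives on the prototype, so it adds no per-instance storage.
    get transformFlags(): FlagSet {
        return ALL_FLAGS;
    }
}

const identifier = new SketchNode("Identifier");
// Any flag test now passes, which forces callers to walk unconditionally.
const mustVisit = (identifier.transformFlags & Flags.ContainsES2020) !== 0;
```

Comparing memory and timing between this and the stored-field version would give a rough upper bound on what the flags cost and what they save.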
I tried the above suggestion as it wasn't too hard to do. There's more here in the "non-transform" case, unfortunately; a lot of stuff depends on these flags beyond just performance, e.g. we use them to know when we need to reparse for await, to avoid transforming code that would behave the same without the transform, to detect whether JSX was ever used within a specific subtree, and seemingly to make decisions about how to transform generators and such.
I pushed my attempt to https://github.com/jakebailey/TypeScript/tree/transform-flag-hack-test, but it's also possible that I just did it wrong.
Background
Today, TypeScript aggregates TransformFlags on every node upon construction. This makes it possible to keep track of specific language features that must be transformed later on in our emit pipeline. If a node does not contain specific transform flags, certain passes can avoid walking over a region of the tree (or avoid walking over a tree entirely).
However, this incurs an overhead on every single node in our syntax trees, regardless of whether TypeScript is being used for emit. If a developer uses TypeScript strictly for type-checking, then these transform flags may not be justified. Even if TypeScript is being used for emit, this information may never be requested in editor scenarios.
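As a rough illustration of the mechanism described above, here is a minimal sketch of eager aggregation and subtree skipping. The names (`Flags`, `SketchNode`, `createNode`, `countVisited`) and the node kinds are assumptions made up for this example; the real node factory and flag set are far larger.

```typescript
// Hypothetical, simplified flag bits; not the real TransformFlags values.
const Flags = {
    None: 0,
    ContainsTypeScript: 1 << 0,
    ContainsES2020: 1 << 1,
} as const;
type FlagSet = number;

interface SketchNode {
    kind: string;
    children: SketchNode[];
    // Union of this node's own flags and those of all descendants.
    transformFlags: FlagSet;
}

// Factory-style constructor: flags are aggregated eagerly, at creation time.
function createNode(kind: string, ownFlags: FlagSet, children: SketchNode[] = []): SketchNode {
    let flags = ownFlags;
    for (const child of children) {
        flags |= child.transformFlags;
    }
    return { kind, children, transformFlags: flags };
}

// A pass interested in one flag can skip any subtree that lacks it.
// Returns how many nodes the pass actually had to visit.
function countVisited(node: SketchNode, interesting: FlagSet): number {
    if ((node.transformFlags & interesting) === 0) {
        return 0; // nothing to transform at or below this node
    }
    let visited = 1;
    for (const child of node.children) {
        visited += countVisited(child, interesting);
    }
    return visited;
}

const plain = createNode("Block", Flags.None, [createNode("Identifier", Flags.None)]);
const modern = createNode("Block", Flags.None, [createNode("NullishAssignment", Flags.ContainsES2020)]);
const file = createNode("SourceFile", Flags.None, [plain, modern]);
```

Here `countVisited(file, Flags.ContainsES2020)` never descends into `plain`, and `countVisited(file, Flags.ContainsTypeScript)` stops at the root. The cost is the `transformFlags` slot on every node, whether or not any pass ever reads it.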
Furthermore, TransformFlags may become less relevant over time. Back when we introduced the emit pipeline, we had a lot of different transformations that might need to be performed based on every language target. We wanted to do as few walks of a given tree as possible, but I would like to revisit some of the core assumptions we had back then. In the last few years, we've seen that:
In this issue, I'd like to explore the possibility of deferring the calculation of transformFlags on nodes, or removing them entirely.
Proposal
The chief idea is to remove the transformFlags property from every node in the tree. Instead of using a property, we could store this information in a Map during the emit pipeline on a given SourceFile, or in a global WeakMap as soon as a node hits the emit pipeline. Depending on which, this could mean that re-emitting a SourceFile might require recalculating this information; but it could also mean that less information would have to be stored between building different files.
One upside to this is that with this strategy, TransformFlags don't need to be stored unless the target is low enough in the first place; so this would become more of a pay-for-play feature.
The downside is that this would require a walk of the tree to calculate the transformFlags for a given node. More on that below.
Challenges and Possible Solutions
Requiring Extra Walks
As mentioned above, when you do need to emit, you would need to walk the tree to calculate the TransformFlags for each node. Doing this at bind-time is tempting, but that would mean all that information needs to sit around regardless of when it is used. In general, and especially for incremental build scenarios, it would be more ideal to only hold on to this information when needed.
We could do a walk per file. That's not ideal, but it is more pay-for-play.
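A per-file walk that fills a side table on demand might be sketched as follows. This is only an illustration under assumptions: `SketchNode`, `ownFlags`, `flagsByNode`, and `getTransformFlags` are hypothetical names, and the kind-to-flag mapping is invented. Nodes carry no flags property; a WeakMap holds the computed values so they can be collected along with the nodes.

```typescript
// Hypothetical, simplified flag bits; not the real TransformFlags values.
const Flags = {
    None: 0,
    ContainsTypeScript: 1 << 0,
    ContainsES2020: 1 << 1,
} as const;
type FlagSet = number;

// Nodes carry no transformFlags property in this sketch.
interface SketchNode {
    kind: string;
    children: SketchNode[];
}

// Hypothetical mapping from a node's own syntax to the flags it requires.
function ownFlags(node: SketchNode): FlagSet {
    switch (node.kind) {
        case "TypeAnnotation": return Flags.ContainsTypeScript;
        case "NullishAssignment": return Flags.ContainsES2020;
        default: return Flags.None;
    }
}

// Side table that is only populated once emit asks for flags.
// Entries disappear when their nodes become unreachable.
const flagsByNode = new WeakMap<SketchNode, FlagSet>();

// Computes flags bottom-up on first request and caches every node visited.
function getTransformFlags(node: SketchNode): FlagSet {
    const cached = flagsByNode.get(node);
    if (cached !== undefined) {
        return cached;
    }
    let flags = ownFlags(node);
    for (const child of node.children) {
        flags |= getTransformFlags(child);
    }
    flagsByNode.set(node, flags);
    return flags;
}

const assignment: SketchNode = { kind: "NullishAssignment", children: [] };
const annotation: SketchNode = { kind: "TypeAnnotation", children: [] };
const sourceFile: SketchNode = { kind: "SourceFile", children: [assignment, annotation] };
```

Calling `getTransformFlags(sourceFile)` once walks the whole file and caches each node's result; a type-check-only run never calls it and pays nothing. A real version would likely need an iterative walk to avoid deep recursion on large trees.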
Another thing to note is that most nodes go through at least one transform anyway: the TypeScript transform. We could modify that transform to act more like a full tree walk, but it could then calculate the TransformFlags for subsequent transforms.
Reducing Passes
While doing this could relieve us of memory pressure in non-emit scenarios, it's very likely that Maps and WeakMaps have a higher memory overhead and lookup cost. One way to offset this might be to fold a few of the existing transform steps into a single step. For example, can the ES2020, ES2021, and ES2022 transforms be combined into a single transform? These transforms tend to be relatively simple, and this would reduce some of our code size, along with the number of top-down walks to transform, say, ??=.
Non-Transform Uses of TransformFlags
One challenge is that TransformFlags are used in a few places in the compiler - not just in the emit pipeline. We use this information to avoid deeper walks of our trees in the type-checker, and in the language service. We would need to ensure that we don't regress performance in these areas.
Forgetting to Propagate TransformFlags
One of the nice parts of how our node factories work is that by default they always aggregate the transformFlags of their children, and introducing a new node makes it easier to notice that you need to call aggregateTransformFlags. One of the counter-arguments to moving aggregation into an existing non-dedicated pass is that historically when we did that, we often ended up with bugs from miscalculation. Still, it's possible for these flags to be miscalculated in the current node factory system.