Open overlookmotel opened 1 month ago
https://github.com/oxc-project/oxc/pull/4332 also had a much simpler method for calculating capacities required - guess! Probably we'll find that calculating counts accurately in parser has a very minimal overhead. But in case that's not the case, we could look at estimating counts based on source text length.
Where we're at
https://github.com/oxc-project/oxc/pull/4367 added an AST traversal at start of building
Semantic
, to count how many nodes, scopes, symbols, and references there are in the AST.It uses these counts to pre-allocate sufficient capacity in
AstNodes
,ScopeTree
andSymbolTable
before the main traversal which populates these structures with data. This avoids these structures needing to grow during the main traversal, with all the memory-copying involved in that. The result was a big boost for performance.How we can improve on it
We can do even better by compiling these counts in parser while AST is being built, and remove the extra AST traversal that https://github.com/oxc-project/oxc/pull/4367 introduced. Parser can return a
Stats
object as part ofParserReturn
.Experiments discussed in https://github.com/oxc-project/oxc/pull/4328#issuecomment-2238712898 and https://github.com/oxc-project/oxc/pull/4328#issuecomment-2238779154 suggest we can get a significant perf boost by removing the extra AST pass - perhaps another 15%-20% on
oxc_semantic
benchmarks, and maybe even more than that onRadixUIAdoptionSection.jsx
benchmark (which I think is the most important one).How to count nodes
As Boshen pointed out, counting nodes can be done in
AstBuilder
. There's a few complications to this, however:Currently
AstBuilder
isCopy
, which we'll need to change to make it stateful.Transformer currently creates multiple copies of
AstBuilder
. This becomes a problem if all these copies need to call methods which require&mut AstBuilder
. We don't want the overhead ofRefCell
. But transformer doesn't need these counts anyway - counts are clear just fromnodes.len()
etc.A couple of options:
AstBuilder
into a trait. The builder which parser uses can include node count, builder transformer uses doesn't have to. So transformer's builder can beCopy
.AstBuilder
in transformer viactx.ast
(wherectx
isTraverseCtx
), rather thanself.ctx.ast
. This removes the need to have multiple copies ofAstBuilder
.I prefer option 2.
Also, the parser currently bypasses AST builder in some places. We'd need all AST node creation to go via the builder.
How to count scopes, symbols, references
Counting scopes, symbols and references can also be done in parser.
Accurately counting scopes is a little complicated because some scopes are conditional, so it requires some surrounding context to decide if a scope is going to be created or not. e.g.
for (let key in obj)
creates a scope,for (var key in obj)
doesn't.Alternatively, we can skip that complication and just count anything which could be a scope. Scopes count may then be over-estimated, but that's probably not a big problem - over-allocating capacity in
ScopeTree
is not very costly, it's only under-estimating which is very expensive, as it causesScopeTree
to have to grow.