oxc-project / backlog

backlog for collborators only
1 stars 0 forks source link

Count nodes, scopes, symbols, references in parser #74

Open overlookmotel opened 1 month ago

overlookmotel commented 1 month ago

Where we're at

https://github.com/oxc-project/oxc/pull/4367 added an AST traversal at start of building Semantic, to count how many nodes, scopes, symbols, and references there are in the AST.

It uses these counts to pre-allocate sufficient capacity in AstNodes, ScopeTree and SymbolTable before the main traversal which populates these structures with data. This avoids these structures needing to grow during the main traversal, with all the memory-copying involved in that. The result was a big boost for performance.

How we can improve on it

We can do even better by compiling these counts in parser while AST is being built, and remove the extra AST traversal that https://github.com/oxc-project/oxc/pull/4367 introduced. Parser can return a Stats object as part of ParserReturn.

Experiments discussed in https://github.com/oxc-project/oxc/pull/4328#issuecomment-2238712898 and https://github.com/oxc-project/oxc/pull/4328#issuecomment-2238779154 suggest we can get a significant perf boost by removing the extra AST pass - perhaps another 15%-20% on oxc_semantic benchmarks, and maybe even more than that on RadixUIAdoptionSection.jsx benchmark (which I think is the most important one).

How to count nodes

As Boshen pointed out, counting nodes can be done in AstBuilder. There's a few complications to this, however:

Currently AstBuilder is Copy, which we'll need to change to make it stateful.

Transformer currently creates multiple copies of AstBuilder. This becomes a problem if all these copies need to call methods which require &mut AstBuilder. We don't want the overhead of RefCell. But transformer doesn't need these counts anyway - counts are clear just from nodes.len() etc.

A couple of options:

  1. Make AstBuilder into a trait. The builder which parser uses can include node count, builder transformer uses doesn't have to. So transformer's builder can be Copy.
  2. Only access AstBuilder in transformer via ctx.ast (where ctx is TraverseCtx), rather than self.ctx.ast. This removes the need to have multiple copies of AstBuilder.

I prefer option 2.

Also, the parser currently bypasses AST builder in some places. We'd need all AST node creation to go via the builder.

How to count scopes, symbols, references

Counting scopes, symbols and references can also be done in parser.

Accurately counting scopes is a little complicated because some scopes are conditional, so it requires some surrounding context to decide if a scope is going to be created or not. e.g. for (let key in obj) creates a scope, for (var key in obj) doesn't.

Alternatively, we can skip that complication and just count anything which could be a scope. Scopes count may then be over-estimated, but that's probably not a big problem - over-allocating capacity in ScopeTree is not very costly, it's only under-estimating which is very expensive, as it causes ScopeTree to have to grow.

overlookmotel commented 1 month ago

https://github.com/oxc-project/oxc/pull/4332 also had a much simpler method for calculating capacities required - guess! Probably we'll find that calculating counts accurately in parser has a very minimal overhead. But in case that's not the case, we could look at estimating counts based on source text length.