microsoft / TypeScript

TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
https://www.typescriptlang.org
Apache License 2.0
100.72k stars 12.45k forks source link

Incremental Binding #54222

Open yf-yang opened 1 year ago

yf-yang commented 1 year ago

Suggestion

πŸ” Search Terms

βœ… Viability Checklist

My suggestion meets these guidelines:

Background

This suggestion has similar setting as https://github.com/microsoft/TypeScript/issues/35120. Language service + binding phase.

I am not sure if I missunderstand or miss any crucial details, so I'll try to describe the suggestion along with my understanding of how the language service works.

Now, whenever something changes in a source file, the program is refreshed and a type checker is recreated, which introduces a binding phase. In VSCode + TS server setting, even a single keystroke that add a character would eagerly triggers a geterr call to the server. I don't have any statistics, but I believe most of the changes only modify a small piece of AST within a single file.

Therefore, TS server has incremental parser, it picks nodes from "clean area" and reuse them. For text from "dirty area", new AST nodes are created. Therefore, we try the best to reuse those nodes.

The previous step happens in a change call, and immediately a geterr call is sent from VSCode, which triggers checker initialization and consequently binding.

In binding stage, the only granularity we reuse is SourceFile. The binder achieves this by checking .locals member of the source file. Since a new SourceFile node must be create no matter how many nodes we reuse during incremental parsing phase, if the sourcefile is changed, the .locals must be undefined. Then, the binder iterate the whole AST, recreate the symbol table if a (reused) node has one.

It leads to an interesting scenario. Suppose I have a file with multiple file-scope function declarations, it is better to split those functions into multiple files, at least for the performance of binding phase, because most of the time I am working on only one function, but all of those functions are binded again and again.

Suggestion

Here comes the point: the AST is pretty wide for those nodes having an array of statement children. When I am working on the code, I mainly focus on not so much of the wide node. Suppose I have a block with 80 lines, including 5 subblocks, then with high probability 4 of those subblocks are not touched, so their symbol table could be reused.

How to achieve this? I'd like to suggest a simple algorithm: in incremental parsing stage, a newly created node would be marked as dirty. If a child node is marked as dirty, then its parent is also marked as dirty (but its siblings may not necessirily be dirty).

When binder starts to iterate an AST (here we know, the root node, i.e. SourceFile must be dirty), whenever it encounters a clean node, it is guaranteed that all of its children are within the "clean area", so binder would bypass it.

Again, I am not sure if I miss critical preconditions. If those analysis and assumptions are correct, then I believe the solution would somehow mitigate the problem addressed by https://github.com/microsoft/TypeScript/issues/35120

RyanCavanaugh commented 1 year ago

I think your analysis is correct on the preconditions and post conditions. The only thing I would caution is that the gains here are very much bounded by the total time that binding takes on a single file, which is really not very long at all given that it's just a simple tree walk without any difficult or involved computations. We would definitely want to start with a measurement of what the worst case binding performance is in the first place (e.g. checker.ts).

yf-yang commented 1 year ago

All right, before we step into actual operations, I'll theoretically illustrate that this is the first step of further performance improvement β€”β€” reusing type checker cache: nodeLinks/symbolLinks.