microsoft / TypeScript

TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
https://www.typescriptlang.org
Apache License 2.0

Serialization/Deserialization API for AST #33502

Open kitsonk opened 4 years ago

kitsonk commented 4 years ago

Search Terms

serialize deserialize ast node

Ref #26871 Ref #28365

Suggestion

Implement/expose an API that can serialize/deserialize AST nodes, in particular a way to serialize a source file so it can be easily transferred or stored, potentially manipulated, and hydrated back.

Somewhat related is #28365, which would allow direct ingestion of an AST for certain use cases. Serialisation was suggested before in #26871 for performance reasons, but not as a way of externalising the AST in a form that can be persisted easily.

Use Cases

Specifically in Deno, we are interested in doing some AST generation or manipulation outside of the JavaScript runtime. Moving computationally intensive work out of JavaScript can give significant speed improvements. For example, we originally did our source map remappings for errors in JavaScript using the source-map package from Mozilla. In version 0.7+ though, Mozilla had rewritten the mappings part of it in Rust compiled to WASM. We were able to use that directly in the privileged side of Deno, in its native Rust, and saw something like a 100x increase in performance.

At this point we don't know specifically how we would use it, though there could be a need to procedurally generate some aspects, like our runtime type library, or to do some AST transformations for bundling. We have experimented with doing this manipulation in a JavaScript runtime and wonder whether we could get performance improvements by doing some of it in Rust.

Right now, a full parse, transform and emit is the only way to "persist" code. Making any further changes requires a reparse. It seems logical that this transitory state could be utilised for a lot of different advanced use cases. It would also, in a way, decouple the parser and the transformer externally.

There are a good number of advanced use cases that people build on top of AST generators like acorn and estree, so it is logical that other advanced use cases could be built if serialisation/deserialisation were available.

There are likely two big challenges. First, there are a lot of circular references, which makes JSON a difficult serialisation format. Second, there is a lot of hidden state in a SourceFile that isn't "owned" by the surface object. That state isn't present as properties on the existing nodes, so it would have to be exposed as part of a serialisation.
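
To illustrate the circular reference problem, here is a minimal sketch on a toy tree. `ToyNode`, `PlainNode`, `dehydrate` and `hydrate` are hypothetical names made up for this example and are not part of the TypeScript API; the real `ts.Node` carries far more state than this. The sketch assumes the only cycle is the `parent` back-pointer, drops it before `JSON.stringify`, and rebuilds it when hydrating.

```typescript
// Toy stand-in for an AST node. This is NOT the real ts.Node shape.
interface ToyNode {
  kind: string;
  text?: string;
  parent: ToyNode | undefined; // back-reference that creates the cycle
  children: ToyNode[];
}

// Plain, acyclic form that JSON can represent.
interface PlainNode {
  kind: string;
  text?: string;
  children: PlainNode[];
}

// Copy the tree without the parent pointers.
function dehydrate(node: ToyNode): PlainNode {
  const plain: PlainNode = {
    kind: node.kind,
    children: node.children.map(dehydrate),
  };
  if (node.text !== undefined) plain.text = node.text;
  return plain;
}

// Rebuild the tree and restore the parent pointers on the way down.
function hydrate(plain: PlainNode, parent?: ToyNode): ToyNode {
  const node: ToyNode = { kind: plain.kind, parent, children: [] };
  if (plain.text !== undefined) node.text = plain.text;
  node.children = plain.children.map((child) => hydrate(child, node));
  return node;
}

// Build a tiny tree with a parent pointer, as a parser would.
const root: ToyNode = { kind: "SourceFile", parent: undefined, children: [] };
const leaf: ToyNode = { kind: "Identifier", text: "console", parent: root, children: [] };
root.children.push(leaf);

// Stringifying the cyclic tree directly throws a TypeError.
let directStringifyFailed = false;
try {
  JSON.stringify(root);
} catch {
  directStringifyFailed = true;
}

// Going through the plain form works, and the cycle can be restored.
const toyJson = JSON.stringify(dehydrate(root));
const restored = hydrate(JSON.parse(toyJson) as PlainNode);
```

This only addresses the first challenge. The hidden state that is not reachable as node properties would still be lost in a round trip like this one.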

I started to look at tsbuildinfo to see if some sort of compiler or AST state is persisted somewhere that could be used to rehydrate the state of the compiler, but I haven't really looked at whether TypeScript can serialise program state internally.

Examples

import * as ts from "typescript";

const sourceFile = ts.createSourceFile("mymodule.ts", "console.log(\"hello\");", ts.ScriptTarget.ESNext, true);

// Proposed API, does not exist today. Returns a POJO of ts.SerializedNode?
const serializedSourceFile = ts.serialize(sourceFile);

// Because it is a POJO, it can be easily serialized
console.log(JSON.stringify(serializedSourceFile));

// Proposed API, does not exist today. Accepts a ts.SerializedNode?
const sourceFile2 = ts.deserialize(serializedSourceFile);
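
For comparison, the serialising half can be roughly approximated today with the public API by walking the tree with `ts.forEachChild` and copying a few fields into a plain object. This is only a sketch: `PlainNode` and `toPlain` are hypothetical names, not the proposed `ts.SerializedNode`, and the result keeps just the node kind, positions and identifier or string literal text. It does not give a way to turn the plain object back into real `ts.Node` instances, which is the part this issue asks for.

```typescript
import * as ts from "typescript";

// Hypothetical plain shape for illustration; not a type exported by TypeScript.
interface PlainNode {
  kind: ts.SyntaxKind; // numeric enum value
  pos: number;
  end: number;
  text?: string;
  children: PlainNode[];
}

// Recursively copy a node into an acyclic plain object.
function toPlain(node: ts.Node): PlainNode {
  const children: PlainNode[] = [];
  // The callback returns undefined, so forEachChild visits every child.
  ts.forEachChild(node, (child) => {
    children.push(toPlain(child));
  });
  const plain: PlainNode = { kind: node.kind, pos: node.pos, end: node.end, children };
  if (ts.isIdentifier(node) || ts.isStringLiteral(node)) {
    plain.text = node.text;
  }
  return plain;
}

const source = 'console.log("hello");';
const parsedFile = ts.createSourceFile("mymodule.ts", source, ts.ScriptTarget.ESNext, true);

// No parent pointers are copied, so JSON.stringify does not hit a cycle.
const plainJson = JSON.stringify(toPlain(parsedFile));
const roundTripped = JSON.parse(plainJson) as PlainNode;
```

Storing the numeric `kind` keeps the output smaller than storing names, at the cost of tying the data to the `SyntaxKind` numbering of one TypeScript version.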

RyanCavanaugh commented 4 years ago

We'd need some use cases that directly benefit from TypeScript specifically in tsc-like scenarios, otherwise this could just be something a different parser could do. We do a lot of stuff like having specific node constructors to eke out extra performance and anything done here would have to be tolerant to that.

zzmp commented 4 years ago

There seem to be preliminary instructions for this in #3662.

I'm interested in this as well, for the same reason listed in #3662: faster startup times. It seems like something that was considered but never implemented, based on the createDocumentRegistryInternal method's ExternalCache option. I'll report back if I have any luck with implementing the tips in #3662.

zzmp commented 4 years ago

I was unable to make any progress on this. SourceFile nodes are deeply nested, and converting them to strings rendered them too large to store. I am trying to improve startup time for an in-browser language service (similar to Monaco), so the dehydrated SourceFile would have needed to be small enough to be stored in IndexedDB.

ahnpnl commented 3 years ago

Hi, I got stuck as well with serializing SourceFile. Is there a clear way to do that? The documentation at https://github.com/Microsoft/TypeScript/wiki/Using-the-Language-Service-API#document-registry mentions that

A more advanced use of the document registry is to serialize SourceFile objects to disk 
and re-hydrate them when needed.

but there is no example of how to do it.

yuhuung commented 2 weeks ago

Any progress on this? I'm facing the same issue.