oxc-project / oxc

⚓ A collection of JavaScript tools written in Rust.
https://oxc.rs
MIT License
12k stars 433 forks source link

Serialize typescript related AST fields with `undefined` instead of `null` #6284

Closed ottomated closed 2 weeks ago

ottomated commented 2 weeks ago

Currently, when building oxc-compatible AST nodes from within Javascript, typescript-related fields (typeAnnotation, etc.) are serialized and deserialized as nullable fields. This makes for pretty inconvenient code if you're not working with them, i.e.

const param = {
  type: 'FormalParameter',
  pattern: {
    ...pattern,
    typeAnnotation: null,
  },
  decorators: [],
  accessibility: null,
  readonly: false,
  override: false
}

If these were serialized using something like #[serde(skip_serializing_if)], it would not only make the serialized values smaller (saving FFI transfer times), but also make constructing the nodes manually easier:

const param = {
  type: 'FormalParameter',
  pattern,
  // all the typescript-only fields are optional!
}

Here's an example of what it could look like. Since the javascript oxc-parser package is still experimental, this should be fine as a breaking change.

overlookmotel commented 2 weeks ago

Thanks for raising this. I've actually been thinking along the same lines, but with a different implementation method. https://github.com/oxc-project/oxc/issues/6347 (see section 3).

I'm sorry to say that the JS API is not our current focus - we're currently working on implementing the rest of the transforms down to ES6, and hardening the existing ones - so all Rust-side stuff. Hence why I opened the issue in our "backlog" repo instead of here. But, personally, I do feel it's really important that the JS-facing APIs are as good as the Rust ones, so we will turn our attention to this at some point soon(ish).

Closing this issue, not because it's an unreasonable request, but because I think https://github.com/oxc-project/oxc/issues/6347 already covers it.

overlookmotel commented 2 weeks ago

By the way, speaking of FFI transfer times, we do have something in the works which is going to massively speed up moving ASTs between Rust and JS. Watch this space...

ottomated commented 2 weeks ago

Thanks for the response! The linked issue seems great, let me contextualize a bit more:

I'm currently researching what it would take to port Svelte's parser from acorn + acorn-typescript to oxc. So far I've found that even with FFI overhead, oxc via napi is anywhere from 5-20x faster! However, there are a few important limitations that have forced me to inject this code into oxc via a build script:

Additionally, I hacked together this type definition file by adding a new generator to ast_tools that invoked tsify. It works alright, but is definitely not as easy to use as @types/estree. One thing to consider is this: when building ASTs in JS, you can't specify the Span for each node. This forced me to duplicate the type definitions to have one version with start and end on the base node interface and one without.

In the linked backlog issue, you mentioned estree compatibility. This would save a lot of effort in the migration, but I'm curious if this would include having a single Literal node vs. NumericLiteral etc.? In the backlog issue, you mention

Currently we have pretty complex custom Serialize impls for massaging Oxc's AST into ESTree-compatible shape in oxc_ast/src/serialize.rs.

but currently while the shape is similar, it's definitely not a drop-in replacement.

Final question - what would the timeline look like? I would like to contribute to this if the oxc team is going to be focusing on other stuff.

overlookmotel commented 2 weeks ago

Thanks for coming back. Interesting points. Am tied up now - will reply tomorrow.

overlookmotel commented 2 weeks ago

On the ESTree compat question: Our intention is 100% ESTree compatibility (so, yes, one Literal type). Please see #2463 and #2854.

The only part that's fuzzy is what flavour of ESTree-like AST for TS? I've not got to the bottom of this question, but my understanding is there are a few competing implementations, and I'm not sure:

  1. How similar they are to each other.
  2. Whether they (a) extend ESTree by just adding extra node types, and extra properties to existing nodes, or (b) in some places they diverge from original ESTree (i.e. a "TS ESTree" AST representing a JS-only program would not be same as the "original ESTree" AST for same program). If the latter, it's more problematic.

Do you have any idea on that?

ottomated commented 2 weeks ago

I've used https://astexplorer.net to compare different outputs. I've also looked at SWC's compatibility layer, which has an option for babel and acorn (although theirs is incomplete and doesn't convert all ASTs).

From what I understand, typescript ASTs are not standardized at all. You'll get different representations from typescript-eslint, acorn-typescript (which is abandoned and undocumented), or the official typescript parser.

typescript-eslint seems to represent declare function as a different node type which might be an issue.

In my mind since there's no consensus among parsers it might be best to take whichever path is easiest to implement for oxc. Acorn should be the primary target for compatibility, and since acorn has no official typescript support and the only plugin for it is undocumented, this is an opportunity to provide accurate and complete type definitions for future tooling.

ottomated commented 2 weeks ago

Is there already any progress on https://github.com/oxc-project/oxc/issues/6347 or would it be okay to start work on a PR for it?

overlookmotel commented 2 weeks ago

@ottomated I've moved #6347 into the main repo. I've asked the main man Boshen if he sees any problem with what I've proposed. I assume he'll be fine with it, but I just want to check before you pile in and do a load of work - don't want to waste your time. Assuming all good, it'd be brilliant if you would like to take it on.

I'm currently researching what it would take to port Svelte's parser from acorn + acorn-typescript to oxc. So far I've found that even with FFI overhead, oxc via napi is anywhere from 5-20x faster!

I'm actually really surprised it's so fast. Oxc's parser is certainly speedy, but from when I last benchmarked, the cost of transferring AST between Rust and JS rather hobbles it, so it was in same ballpark as JS parsers (Acorn, Babel etc). But if it's working better than I thought, great!

One thing to consider is this: when building ASTs in JS...

You are aware that Oxc currently doesn't expose a JS interface to get an AST back from JS into Rust to print it? We should, and we will, but it's not there yet.

currently while the shape is similar, it's definitely not a drop-in replacement.

Yes, it's a work in progress. It's closer to ESTree than it was, but still diverges in a lot of places (as I'm sure you've found). One of the motivators for #6347 is to make it easier to complete the job of getting it in line. Writing Serialize impls by hand is not trivial (impl Serialize is pretty horrible API) and those impls need to be kept in sync when we make changes to the Rust AST, which is also painful.

In my mind since there's no consensus among parsers it might be best to take whichever path is easiest to implement for oxc. Acorn should be the primary target for compatibility, and since acorn has no official typescript support and the only plugin for it is undocumented, this is an opportunity to provide accurate and complete type definitions for future tooling.

Thanks for the background on this.

I agree that for plain JS we should converge with ESTree (Acorn) and aim for 100% compatibility.

For TS, either typescript-eslint or official TS parser's AST sounds like the best target. @ArnaudBarre, who did a lot of the last batch of work on getting Oxc's JS AST into line with ESTree, was keen to use Oxc as a parser for Prettier, which we gather requires a typescript-eslint AST (https://github.com/oxc-project/oxc/issues/2463#issuecomment-1956607866), so maybe that's the best starting point.

typescript-eslint seems to represent declare function as a different node type which might be an issue.

There will no doubt be many such oddities. We'll just have to work around them. It's not out of the question to alter our Rust AST to match ESTree/TS-ESTree on occasion to help. Personally I think our AST should have a separate node type for declare function anyway, though others might disagree.

Anyway... thanks loads for your interest in working on this. It's one of those things that I think is really important, and I have been meaning for ages to get onto it, but just not enough time in the day...

ottomated commented 2 weeks ago

Great! I'll hold for now and probably will have a bunch of questions once I get started (for example, maybe we don't even want to impl Serialize if we're serializing to a custom binary format that needs corresponding JS deserialization code)

I'm actually really surprised it's so fast...

From my tests, most of the difference is that acorn-typescript is really slow. But even with plain JS oxc was faster (didn't test with any HUGE files though).

You are aware that Oxc currently doesn't expose a JS interface to get an AST back from JS into Rust to print it?

Yes, for the work I was doing on svelte I wrote my own napi package that patched oxc and defined the specific methods I needed.

Yes, it's a work in progress. It's closer to ESTree than it was, but still diverges in a lot of places...

I expect any work I do on #6347 would be establishing a baseline that allows for the customization that these divergences would need rather than focusing on the individual cases.

overlookmotel commented 2 weeks ago

@ottomated Spoke to Boshen. We have go-ahead for #6347. I'll write up some thoughts on that issue later today.

I think we can close this issue in favour of that one? Or do you consider them slightly different things?

Also, do you have Discord? It might be useful to have a quick chat. Please can you join Oxc Discord and ping me?