Open overlookmotel opened 2 weeks ago
JS-only AST (as discussed in point 3 above) has been requested by a user: https://github.com/oxc-project/oxc/issues/6284
Personally, I think it's a completely reasonable ask.
@Boshen we have a contributor (@ottomated - see #6284) keen to work on this. Before he gets going, do you see any problem with my proposal above?
I spoke to Boshen. He's happy with the direction of this PR. Sounds like @ottomated is ready to get stuck in to implementation.
I suggest doing this in phases:
1. Generate Serialize impls with oxc_ast_tools
Replace:
#[ast]
#[cfg_attr(feature = "serialize", derive(Serialize))]
#[serde(tag = "type", rename = "RestElement")]
pub struct AssignmentTargetRest<'a> {
    #[serde(flatten)]
    pub span: Span,
    #[serde(rename = "argument")]
    pub target: AssignmentTarget<'a>,
}
with:
#[ast]
#[generate_derive(Serialize)]
#[serde(tag = "type", rename = "RestElement")]
pub struct AssignmentTargetRest<'a> {
    #[serde(flatten)]
    pub span: Span,
    #[serde(rename = "argument")]
    pub target: AssignmentTarget<'a>,
}
- Leave the custom Serialize impls alone.
- Don't rename Serialize to ESTree yet.
- Don't change the #[serde] attributes.
- Leave Tsify alone for now.
- Remove the derive feature on the serde dependency.
2. Remove #[serde] attrs boilerplate
#[ast]
#[generate_derive(Serialize)]
#[serde(rename = "RestElement")] // <-- `tag = "type"` removed
pub struct AssignmentTargetRest<'a> {
    // `#[serde(flatten)]` removed - `#[serde(flatten)]` on `Span` struct instead
    pub span: Span,
    #[serde(rename = "argument")]
    pub target: AssignmentTarget<'a>,
}
Handle these in oxc_ast_tools codegen instead.
3. Remove Tsify
- oxc_ast_tools generates type def file itself.
- Remove the const TS_APPEND_CONTENT: &'static str = "..."; manual type defs.
4. Rename Serialize to ESTree
#[ast]
#[generate_derive(ESTree)]
#[estree(rename = "RestElement")]
pub struct AssignmentTargetRest<'a> {
    pub span: Span,
    #[estree(rename = "argument")]
    pub target: AssignmentTarget<'a>,
}
I'm actually not quite sure how to do this, while still using serde::Serialize under the hood.
5. Replace custom Serialize impls
#[ast]
#[generate_derive(ESTree)]
pub struct ObjectPattern<'a> {
    pub span: Span,
    pub properties: Vec<'a, BindingProperty<'a>>,
    #[estree(append_to_previous)]
    pub rest: Option<Box<'a, BindingRestElement<'a>>>,
}
This is the tricky/interesting part. The idea is to create a kind of domain-specific language (DSL) to cover the various transformations needed to go from Rust AST to JS ESTree AST. That DSL is the #[estree(...)] attributes.
The advantage of a DSL which is static is that we can generate multiple things from it:
- Serialize impls.
- Deserialize impls (so we can provide an oxc-codegen NPM package).
I'm not completely sure how far we can get with the DSL approach. "append to previous" is a pattern that's used in several types, so it makes sense to make an #[estree(append_to_previous)] attr for it.
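To make "append to previous" concrete, here is a rough sketch of what the generated code for ObjectPattern might boil down to. It is only an illustration: the types are simplified stand-ins (no arena, no lifetimes, no Span), the JSON is written by hand rather than through serde, string contents are not escaped, and all function names are made up.

```rust
// Hypothetical, simplified stand-ins for the real AST types.
struct BindingProperty {
    name: String,
}

struct BindingRestElement {
    argument: String,
}

struct ObjectPattern {
    properties: Vec<BindingProperty>,
    // Would carry `#[estree(append_to_previous)]` in the proposal.
    rest: Option<BindingRestElement>,
}

// Assumption: names contain no characters that need JSON escaping.
fn property_json(prop: &BindingProperty) -> String {
    format!(r#"{{"type":"Property","name":"{}"}}"#, prop.name)
}

fn rest_json(rest: &BindingRestElement) -> String {
    format!(r#"{{"type":"RestElement","argument":"{}"}}"#, rest.argument)
}

/// What a generated serializer might do: emit `rest` as the last element
/// of the `properties` array instead of as a separate field.
fn object_pattern_json(pat: &ObjectPattern) -> String {
    let mut items: Vec<String> = pat.properties.iter().map(property_json).collect();
    if let Some(rest) = &pat.rest {
        items.push(rest_json(rest));
    }
    format!(
        r#"{{"type":"ObjectPattern","properties":[{}]}}"#,
        items.join(",")
    )
}
```

Because the attribute is static, the same information could in principle drive the reverse direction too (pop a trailing RestElement off the array when deserializing).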
But for odd transforms which are only used in one place, we may prefer something like this:
#[ast]
#[generate_derive(ESTree)]
#[estree(via(MyTypeShim))]
pub struct MyType {
    one: u32,
    two: u32,
}
struct MyTypeShim {
    sum: u32,
}
impl From<&MyType> for MyTypeShim {
    fn from(mt: &MyType) -> Self {
        MyTypeShim { sum: mt.one + mt.two }
    }
}
#[estree(via(...))] is analogous to #[serde(from)] and #[serde(into)]. But I'm hoping we can use just 1 "intermediary" type to go in both directions.
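As a rough idea of what #[estree(via(MyTypeShim))] could expand to on the serializing side, here is a sketch. The ToJson trait is a hypothetical placeholder for whatever trait the generated code ends up implementing (it is not serde's API), and the JSON is hand-written to keep the example dependency-free.

```rust
struct MyType {
    one: u32,
    two: u32,
}

struct MyTypeShim {
    sum: u32,
}

impl From<&MyType> for MyTypeShim {
    fn from(mt: &MyType) -> Self {
        MyTypeShim { sum: mt.one + mt.two }
    }
}

/// Hypothetical trait standing in for the generated serialization trait.
trait ToJson {
    fn to_json(&self) -> String;
}

// The shim serializes field-by-field, like any ordinary struct would.
impl ToJson for MyTypeShim {
    fn to_json(&self) -> String {
        format!(r#"{{"type":"MyType","sum":{}}}"#, self.sum)
    }
}

// What `via(MyTypeShim)` might generate: convert to the shim, then delegate.
impl ToJson for MyType {
    fn to_json(&self) -> String {
        MyTypeShim::from(self).to_json()
    }
}
```

Going in the other direction would additionally need a conversion from the shim back to MyType, which this toy example cannot provide because the sum loses information.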
The above "mini-roadmap" is a suggestion rather than a list of demands! Am totally open to different ways to split up the work.
But I do think we should split it up into multiple steps somehow, because (a) smaller PRs are easier to review and (b) if the effort doesn't reach the finish line, we'll at least get part of the way, and others can continue it later on.
Serialize impls are now codegen-ed by oxc_ast_tools (phase 1 on "roadmap" above). We still use #[derive(Serialize)] on a few custom Serialize impls in serialize.rs. Tsify is completely gone.
In my opinion the next steps are:
oxc-parser package
- .d.ts file.
- oxc-parser package (not sure how to combine them with the types generated by napi-rs).
- wasm-bindgen for WASM. But replace all the short const TS_APPEND_CONTENT statements with one giant one including all the types as a single string.
The reason I think we should do this first is it'd be great to get all the type defs checked into git as a single file, so we'll notice if the types mistakenly get changed during further work.
Currently JSON AST for RegExpLiteral contains the entire parsed regexp Pattern. This is a huge deviation from ESTree, and the serialization of RegExps is generally a mess.
JSON AST should just contain strings for pattern and flags, as ESTree does.
We can remove the EmptyObject hack. That type only exists to produce a value field in the JSON AST, and is otherwise a pointless annoyance!
Previously, type defs were in this style:
export interface BooleanLiteral extends Span {
    type: "BooleanLiteral";
    value: boolean;
}
Now they're like this:
export type BooleanLiteral = ({
    type: 'BooleanLiteral';
    value: boolean;
}) & Span;
I am no TypeScript expert, but I understand from Boshen that the two are almost equivalent, with one slight difference: the interface style gives nicer error messages.
Our TS types came from typescript-eslint, so I would model them the same way: https://github.com/typescript-eslint/typescript-eslint/blob/main/packages/ast-spec/src/expression/ArrayExpression/spec.ts
Are we able to go back to interface?
#[estree] attrs
- Remove #[estree(rename_all = "camelCase")] from enums. Just make camelCase the default. It only seems to be present on fieldless enums.
- Remove #[estree(untagged)]. I think we can decide whether to tag based on if enum is fieldless or not?
- Replace #[tsify(type = "...")] with #[estree(type(...))] (or #[estree(type = "...")]). That's just a renaming thing.
Remove #[estree(flatten)] boilerplate
See "phase 2" in previous comment.
Primarily I'm talking about Span here. It'd be great not to need #[estree(flatten)] on every single span: Span field.
Note: TSThisParameter has a this_span: Span field which should not be flattened. typescript-eslint doesn't include a span for this, so we can just skip serializing that field, rather than needing an #[estree(no_flatten)] workaround.
🤷 Open to suggestions on how to approach this one!
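On the enum defaults suggested above (camelCase by default, and deciding whether to tag from whether the enum is fieldless), a rough sketch of how the codegen could pick the representation without any attribute. EnumDef and Repr are hypothetical names, not the actual oxc_ast_tools schema types, and mapping "has fields" to "untagged" is my reading of the suggestion rather than a confirmed rule.

```rust
/// Hypothetical, simplified schema entry for an enum.
struct EnumDef {
    /// (variant name, number of fields the variant carries)
    variants: Vec<(&'static str, usize)>,
}

#[derive(Debug, PartialEq)]
enum Repr {
    /// Fieldless enum: serialize each variant as a camelCase string.
    CamelCaseString,
    /// Enum whose variants wrap other nodes: serialize the inner node as-is.
    Untagged,
}

/// Picks the representation from the shape of the enum alone.
fn default_repr(def: &EnumDef) -> Repr {
    if def.variants.iter().all(|(_, field_count)| *field_count == 0) {
        Repr::CamelCaseString
    } else {
        Repr::Untagged
    }
}

/// Lowercases the first character: `"AwaitUsing"` becomes `"awaitUsing"`.
fn camel_case(variant_name: &str) -> String {
    let mut chars = variant_name.chars();
    match chars.next() {
        Some(first) => first.to_lowercase().chain(chars).collect(),
        None => String::new(),
    }
}
```

Any enum that needs something different could still opt out with an explicit attribute, so the default only has to cover the common case.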
We currently use serde's derive macros to implement Serialize on AST types.
We could use #[generate_derive] to generate these impls instead.
Why is that a good thing?
1. Reduce compile time
serde's macro is pretty expensive at compile time for the NAPI build. We can remove it.
2. Reduce boilerplate
serde's derive macro is less powerful than ast_tools. Because Serialize is a macro, all it knows about is the type that #[derive(Serialize)] is on. Whereas ast_tools builds a schema of the entire AST, so it knows not just about the type it's deriving impl for, but also all the other types too, and how they link to each other.
Currently we have to put #[serde] attributes everywhere.
Instead, we can use ast_tools in 2 ways to remove this boilerplate:
- Use ast_tools's knowledge of the whole AST to move the instruction to flatten Span onto Span type itself. "flatten this" instruction does not need to be repeated on every type that contains Span.
I think this is an improvement. How types are serialized is not core to the function of the AST. I don't see moving the serialization logic elsewhere as "hiding it away", but rather a nice separation of concerns.
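A toy illustration of why whole-AST knowledge helps with the Span case: if "flatten" is recorded once on the type, the codegen can look it up for every field of that type. TypeDef and json_keys are hypothetical and far simpler than the real ast_tools schema; renames and the "type" tag are left out.

```rust
use std::collections::HashMap;

/// Hypothetical, much simplified schema entry: one per AST type.
struct TypeDef {
    /// Set once on the type itself, e.g. on `Span`.
    flatten: bool,
    /// (field name, field type name)
    fields: Vec<(&'static str, &'static str)>,
}

/// Lists the JSON keys a struct would serialize to, inlining the fields of
/// any field whose *type* is marked `flatten` in the schema.
/// Panics if `type_name` is not in the schema.
fn json_keys(schema: &HashMap<&'static str, TypeDef>, type_name: &str) -> Vec<String> {
    let mut keys = Vec::new();
    for (field_name, field_type) in &schema[type_name].fields {
        match schema.get(*field_type) {
            // The field's type says "flatten me": inline its keys.
            Some(def) if def.flatten => keys.extend(json_keys(schema, *field_type)),
            // Anything else (including types not in the schema) keeps its own key.
            _ => keys.push(field_name.to_string()),
        }
    }
    keys
}
```

A derive macro cannot do this lookup, because when it runs on AssignmentTargetRest it never sees the definition of Span.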
3. Open the door to different serializations
In example above Serialize has been replaced by ESTree. This is to allow for different serialization methods in future. For example:
Different serializers for plain JS AST and TS AST
When serializing a plain JS file, we could produce JSON which skips all the TS fields, to make an AST which exactly aligns with canonical ESTree. We'd add a #[ts] attribute to all TS-related fields, and the ESTreeJS serializer would skip those fields. This would make the AST faster to deserialize on JS side.
The other advantage is the TS-less AST should perfectly match classic ESTree, so we can test it in full using Acorn's test suite.
Users who are not interested in type info can also request the cheaper JS-only AST, even when parsing TS code.
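A minimal sketch of the JS-only idea, assuming each field carries a flag mirroring the proposed #[ts] attribute. Field, Flavor and object_json are made-up names for illustration, and the field values are passed in as pre-serialized JSON to keep the example short.

```rust
/// Hypothetical field description; `ts` mirrors the proposed `#[ts]` attribute.
struct Field {
    key: &'static str,
    /// Already-serialized JSON for the field's value.
    json_value: String,
    ts: bool,
}

#[derive(Clone, Copy, PartialEq)]
enum Flavor {
    /// Plain ESTree: skip every field marked `#[ts]`.
    Js,
    /// TS-ESTree: keep everything.
    Ts,
}

/// Serializes one node, dropping TS-only fields for the JS flavor.
fn object_json(node_type: &str, fields: &[Field], flavor: Flavor) -> String {
    let mut out = format!(r#"{{"type":"{}""#, node_type);
    for field in fields {
        if field.ts && flavor == Flavor::Js {
            continue;
        }
        out.push_str(&format!(r#","{}":{}"#, field.key, field.json_value));
    }
    out.push('}');
    out
}
```

In real generated code the skip would be decided at codegen time (two separate impls), so the JS-only serializer would pay no runtime cost for the check.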
Serialize to other AST variants
e.g. #[generate_derive(Babel)] to serialize to a Babel-compatible JSON AST.
Not sure if this is useful, but this change makes it a possibility if we want to.
4. Simplify implementation of custom serialization
Currently we have pretty complex custom Serialize impls for massaging Oxc's AST into ESTree-compatible shape in oxc_ast/src/serialize.rs.
We can remove most of them if we use ast_tools to generate Serialize impls for us, guiding it with attributes on the AST types themselves.
5. Simplify AST transfer code
AST transfer's JS-side deserializer (and eventually serializer too) can be simplified in the same way: we can generate code for the JS-side deserializer which matches the Rust-side one exactly, without writing the same logic twice and having to keep them in sync.
6. TS type generation
What "massaging" of the Rust AST we do to turn it into an ESTree-compatible JSON AST is now encoded as static attributes. We can use this to generate TS types, and we can get rid of Tsify.
How difficult is this?
serde's derive macro looks forbiddingly complex. But this is because it handles every conceivable case, almost all of which we don't use. The output it generates for our AST types is actually not so complicated.
So I don't think creating a codegen for impl Serialize would be too difficult.
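To give a feel for the size of the task, here is a toy generator that emits the text of a Serialize impl for a plain struct. StructDef and generate_serialize_impl are hypothetical and not the real ast_tools API, and the sketch ignores lifetimes, flattening (so Span is left out) and enums. The emitted code uses serde's serialize_map / serialize_entry / end calls.

```rust
/// Hypothetical schema entry for a struct: Rust name, ESTree name, and
/// (Rust field name, JSON key) pairs.
struct StructDef {
    name: &'static str,
    estree_name: &'static str,
    fields: Vec<(&'static str, &'static str)>,
}

/// Emits the source text of a `serde::Serialize` impl for `def`.
fn generate_serialize_impl(def: &StructDef) -> String {
    let mut code = String::new();
    code.push_str(&format!("impl serde::Serialize for {} {{\n", def.name));
    code.push_str(
        "    fn serialize<S: serde::Serializer>(&self, serializer: S) -> Result<S::Ok, S::Error> {\n",
    );
    code.push_str("        use serde::ser::SerializeMap;\n");
    code.push_str("        let mut map = serializer.serialize_map(None)?;\n");
    // The `type` tag comes first, as in ESTree JSON.
    code.push_str(&format!(
        "        map.serialize_entry(\"type\", \"{}\")?;\n",
        def.estree_name
    ));
    for (field, key) in &def.fields {
        code.push_str(&format!(
            "        map.serialize_entry(\"{}\", &self.{})?;\n",
            key, field
        ));
    }
    code.push_str("        map.end()\n    }\n}\n");
    code
}
```

The real thing needs more cases than this, but each one is a small, mechanical addition to a generator of this shape.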