Closed Boshen closed 4 months ago
@overlookmotel @rzvxa @Dunqing
I think we can start prototyping once we decide on the generation method.
I vote for all Rust, but I'm not sure about its implications or downsides.
👍for all Rust, Now that we know our requirements and are done with prototyping we could leverage syn for our code generation.
It is harder to iterate on but we won't need to parse and generate rust code ourselves, We can treat it like a global macro if you are willing to use nightly for codegen, We quite literally can use !#[code_gen]
at the top of the AST definition file, macro expand it for the build and call it a day.
If we want to break our ast definition into separate files or don't use nightly at all; We can manually parse the string using syn and vice versa.
Nice to have: we might want to generate the AST itself from our definition file, So it is easy to do AST-wide refactors in the future.
Good catch, we should move all ast implementations to another file, so we can parse ast nodes easily.
Or maybe we should scan the file using syn, and extract everything marked as #[visited_node]
.
On nightly: hard no, sorry.
Good catch, we should move all ast implementations to another file, so we can parse ast nodes easily.
Or maybe we should scan the file using syn, and extract everything marked as #[visited_node].
I'm more in favor of having the definition file(s) for ast containing only the struct/enum and its attribute/derives, With this we are not only simplifying the build process(since any item in the global scope of definition file is an AST type) but also provide a way to add to the codegen, For example, if you remember for ast_node_id
I wanted to add an ast_node
attribute that would add the id and implement its trait, Which slows down the build process. But if we can easily add such things as a part of the code gen process we get the change fast and free (in terms of build times).
But if we only process the types and use it to generate the visitors/traversals we won't be able to change ast types themselves.
On the other hand, if we read these structs and modify them before generating the actual AST types we get to do magical things such as reducing all inherit macros in the generated code, It would be a much better experience if the actual type is there, Both in terms of LSP and documentation, Not to mention easier to understand if the resulting code is there instead of an inherit.
I wanted to follow how React Compiler does this
but @overlookmotel prefers writing it in Rust because it feels more natural.
I don't mind either.
Also good catch on wanting to add common attributes to the nodes.
I think in the end both the rust syntax and these JSON files are the same thing. Having it as rust syntax and parsing it to get our serialized data is more steps for sure but I have to agree with @overlookmotel, It is much easier to understand and reason about our schema this way, Unless we want to share these schemas with other projects I think having ast_def.rs
is > than ast_def.json
.
I vote Rust too. I only wrote Traverse
codegen in JS to be able to bang it out quickly. It's a hack, and brittle, as it doesn't parse the Rust code properly - it's basically using regexes - so it could easily break. My intent was always to "do it properly" and rewrite in Rust using syn
etc, once the impl was stable.
I propose we do this in 2 parts:
The schema would be generated by:
.rs
files (NB: includes files in oxc_span
and oxc_syntax
, not just oxc_ast
).StructDefinition
/ EnumDefinition
for each type.Bar
in struct Foo { bar: Bar }
to enum Bar
defined elsewhere).Schema types could be something like https://github.com/overlookmotel/layout_inspect/blob/master/layout_inspect/src/defs.rs
(we don't need align
and size
at present - we can miss those parts out)
This schema can then be output as both:
static
(see https://github.com/oxc-project/oxc-browserslist/issues/23)2 reasons:
Firstly, AST transfer needs a schema just like this - so we can kill 2 birds with 1 stone.
Secondly, the schema can also be used in proc macros. Fixing oxc-project/oxc#4297 requires replacing the dodgy inherit_variants!
macro with a proc macro. Problem is that proc macros only have access to the section of code they're applied to, but inherited enums need to know what variants other enums have. The #[inherit_variants]
proc macro can get that info from the schema.
(see "Option 3" in oxc-project/oxc#4297)
We don't need to implement rkyv serialization initially. We can just use JSON to serialize the schema to start with. rkyv is an optimization to avoid having serde
and serde_json
as dependencies of the macro crate, which will massively reduce the compile time impact of those macros.
Personally I am anti generating the AST types themselves. I think we should write the AST types in Rust (as at present) and then generate everything around them.
I laid out my reasoning for this in oxc-project/oxc#4297 (at length, sorry probably way too much length).
When we want to alter the AST types themselves, we can use proc macros. This does have a downside in terms of compile time, but the rkyv-serialized schema thing is designed to reduce that impact to a minimal level.
These are just my thoughts... feel free to disagree!
Generating AST types
Personally I am anti generating the AST types themselves. I think we should write the AST types in Rust (as at present) and then generate everything around them.
I laid out my reasoning for this in oxc-project/oxc#4297 (at length, sorry probably way too much length).
When we want to alter the AST types themselves, we can use proc macros. This does have a downside in terms of compile time, but the rkyv-serialized schema thing is designed to reduce that impact to a minimal level.
These are just my thoughts... feel free to disagree!
What if we have definitions (schema) written as rust struct/enums? Then if we even need to output a JSON schema we can create it from the syntax data instead of json information.
Look at this and tell me it isn't possible to write all of this in rust https://github.com/facebook/react/blob/main/compiler/crates/react_estree_codegen/src/ecmascript.json.
What are the differences between these 2 pieces of string other than deserialization method?
"StringLiteral": {
"fields": {
"value": {
"type": "String",
"hermes_convert_with": "convert_string_value"
}
}
},
struct StringLiteral {
#[hermes_convert_with("convert_string_value")]
value: String,
}
My point is that we can have our cake and eat it too, We can write definitions in the rust itself, yet still use this information and build upon it to generate the actual AST types.
Example usecase: codegen/ast_def.rs
#[magic_attr]
struct StringLiteral<'a> {
#[hermes_convert_with("convert_string_value")]
pub value: Atom<'a>,
}
ast/ast.rs
#[hermes_stuff]
struct StringLiteral<'a> {
#[hermes_convert_with("convert_string_value")]
pub value: Atom<'a>,
magical_field: MagicType<'a>,
}
impl<'a, 'm> BlackMagic<'a, 'm>for StringLiteral<'a> {
fn cast(&'m self) -> MagicMisile<'a, 'm> {
self.magical_field.cast(MagicKind::Black).into()
}
}
AST definition file should have its own prelude so we are confined to the use of known types in our ast definitions(happens in the proc macro/codegen). This way we can disallow the use statements or path qualifiers. With this imposed limitation on our ast definition we no longer have to resolve and link types, We just use a hashmap since they are all known beforehand. Ast uses other AST types, Box, Vec, Atom, str, and a handful of other types but there is no random type there(as it should be).
What are the differences between these 2 pieces of string other than deserialization method?
Agreed. No difference. Except... then we have 2 .rs
files which both look like they are the AST definitions file. Which do you edit? As I said under option "2c" on oxc-project/oxc#4297:
However, there's a major DX problem. IDEs' "jump to definition" would take you to the built folder, not src. I am sure people would end up editing the files in built and then be surprised when all the code they wrote is lost when the build script re-runs and overwrites these files.
One of our goals is for the codebase to be accessible to new contributors. The AST is a central pillar of Oxc, probably the first place you go when you start exploring the codebase. So I think it's particularly important that the AST "just makes sense". Entering a world of confusion on your very first step could be a massive turn-off.
To summarize: as I understand it, the major pluses and minuses are:
So, my view is:
serde
in macros, we can minimize the compile time impact.But... I've said my piece now. It's a preference rather than a strongly held view - I can see both sides of the argument. If you both feel strongly that you prefer to codegen AST, please feel free to overrule me. I won't be upset!
Only thing I ask is please do try to go through my thinking in oxc-project/oxc#4297, and try to follow my reasoning. If you get my arguments, but don't agree with them, no problem. But please do consider them.
MagicKind::Black
🤣🤣🤣 Love it!
One of our goals is for the codebase to be accessible to new contributors. The AST is a central pillar of Oxc, probably the first place you go when you start exploring the codebase. So I think it's particularly important that the AST "just makes sense". Entering a world of confusion on your very first step could be a massive turn-off.
It is ironic that it is my exact point for going in the opposite direction😆 I feel since AST doesn't change often after we hit stable release, it wouldn't result in confusion it is almost impossible for a first-time contributor to have to change it if it is all up to spec, All of our traits and impls are in a separate file that isn't generated. What we are generating with the AST are additions that would happen to the structures such as inherits and adding fields or macros.
I think since these are the actual things existing on the type, jumping to the generated definition is self-documenting because of no magic code, basically, what you see is what you get.
so I can see oxc_ast
with this structure(I think the generated
directory communicates well enough that these files shouldn't be edited directly, And if it isn't enough we can always add a giant docs comment at the top of the file stating this fact):
oxc_ast/
|_src/
|_______generated/
|_______ast.rs(or js.rs,ts.rs,...)
|_______also contains visit/VisitMut,Visit and traverse/Traverse, etc.
|_______ast_impl.rs
|_______lib.rs
|_______....
oxc_ast_def(or maybe oxc_ast_gen)/
|_src/
|______ast_def.rs(or js_def.rs,ts_def.rs,...)
Then we manually export these generated codes in our lib.rs
.
I think for you to make the call on this @Boshen!
Haha, can we start somewhere small?
I propose:
Gather the AST nodes somehow (we don't need the attributes yet),
and layout the backbone of codegen for
ASTKind
GetSpan
Visit
VisitMut
Traverse
ASTBuilder
once this is in place, we can experiment with different ast definition strategies and pick the one that fits my requirements.
If parsing #[visit_node]
works for the time being, then we can stick with it and iterate later.
The pipeline is
Definition -> Model in Rust -> an easy mechanism for writing out boilerplates -> code.
The first milestone will just be the backbone of the [Model in Rust -> an easy mechanism for writing out boilerplates -> code] part.
I'm going to go for a walk and mull this over. Give me an hour and I'll come back with some comments.
To back my point, This is a common pattern in our current AST definition:
inherit_variants! {
/// Expression
///
/// Inherits variants from [`MemberExpression`]. See [`ast` module docs] for explanation of inheritance.
///
/// [`ast` module docs]: `super`
#[visited_node]
#[repr(C, u8)]
#[derive(Debug, Hash)]
#[cfg_attr(feature = "serialize", derive(Serialize, Tsify))]
#[cfg_attr(feature = "serialize", serde(untagged))]
pub enum Expression<'a> {
BooleanLiteral(Box<'a, BooleanLiteral>) = 0,
NullLiteral(Box<'a, NullLiteral>) = 1,
NumericLiteral(Box<'a, NumericLiteral<'a>>) = 2,
BigintLiteral(Box<'a, BigIntLiteral<'a>>) = 3,
RegExpLiteral(Box<'a, RegExpLiteral<'a>>) = 4,
StringLiteral(Box<'a, StringLiteral<'a>>) = 5,
TemplateLiteral(Box<'a, TemplateLiteral<'a>>) = 6,
Identifier(Box<'a, IdentifierReference<'a>>) = 7,
MetaProperty(Box<'a, MetaProperty<'a>>) = 8,
Super(Box<'a, Super>) = 9,
ArrayExpression(Box<'a, ArrayExpression<'a>>) = 10,
ArrowFunctionExpression(Box<'a, ArrowFunctionExpression<'a>>) = 11,
AssignmentExpression(Box<'a, AssignmentExpression<'a>>) = 12,
AwaitExpression(Box<'a, AwaitExpression<'a>>) = 13,
BinaryExpression(Box<'a, BinaryExpression<'a>>) = 14,
CallExpression(Box<'a, CallExpression<'a>>) = 15,
ChainExpression(Box<'a, ChainExpression<'a>>) = 16,
ClassExpression(Box<'a, Class<'a>>) = 17,
ConditionalExpression(Box<'a, ConditionalExpression<'a>>) = 18,
FunctionExpression(Box<'a, Function<'a>>) = 19,
ImportExpression(Box<'a, ImportExpression<'a>>) = 20,
LogicalExpression(Box<'a, LogicalExpression<'a>>) = 21,
NewExpression(Box<'a, NewExpression<'a>>) = 22,
ObjectExpression(Box<'a, ObjectExpression<'a>>) = 23,
ParenthesizedExpression(Box<'a, ParenthesizedExpression<'a>>) = 24,
SequenceExpression(Box<'a, SequenceExpression<'a>>) = 25,
TaggedTemplateExpression(Box<'a, TaggedTemplateExpression<'a>>) = 26,
ThisExpression(Box<'a, ThisExpression>) = 27,
UnaryExpression(Box<'a, UnaryExpression<'a>>) = 28,
UpdateExpression(Box<'a, UpdateExpression<'a>>) = 29,
YieldExpression(Box<'a, YieldExpression<'a>>) = 30,
PrivateInExpression(Box<'a, PrivateInExpression<'a>>) = 31,
JSXElement(Box<'a, JSXElement<'a>>) = 32,
JSXFragment(Box<'a, JSXFragment<'a>>) = 33,
TSAsExpression(Box<'a, TSAsExpression<'a>>) = 34,
TSSatisfiesExpression(Box<'a, TSSatisfiesExpression<'a>>) = 35,
TSTypeAssertion(Box<'a, TSTypeAssertion<'a>>) = 36,
TSNonNullExpression(Box<'a, TSNonNullExpression<'a>>) = 37,
TSInstantiationExpression(Box<'a, TSInstantiationExpression<'a>>) = 38,
// `MemberExpression` variants added here by `inherit_variants!` macro
@inherit MemberExpression
}
}
It is only this way because it isn't feasible to copy all of these manually and keep track of them when one changes.
So with code-gening the AST we still have this syntax for our definitions to keep it consistent,
/// Expression
///
/// Inherits variants from [`MemberExpression`]. See [`ast` module docs] for explanation of inheritance.
///
/// [`ast` module docs]: `super`
#[visited_node]
#[repr(C, u8)]
#[derive(Debug, Hash)]
#[cfg_attr(feature = "serialize", derive(Serialize, Tsify))]
#[cfg_attr(feature = "serialize", serde(untagged))]
pub enum Expression<'a> {
BooleanLiteral(Box<'a, BooleanLiteral>) = 0,
NullLiteral(Box<'a, NullLiteral>) = 1,
NumericLiteral(Box<'a, NumericLiteral<'a>>) = 2,
BigintLiteral(Box<'a, BigIntLiteral<'a>>) = 3,
RegExpLiteral(Box<'a, RegExpLiteral<'a>>) = 4,
StringLiteral(Box<'a, StringLiteral<'a>>) = 5,
TemplateLiteral(Box<'a, TemplateLiteral<'a>>) = 6,
Identifier(Box<'a, IdentifierReference<'a>>) = 7,
MetaProperty(Box<'a, MetaProperty<'a>>) = 8,
Super(Box<'a, Super>) = 9,
ArrayExpression(Box<'a, ArrayExpression<'a>>) = 10,
ArrowFunctionExpression(Box<'a, ArrowFunctionExpression<'a>>) = 11,
AssignmentExpression(Box<'a, AssignmentExpression<'a>>) = 12,
AwaitExpression(Box<'a, AwaitExpression<'a>>) = 13,
BinaryExpression(Box<'a, BinaryExpression<'a>>) = 14,
CallExpression(Box<'a, CallExpression<'a>>) = 15,
ChainExpression(Box<'a, ChainExpression<'a>>) = 16,
ClassExpression(Box<'a, Class<'a>>) = 17,
ConditionalExpression(Box<'a, ConditionalExpression<'a>>) = 18,
FunctionExpression(Box<'a, Function<'a>>) = 19,
ImportExpression(Box<'a, ImportExpression<'a>>) = 20,
LogicalExpression(Box<'a, LogicalExpression<'a>>) = 21,
NewExpression(Box<'a, NewExpression<'a>>) = 22,
ObjectExpression(Box<'a, ObjectExpression<'a>>) = 23,
ParenthesizedExpression(Box<'a, ParenthesizedExpression<'a>>) = 24,
SequenceExpression(Box<'a, SequenceExpression<'a>>) = 25,
TaggedTemplateExpression(Box<'a, TaggedTemplateExpression<'a>>) = 26,
ThisExpression(Box<'a, ThisExpression>) = 27,
UnaryExpression(Box<'a, UnaryExpression<'a>>) = 28,
UpdateExpression(Box<'a, UpdateExpression<'a>>) = 29,
YieldExpression(Box<'a, YieldExpression<'a>>) = 30,
PrivateInExpression(Box<'a, PrivateInExpression<'a>>) = 31,
JSXElement(Box<'a, JSXElement<'a>>) = 32,
JSXFragment(Box<'a, JSXFragment<'a>>) = 33,
TSAsExpression(Box<'a, TSAsExpression<'a>>) = 34,
TSSatisfiesExpression(Box<'a, TSSatisfiesExpression<'a>>) = 35,
TSTypeAssertion(Box<'a, TSTypeAssertion<'a>>) = 36,
TSNonNullExpression(Box<'a, TSNonNullExpression<'a>>) = 37,
TSInstantiationExpression(Box<'a, TSInstantiationExpression<'a>>) = 38,
@inherit MemberExpression
}
But we give the end user this well-documented code to read:
/// Expression
///
/// Inherits variants from [`MemberExpression`]. See [`ast` module docs] for explanation of inheritance.
///
/// [`ast` module docs]: `super`
#[visited_node]
#[repr(C, u8)]
#[derive(Debug, Hash)]
#[cfg_attr(feature = "serialize", derive(Serialize, Tsify))]
#[cfg_attr(feature = "serialize", serde(untagged))]
pub enum Expression<'a> {
BooleanLiteral(Box<'a, BooleanLiteral>) = 0,
NullLiteral(Box<'a, NullLiteral>) = 1,
NumericLiteral(Box<'a, NumericLiteral<'a>>) = 2,
BigintLiteral(Box<'a, BigIntLiteral<'a>>) = 3,
RegExpLiteral(Box<'a, RegExpLiteral<'a>>) = 4,
StringLiteral(Box<'a, StringLiteral<'a>>) = 5,
TemplateLiteral(Box<'a, TemplateLiteral<'a>>) = 6,
Identifier(Box<'a, IdentifierReference<'a>>) = 7,
MetaProperty(Box<'a, MetaProperty<'a>>) = 8,
Super(Box<'a, Super>) = 9,
ArrayExpression(Box<'a, ArrayExpression<'a>>) = 10,
ArrowFunctionExpression(Box<'a, ArrowFunctionExpression<'a>>) = 11,
AssignmentExpression(Box<'a, AssignmentExpression<'a>>) = 12,
AwaitExpression(Box<'a, AwaitExpression<'a>>) = 13,
BinaryExpression(Box<'a, BinaryExpression<'a>>) = 14,
CallExpression(Box<'a, CallExpression<'a>>) = 15,
ChainExpression(Box<'a, ChainExpression<'a>>) = 16,
ClassExpression(Box<'a, Class<'a>>) = 17,
ConditionalExpression(Box<'a, ConditionalExpression<'a>>) = 18,
FunctionExpression(Box<'a, Function<'a>>) = 19,
ImportExpression(Box<'a, ImportExpression<'a>>) = 20,
LogicalExpression(Box<'a, LogicalExpression<'a>>) = 21,
NewExpression(Box<'a, NewExpression<'a>>) = 22,
ObjectExpression(Box<'a, ObjectExpression<'a>>) = 23,
ParenthesizedExpression(Box<'a, ParenthesizedExpression<'a>>) = 24,
SequenceExpression(Box<'a, SequenceExpression<'a>>) = 25,
TaggedTemplateExpression(Box<'a, TaggedTemplateExpression<'a>>) = 26,
ThisExpression(Box<'a, ThisExpression>) = 27,
UnaryExpression(Box<'a, UnaryExpression<'a>>) = 28,
UpdateExpression(Box<'a, UpdateExpression<'a>>) = 29,
YieldExpression(Box<'a, YieldExpression<'a>>) = 30,
PrivateInExpression(Box<'a, PrivateInExpression<'a>>) = 31,
JSXElement(Box<'a, JSXElement<'a>>) = 32,
JSXFragment(Box<'a, JSXFragment<'a>>) = 33,
TSAsExpression(Box<'a, TSAsExpression<'a>>) = 34,
TSSatisfiesExpression(Box<'a, TSSatisfiesExpression<'a>>) = 35,
TSTypeAssertion(Box<'a, TSTypeAssertion<'a>>) = 36,
TSNonNullExpression(Box<'a, TSNonNullExpression<'a>>) = 37,
TSInstantiationExpression(Box<'a, TSInstantiationExpression<'a>>) = 38,
// begin inherit from `MemberExpression` (if you know what inherit means in this context good for you, Otherwise you still get all the information without needing to jump 2 or more times first to the inherit_variants and then to the MemberExpression to understand it.)
/// `MemberExpression[?Yield, ?Await] [ Expression[+In, ?Yield, ?Await] ]`
ComputedMemberExpression(Box<'a, ComputedMemberExpression<'a>>) = 48,
/// `MemberExpression[?Yield, ?Await] . IdentifierName`
StaticMemberExpression(Box<'a, StaticMemberExpression<'a>>) = 49,
/// `MemberExpression[?Yield, ?Await] . PrivateIdentifier`
PrivateFieldExpression(Box<'a, PrivateFieldExpression<'a>>) = 50,
// end inherit from `MemberExpression`
}
We would also generate the relevant match macros and those are also there to read(without nested declarative macro calls which can confuse newcomers) and maybe even omit the discriminates(in the definition file) since now we can calculate them at comptime (but having them explicitly defined is better in my opinion).
I can't help but find a generated file more understandable than how @inherit
and inherit_variants
works.
We can call our definition file ast.d.rs
to further communicate that it isn't a normal rust file(and there are things like @inherit
available).
If we don't want to have magical @inherit
we can use this syntax in our definition:
#[inheritance]
enum E {
A(A),
#[inherit]
B(B),
}
Not going to go into the detail of your comment, Rez, because this thread is getting too long. But I agree inherit_variants!
is a mess as it is now. There are various ways to fix it, and one of those ways is what you've suggested.
More generally, I've had a think, and have a few comments:
I have loosened my opposition to generating the AST types. I think what I was really opposed to is defining the schema as JSON (like Hermes/React Compiler does) which is verbose, ugly, and hard to read. Defining the schema as Rust types is much better. If we do the following, I think it would answer all my criticisms:
.def.rs
or something to indicate they are "special"..gen.rs
, or put them in a generated
directory.// This file is generated. Do not edit. Instead edit source at ../codegen/js.def.rs:321
.(not claiming all these ideas as my own, just summarizing in one place)
As Boshen says, we don't need to jump to doing this straight away. We can initially write our codegen for e.g. Visit
by reading the existing .rs
files in oxc_ast
, like Traverse
's codegen does.
If we're agreed that in the end our "schema" will be written as Rust types, making the change later to "schema files" should not involve a huge change to the codegen.
AST transfer needs a schema of all the AST types as JSON. So could we please build our codegen foundation along these lines:
If we do it like that, we can kill 2 birds with one stone, rather than having to build a 2nd parser/codegen for AST transfer later.
(I think this is the plan anyway, but just repeating to make sure)
Visit
and VisitMut
We should factor in https://github.com/oxc-project/oxc/issues/3392
TS definitions will be trickier to generate from Rust types schema because there isn't an exact correspondence between Rust types and JS types, due to the modifications we do with custom serde serializers to conform to ESTree format.
I suggest making generating TS defs a separate effort. I was intending to tackle that as part of AST transfer impl - probably easier because AST transfer deals in the JS types not the Rust types.
I am glad that you've backed off a little bit on your opposition, I believe all of your points are addressable(And I have a vague idea of how to achieve the rust-analyzer compatibility with preluding types and have a mock macro for inheritance to prevent syntax errors).
Since you've written the Traverse codegen, Would you like to also give this one a go?
I can start working on it next week so it is up to you to decide.
Just a note: Even if we don't want to have rust schema we might want to consider starting with extracting type definitions from impls in our ast files, Having separate files like js.rs
and js_impl.rs
can help to simplify the rust parsing part, We no longer have to parse a bunch of random items and check if they are AST type or not. I think private ast fields all can be pub(crate)
as a compromise to get them separated into 2 files.
Gleam uses Cap'n proto for its codegen. https://github.com/gleam-lang/gleam/blob/main/compiler-core/schema.capnp
Apparently we can't transfer an issue from a private repo to a public repo.
I'll chat with @rzvxa on discord and set the milestones once he finishes all cfg related code.
I'll create the issues in oxc when we have a consensus on what to build - the final version is in this issue's description.
Question: what should be done with code surrounded by inherit_variants!
? Feels like a blocker if we want #[visited_node]
to be the source of truth.
We could also codegen for all AST types:
Hash
impls (which should ignore any Span
, ScopeId
, SymbolId
, ReferenceId
fields).PartialEq
impls (also ignoring above fields).CloneIn
impls (#50).NB: CloneIn
could also be implemented on AST types with #[derive(CloneIn)]
, but it may improve compile time if we codegen the impls, to avoid macro expansion at compile time.
Apparently we can't transfer an issue from a private repo to a public repo.
Oh dear! It would be ideal for all of the conversation in this issue to be made public, so people can see not only what decisions were reached, but also the reasons for reaching them.
Personally, I think this is quite important. When I find code in Oxc which I don't understand the rationale for, I often look at git blame to find out in what PR that code was introduced, and then I read the surrounding issues. This helps me to understand why things have been designed the way they have.
As a workaround, can we temporarily make backlog repo public, transfer the issue to oxc repo, then quickly make backlog repo private again??
Question: what should be done with code surrounded by
inherit_variants!
? Feels like a blocker if we want#[visited_node]
to be the source of truth.
The codegen will need to parse and understand the @inherits Expression
syntax. This is what Traverse
's current codegen does.
We can use that understanding to codegen all the impls which inherit_variants!
macro currently generates, so inherit_variants!
becomes much slimmer than it is now.
Sorry @rzvxa I didn't answer your question:
Since you've written the Traverse codegen, Would you like to also give this one a go?
I'd be happy to, but I only have a couple of days before I go away, and am keen to finish off correcting spans and scopes in the transformer before I do. So if you're willing to work on this, by all means go ahead.
ASTKind (Milestone 1)
I think this one should be done after Visit
and VisitMut
Since without those there are going to be a lot of errors that we have to mark as todo
. So we have to do a bunch of manual refactoring just to redo it again after generating the new visitors.
Otherwise, it is the easiest to generate and has already been added to the ast_codegen.
ASTKind (Milestone 1)
I think this one should be done after
Visit
andVisitMut
Since without those there are going to be a lot of errors that we have to mark astodo
. So we have to do a bunch of manual refactoring just to redo it again after generating the new visitors.Otherwise, it is the easiest to generate and has already been added to the ast_codegen.
Oh ok, pick the easiest one to have it codegened so we can get it reviewed, merged, and replaced 😅
ASTKind (Milestone 1)
I think this one should be done after
Visit
andVisitMut
Since without those there are going to be a lot of errors that we have to mark astodo
. So we have to do a bunch of manual refactoring just to redo it again after generating the new visitors. Otherwise, it is the easiest to generate and has already been added to the ast_codegen.Oh ok, pick the easiest one to have it codegened so we can get it reviewed, merged, and replaced 😅
I've already did, GetSpan
is ready to replace the old one.
https://github.com/oxc-project/oxc/pull/3852
Didn't mark it for review so you guys can give a green light to the underlying architecture first.
AstKind
and GetSpan
generators, We just hold onto the AstKind
till the visitor generators get ready.These visit functions are outdated, do we have a better way to reimplement them here? It is now difficult to manually update https://github.com/oxc-project/oxc/blob/2fd55b0da22c6c8af9ec23dca976c2b47e5957bb/crates/oxc_semantic/src/builder.rs#L454-L1650
These visit functions are outdated
Outdated as in field visitation order is wrong?
I've moved this issue to main oxc repo as we're currently working on this.
These visit functions are outdated
Outdated as in field visitation order is wrong?
or missing visits to some fields
These visit functions are outdated, do we have a better way to reimplement them here? It is now difficult to manually update
Consensus was manually patch them, we eat our own 💩
It can be fine to have a slightly different visitation order for some implementations of our visits if it is necessary for them. The good thing about the generated code is that we have the correct walk order by default but it shouldn't prevent anyone from doing custom walks. For example, a specific implementation might require visiting a statement and then visiting the decorators for it.
It is fine as long as we acknowledge the difference and know it has a reason to be that way.
I'm mostly scared of scope events since they can drift apart from the source of truth and we use the source of truth version everywhere but the semantics.
Maybe we should generate the scope events as their own functions? something like scope::enter_statement(visitor)
that uses the scope
attribute information.
or implement scope visit as a trait for our types so we can write stmt.enter_scope(visitor)
.
We need a visit_scope_id
method. This way we can easily set the current_scope_id
for different ast's scope_id
in the semantic builder
Could we pass the AST node (&mut T
) into enter_scope
? That would avoid need for adding visit_scope_id
method.
Could we pass the AST node (
&mut T
) intoenter_scope
? That would avoid need for addingvisit_scope_id
method.
I was thinking about this when I saw the @Dunqing's comment. But the type erasure will prevent us from doing anything useful to the T
. Maybe we can implement a trait so this can be bound to something like Scoped<T>
?
If we don't want to pass in T we can pass AstType
+ &mut node.scope_id
we can pass
AstType
+&mut node.scope_id
Maybe that's better. I think just &node.scope_id
would do it. That removes need for generics (and only &
is required, not &mut
, since it's a Cell
).
That removes need for generics
I used AstType
instead of AstKind
for the same reason, but if we don't need it, then we shouldn't pass it in. I thought we may want to do some checking depending on the type we are entering but with scopeFlags, it should be enough information already.
and only & is required, not &mut, since it's a Cell
This one I just forgot, You are absolutely right.
pass scope_id
to enter_scope
just enough for now
pass
scope_id
toenter_scope
just enough for now
It seems we all agree on this so I'll go ahead and implement it
He done it! #4168
RFC is done, remains issues can be tracked in https://github.com/oxc-project/oxc/issues/3819
Requirements
#[visited_node]
Workflow
#[visited_node]
struct ASTModel { name: String, attributes: Vec<Attributes> }
Code to generate
ASTKind
(Milestone 1)GetSpan
(Milestone 2)Hash
,PartialEq
implementations (Milestone 3)ASTBuilder
(Milestone 4)Visit
(Milestone 5)VisitMut
(Milestone 5)Traverse
, remove current js code (Milestone 6)Every milestone should be a stack of PRs, since they will involve changing other files as well.
Things we need to discuss:
traverse
currently uses JavaScript, I think it's fine, but does it scale to supporting these many code patterns?update: The consensus is stay with our current approach and parse
#[visited_node]
as the source of truth.Topic not part of this RFC: