overlookmotel opened this issue 2 weeks ago
I tried removing the `raw` fields, but ran into some issues...
I think the core problem is that the AST serves mixed purposes, and different requirements lead to different designs.
For example, for compiling we don't need any of the original source data, so the AST can be fully abstract.
But for a linter or formatter, some behaviors should not be enabled by default. For example, they might not want `1_0_2_2` to be converted to `1_022`; at the very least there should be an option to control it.
Roslyn (and Biome and rust-analyzer, which are modeled on Roslyn) keeps a lossless syntax tree for this purpose. This design makes their trees great for interactive use, although it is not very memory-friendly (because they need to maintain roughly two trees).
So I would suggest choosing option 1. Any thoughts? @overlookmotel @Boshen
It is true that transformer/minifier and linter/formatter have different needs. However in this case, I don't think it much affects which option we go for.
For example, in the linter the `numeric_separators_style` rule does need the raw code string. But it can get that just as easily from a `raw` field or a `raw()` method, so either option works for the linter. Ditto the formatter, I think.
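To illustrate that the rule only needs a `&str`, here is a minimal sketch using hypothetical, simplified stand-ins for oxc's `Span` and `NumericLiteral` (not the real definitions). `raw_from_source` and `has_irregular_separators` are made-up names, and the check is a toy, far simpler than the real `numeric_separators_style` rule: it only flags plain decimal integers whose digit groups after the first `_` are not 3 digits long.

```rust
// Hypothetical, simplified stand-ins (not oxc's real types).
#[derive(Clone, Copy)]
pub struct Span {
    pub start: u32, // byte offset where the literal starts
    pub end: u32,   // byte offset just past the literal
}

pub struct NumericLiteral<'a> {
    pub span: Span,
    pub value: f64,
    /// Option 1 style: raw text stored on the node.
    pub raw: &'a str,
}

impl<'a> NumericLiteral<'a> {
    /// Option 2 style: raw text recovered by slicing the source with the span.
    pub fn raw_from_source(&self, source_text: &'a str) -> &'a str {
        &source_text[self.span.start as usize..self.span.end as usize]
    }
}

/// Toy check for plain decimal integers: the leading group may have any
/// length, every group after an `_` should have exactly 3 digits.
pub fn has_irregular_separators(raw: &str) -> bool {
    let mut groups = raw.split('_');
    groups.next(); // skip the leading group
    groups.any(|g| g.len() != 3)
}
```

Whichever source the raw text comes from, the check receives the same `&str`, which is why the field-vs-method choice barely matters to the linter.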
The `raw()` method is simple and cheap, so personally I still prefer it.
@Boshen Do you have any opinion on which option? If we can agree, perhaps @ShuiRuTian is willing to work on it?
Boshen prefers `raw: Option<&'a str>`.
Add a `raw` field to `StringLiteral`.
Still to do: change the `raw` fields on other literal types from `Atom<'a>` to `Option<Atom<'a>>`.
@ShuiRuTian I wonder if you'd like to help out with this? I don't think changing the `raw` fields to `Option`s will cause the same problems you hit before. In the linter, the `raw` fields can just be unwrapped: the AST always comes straight from the parser, so they're guaranteed to be `Some`.
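A minimal sketch of that unwrapping, assuming a simplified literal with `raw: Option<&'a str>` (the thread settles on `Option<Atom<'a>>`, which behaves the same way here). `raw_for_lint` is a hypothetical helper name, not an existing oxc function.

```rust
// Hypothetical simplified literal with the proposed optional `raw` field.
pub struct NumericLiteral<'a> {
    pub value: f64,
    /// `Some` when produced by the parser, `None` when generated (e.g. by the transformer).
    pub raw: Option<&'a str>,
}

/// The linter only sees parser output, so `raw` should always be `Some`.
/// `expect` documents that invariant and panics if it is ever violated.
pub fn raw_for_lint<'a>(lit: &NumericLiteral<'a>) -> &'a str {
    lit.raw.expect("parser always sets `raw` on literals")
}
```

Using `expect` rather than a silent fallback means a generated node that somehow reaches the linter fails loudly instead of being linted against made-up text.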
@overlookmotel I would love to, but @Boshen already has a PR, #7393.
Not sure what I need to do now.
Thanks for your willingness!
Boshen's PR will add a `raw` field to `StringLiteral`. The remaining task is to make the `raw` fields on all the other literal types optional.
For example, `NumericLiteral`'s `raw` field is currently `&'a str` but should be `Option<Atom<'a>>`.
FYI, `Atom<'a>` and `&'a str` are basically the same: `Atom` is just a wrapper around `&str`.
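As a rough sketch, assuming `Atom` is a plain newtype over `&'a str` (the real `oxc_span::Atom` may have more to it), such a wrapper looks like this. It also shows why `Option<Atom<'a>>` costs nothing extra: a `&str` pointer is never null, so the compiler can use that niche to represent `None`.

```rust
use std::mem::size_of;
use std::ops::Deref;

/// Hypothetical sketch of an `Atom`-style wrapper: a newtype around `&'a str`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Atom<'a>(&'a str);

impl<'a> Atom<'a> {
    pub fn as_str(&self) -> &'a str {
        self.0
    }
}

impl<'a> From<&'a str> for Atom<'a> {
    fn from(s: &'a str) -> Self {
        Atom(s)
    }
}

// Lets an `Atom` be used wherever a `&str` is expected.
impl Deref for Atom<'_> {
    type Target = str;
    fn deref(&self) -> &str {
        self.0
    }
}

/// Sizes of `&str`, `Atom`, and `Option<Atom>` (all equal in practice).
pub fn sizes() -> (usize, usize, usize) {
    (
        size_of::<&'static str>(),
        size_of::<Atom<'static>>(),
        size_of::<Option<Atom<'static>>>(),
    )
}
```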
Boshen said he ran into a problem using `raw: Option<&'a str>` on `StringLiteral` because `ast_tools` didn't understand that syntax, so let's go with `raw: Option<Atom<'a>>` to avoid that problem.
Does that make more sense?
The problem
#7211 brought up a problem. Most of our literal types (`NumericLiteral` etc.) have a `raw` field. This makes sense when the AST has come from the parser, but little sense when the node is generated (e.g. in the transformer), because such a node has no "raw" representation.
Having to provide `"0"` here feels like a hack to me, for example: https://github.com/oxc-project/oxc/blob/44375a5662ee3e6451f1a5335c75d6379d1878a6/crates/oxc_ast/src/ast_builder_impl.rs#L209-L211
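A minimal sketch of the difference, using hypothetical simplified types and constructor names (`NumericLiteralToday`, `generated_today`, etc. are illustrative only, not the code at the link above):

```rust
/// Hypothetical simplified literal with a required `raw` field (current shape).
pub struct NumericLiteralToday<'a> {
    pub value: f64,
    pub raw: &'a str,
}

/// Hypothetical simplified literal with an optional `raw` field (option 1).
pub struct NumericLiteralOptional<'a> {
    pub value: f64,
    pub raw: Option<&'a str>,
}

/// A generated node has no source text, so a required field forces
/// the builder to invent a placeholder that need not match `value`.
pub fn generated_today(value: f64) -> NumericLiteralToday<'static> {
    NumericLiteralToday { value, raw: "0" }
}

/// With an optional field, "no raw representation" is stated explicitly.
pub fn generated_optional(value: f64) -> NumericLiteralOptional<'static> {
    NumericLiteralOptional { value, raw: None }
}
```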
The exception is `StringLiteral`, which doesn't have a `raw` field. This feels like an omission: if the other literal types have a `raw` field, it should have one too. But adding one would be a pain, as there are a lot of places where we generate strings in the transformer etc.
Prior art
Babel's types have the `raw` field under `extra`, which is optional (see `StringLiteral` and `BaseNode`).
Acorn's types also have the `raw` field as optional (source).
Possible solutions
I can see 2 options:
1. Make `raw` fields optional
Change the `raw` field on `NumericLiteral` etc. to `Option<&'a str>`, and add a `raw: Option<&'a str>` field to `StringLiteral`.
2. Remove all `raw` fields
Where an AST node has a non-empty span, the raw value can be obtained by slicing the source text. So remove all the `raw` fields, and use methods to get the raw value where required (as suggested in #5522).
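A minimal sketch of such a method, assuming a simplified `Span` of byte offsets and a hypothetical `RawText` trait (oxc's real `Span` and traits may differ). It further assumes generated nodes carry an empty span, so `raw()` returns `None` for them instead of a made-up string.

```rust
/// Hypothetical simplified span: byte offsets into the source text.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Default)]
pub struct Span {
    pub start: u32,
    pub end: u32,
}

impl Span {
    pub fn is_empty(&self) -> bool {
        self.start == self.end
    }

    /// Slice the source text covered by this span.
    /// Panics if the span is out of range or not on a char boundary.
    pub fn source_text<'a>(&self, source_text: &'a str) -> &'a str {
        &source_text[self.start as usize..self.end as usize]
    }
}

/// Any node with a span gets `raw()` for free, instead of storing a `raw` field.
pub trait RawText {
    fn span(&self) -> Span;

    fn raw<'a>(&self, source_text: &'a str) -> Option<&'a str> {
        let span = self.span();
        if span.is_empty() {
            None // assumed: generated nodes have an empty span
        } else {
            Some(span.source_text(source_text))
        }
    }
}

/// Hypothetical literal node with no `raw` field at all.
pub struct NumericLiteral {
    pub span: Span,
    pub value: f64,
}

impl RawText for NumericLiteral {
    fn span(&self) -> Span {
        self.span
    }
}
```

The trade-off is that callers must have the source text at hand; in return each node is one field smaller, and there is nothing to fill in when the transformer creates a node.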
Which?
Personally I prefer the 2nd. I don't think the `raw` fields are used much, so they're bloating the AST for little value. The methods to get the raw value from `Span` are pretty trivial and cheap.