oxc-project / oxc

⚓ A collection of JavaScript tools written in Rust.
https://oxc.rs
MIT License
12.52k stars 458 forks source link

Make `raw` field on literal AST types optional, or remove them #7254

Open overlookmotel opened 2 weeks ago

overlookmotel commented 2 weeks ago

The problem

7211 brought up a problem. Most of our literal types (NumericLiteral etc) have raw field.

This makes sense when the AST has come from the parser, but little sense when the node is generated (e.g. in transformer), because the node has no "raw" representation.

Having to provide "0" here feels like a hack to me, for example:

https://github.com/oxc-project/oxc/blob/44375a5662ee3e6451f1a5335c75d6379d1878a6/crates/oxc_ast/src/ast_builder_impl.rs#L209-L211

The exception is StringLiteral which doesn't have a raw field. This feels like an omission - if the other literal types have a raw field, it should have one too. But adding one would be a pain, as there are a lot of places we generate strings in transformer etc.

Prior art

Babel's types have the raw field under extra which is optional:

export interface StringLiteral extends BaseNode {
  type: "StringLiteral";
  value: string;
}

interface BaseNode {
  type: Node["type"];
  // ...
  extra?: Record<string, unknown>;
}

StringLiteral BaseNode

Acorn's types also have the raw field as optional:

export interface Literal extends Node {
  type: "Literal"
  value?: string | boolean | null | number | RegExp | bigint
  raw?: string
  regex?: {
    pattern: string
    flags: string
  }
  bigint?: string
}

source

Possible solutions

I can see 2 options:

Make raw fields optional

Remove all raw fields

Where an AST node has a non-empty span, the raw value can be obtained by slicing source text. So remove all the raw fields, and use methods to get raw value where required:

impl NumericLiteral {
    pub fn raw<'a>(&self, source_text: &'a str) -> Option<&'a str> {
        if self.span.start == 0 && self.span.end == 0 {
            None
        } else {
            Some(&source_text[self.span.start..self.span.end])
        }
    }
}

(as suggested in #5522)

Which?

Personally I prefer the 2nd. I don't think the raw fields are used much, so they're bloating the AST for little value. The methods to get raw value from Span are pretty trivial and cheap.

ShuiRuTian commented 1 week ago

I did a try to remove raw fields, but there are some issues...

I think the core problem is the mix purpose of AST, different requirement leads to different design.

For example, for compiling, we do not need any origin data, so the AST could be abstract.

But for linter or formatter, some abilities should not be enabled by default. Like, for this, they might not want 1_0_2_2 be converted to 1_022, at least there should be an option to control it.

Roslyn(Biome and RA too, they refers Roslyn) keeps a lossless syntax tree for this purpose. This design makes their AST great for interreact, although it is not pretty memory friendly(because they need to maintain about 2 trees).

So, I would suggest to choose option 1. Any thoughts? @overlookmotel @Boshen

overlookmotel commented 1 week ago

It is true that transformer/minifier and linter/formatter have different needs. However in this case, I don't think it much affects which option we go for.

For example, in linter the numeric_separators_style rule does need the raw code string. But it can equally easily get that from a raw field, or a raw() method. So either option works for linter. Ditto the formatter, I think.

The raw() method is simple and cheap, so personally I still prefer it.

overlookmotel commented 1 week ago

@Boshen Do you have any opinion on which option? If we can agree, perhaps @ShuiRuTian is willing to work on it?

overlookmotel commented 5 days ago

Boshen prefers raw: Option<&'a str>.

overlookmotel commented 5 days ago

7393 adds raw field to StringLiteral.

Still to do: Change raw fields on other literal types from Atom<'a> to Option<Atom<'a>>.

overlookmotel commented 1 day ago

@ShuiRuTian I wonder if you'd like to help out with this? Changing the raw fields to Options I don't think will cause same problems as you hit before. In linter the raw fields can just be unwrapped, as AST is always straight from the parser, so they're guaranteed to be Some.

ShuiRuTian commented 4 hours ago

@overlookmotel I would love to, but @Boshen already have a PR #7393

Not sure what I need to do now.

overlookmotel commented 3 hours ago

Thanks for your willingness!

Boshen's PR will add raw field to StringLiteral. The remaining task is to make the raw fields on all the other literal types optional.

e.g. NumericLiteral's raw field is currently &'a str but should be Option<Atom<'a>>.

FYI, Atom<'a> and &'a str are basically the same - Atom is just a wrapper around &str:

https://github.com/oxc-project/oxc/blob/199076bd14faefeff42ae263217e165042b2fad1/crates/oxc_span/src/atom.rs#L20

Boshen said he ran into a problem using raw: Option<&'a str> on StringLiteral because ast_tools didn't understand that syntax, so let's go with raw: Option<Atom<'a>> to avoid that problem.

Does that make more sense?