oxc-project / backlog

backlog for collborators only
1 stars 0 forks source link

Better way to replace Statements + make `AstBuilder::move_*` cheaper #104

Open overlookmotel opened 2 months ago

overlookmotel commented 2 months ago

Originally opened as https://github.com/oxc-project/oxc/issues/5359. I tried to transfer that issue to backlog repo, but it's not working for some reason, so copying the content into this new issue instead.

I said:


I wonder if we should add Expression::None and Statement::None variants to AST?

Use cases

1. Removing Statements

Removing a Statement from a Vec<Statement> (as various transforms do) would be very easy and would not require "shuffling up" the Vec. Just replace the statement with Statement::None.

Make AstBuilder::move_expression cheaper

Currently move_expression substitutes a dummy Box<NullExpression> into the AST. This involves allocating space for it in the arena and writing it.

In contrast, Expression::None would be an enum variant with no "payload" and so would not require allocation. Probably then the compiler will be able to see code like this:

let owned_expr = move_expression(expr);
*expr = transform(owned_expr);

fn move_expression(expr: &mut Expression) {
    std::mem::replace(expr, Expression::None)
}

and understand that it can skip writing Expression::None completely, because it's pointless - it's overwritten again shortly after. So likely it'll be as efficient as AstBuilder::copy, while being completely safe.

Right now it can't do that, because move_expression currently has an observable side effect of advancing the bump allocator's pointer.

Codegen

Codegen would need to skip over Statement::None as if its not there.

It would panic on Expression::None. If any transforms are malfunctioning and leaving Expression::Nones in the AST, we'd discover that very quickly as we'd get panics in CI.

Alternatives

I've tried to dream up other APIs to replace move_expression. e.g. some kind of AstBuilder::replace_expression which uses unsafe code to avoid writing anything to the AST until it gets the replacement to put back in, and uses the type system to ensure you can't commit UB. But it seems pretty intractable to me. Simple cases like a one-for-one replacement are easy enough, but once you get into pulling out child nodes, combining them in a new struct, and pushing that back into the AST (which transformer does a lot), it gets... hard.

So, while I'm not that keen on adding "fake" types into the AST, I can't see a more viable solution to alleviating these pain points at present.

overlookmotel commented 2 months ago

Boshen replied:

These will propagate confusion to all other tools.

I think what I intended with "replace with null" and empty statements was that we can do a quick removal at the end instead of constantly shuffling the statements array.

It would be a none issue once this is coupled with the RemoveSyntax pass in the minifier https://github.com/oxc-project/oxc/blob/main/crates/oxc_minifier/src/ast_passes/remove_syntax.rs.

overlookmotel commented 2 months ago

I replied:

Yes, I can see there are downsides to this.

I will try again to come up with a replace API.

We can also solve the problems to a degree by other means:

Removing statements

Tricky, but probably doable, and would have other benefits (lower memory usage, ability to keep AST and scopes tree in sync at all times in transformer, ability to also insert statements from a child visitor).

Make move_* cheaper

Either:

  1. Remove Box from Expression::NullLiteral and Statement::EmptyStatement
enum Expression<'a> {
    // Not `EmptyStatement(Box<'a, EmptyStatement>)`
    EmptyStatement(EmptyStatement),
    // ...
}
  1. Replace Box with ImmutableBox in Expression::NullLiteral and Statement::EmptyStatement
enum Expression<'a> {
    // Not `EmptyStatement(Box<'a, EmptyStatement>)`
    EmptyStatement(ImmutableBox<'a, EmptyStatement>),
    // ...
}

ImmutableBox would be like Box, except with no ability to get a &mut ref to its contents.

So parser can create an EmptyStatement with specified span, but thereafter the span cannot be changed. I think that's fine - Spans refer to position in source text, and source text is immutable, so Spans should never be changed anyway.

The advantage of ImmutableBox is that (unlike Box), you can have multiple ImmutableBoxes pointing to same data, and ImmutableBox can be Copy. i.e. Box<T> is like &mut T, ImmutableBox is like &T.

So then:

static DUMMY_SPAN: Span = Span { start: 0, end: 0 };
static DUMMY_EMPTY_STATEMENT: EmptyStatement = EmptyStatement { span: DUMMY_SPAN };
static DUMMY_BOXED_EMPTY_STATEMENT: ImmutableBox<'static, EmptyStatement> = ImmutableBox::new(&DUMMY_EMPTY_STATEMENT);

fn move_statement<'a>(stmt: &mut Statement<'a>) -> Statement<'a> {
    // No allocations!
    std::mem::replace( Statement::EmptyStatement(DUMMY_BOXED_EMPTY_STATEMENT) )
}

The advantages of ImmutableBox over the no-box option are:

This would work for Expression and Statement as we have simple replacements. But not for e.g. AssignmentTarget and Declaration where there are no such simple replacements which only contain a Span.

replace API

Maybe can come up with something good in the end for this. Best I managed so far is:

impl<'a> AstBuilder<'a> {
    pub fn replace<T, R>(&self, value: &mut T, mut replacer: R)
    where R: FnMut(T) -> T,
    {
        let ptr = value as *mut T;
        let owned = unsafe { ptr.read() };
        let replacement = replacer(owned);
        unsafe { ptr.write(replacement) };
    }
}
let expr: &mut Expression = get_mut_expr_somehow();
ast.replace(expr, |owned_expr: Expression| {
    let replacement_expr: Expression = transform_somehow(owned_expr);
    replacement_expr
});

This is not ideal for 2 reasons:

NB: Article which is relevant to replacing nodes: Re: replacing nodes: https://smallcultfollowing.com/babysteps/blog/2018/11/10/after-nll-moving-from-borrowed-data-and-the-sentinel-pattern/