Better way to replace Statements + make `AstBuilder::move_*` cheaper

overlookmotel commented 2 months ago

Originally opened as https://github.com/oxc-project/oxc/issues/5359. I tried to transfer that issue to backlog repo, but it's not working for some reason, so copying the content into this new issue instead.

I said:

I wonder if we should add Expression::None and Statement::None variants to AST?

Use cases

1. Removing `Statement`s

Removing a Statement from a Vec<Statement> (as various transforms do) would be very easy and would not require "shuffling up" the Vec. Just replace the statement with Statement::None.

Make `AstBuilder::move_expression` cheaper

Currently move_expression substitutes a dummy Box<NullExpression> into the AST. This involves allocating space for it in the arena and writing it.

In contrast, Expression::None would be an enum variant with no "payload" and so would not require allocation. Probably then the compiler will be able to see code like this:

let owned_expr = move_expression(expr);
*expr = transform(owned_expr);

fn move_expression(expr: &mut Expression) {
    std::mem::replace(expr, Expression::None)
}

and understand that it can skip writing Expression::None completely, because it's pointless - it's overwritten again shortly after. So likely it'll be as efficient as AstBuilder::copy, while being completely safe.

Right now it can't do that, because move_expression currently has an observable side effect of advancing the bump allocator's pointer.

Codegen

Codegen would need to skip over Statement::None as if its not there.

It would panic on Expression::None. If any transforms are malfunctioning and leaving Expression::Nones in the AST, we'd discover that very quickly as we'd get panics in CI.

Alternatives

I've tried to dream up other APIs to replace move_expression. e.g. some kind of AstBuilder::replace_expression which uses unsafe code to avoid writing anything to the AST until it gets the replacement to put back in, and uses the type system to ensure you can't commit UB. But it seems pretty intractable to me. Simple cases like a one-for-one replacement are easy enough, but once you get into pulling out child nodes, combining them in a new struct, and pushing that back into the AST (which transformer does a lot), it gets... hard.

So, while I'm not that keen on adding "fake" types into the AST, I can't see a more viable solution to alleviating these pain points at present.

overlookmotel commented 2 months ago

Boshen replied:

These will propagate confusion to all other tools.

I think what I intended with "replace with null" and empty statements was that we can do a quick removal at the end instead of constantly shuffling the statements array.

It would be a none issue once this is coupled with the RemoveSyntax pass in the minifier https://github.com/oxc-project/oxc/blob/main/crates/oxc_minifier/src/ast_passes/remove_syntax.rs.

overlookmotel commented 2 months ago

I replied:

Yes, I can see there are downsides to this.

I will try again to come up with a replace API.

We can also solve the problems to a degree by other means:

Removing statements

Replace Vec<Statement> with a chunked linked list (https://github.com/oxc-project/backlog/issues/35).
Support upwards mutation in Traverse.

Tricky, but probably doable, and would have other benefits (lower memory usage, ability to keep AST and scopes tree in sync at all times in transformer, ability to also insert statements from a child visitor).

Make `move_*` cheaper

Either:

Remove Box from Expression::NullLiteral and Statement::EmptyStatement

enum Expression<'a> {
    // Not `EmptyStatement(Box<'a, EmptyStatement>)`
    EmptyStatement(EmptyStatement),
    // ...
}

Replace Box with ImmutableBox in Expression::NullLiteral and Statement::EmptyStatement

enum Expression<'a> {
    // Not `EmptyStatement(Box<'a, EmptyStatement>)`
    EmptyStatement(ImmutableBox<'a, EmptyStatement>),
    // ...
}

ImmutableBox would be like Box, except with no ability to get a &mut ref to its contents.

So parser can create an EmptyStatement with specified span, but thereafter the span cannot be changed. I think that's fine - Spans refer to position in source text, and source text is immutable, so Spans should never be changed anyway.

The advantage of ImmutableBox is that (unlike Box), you can have multiple ImmutableBoxes pointing to same data, and ImmutableBox can be Copy. i.e. Box<T> is like &mut T, ImmutableBox is like &T.

So then:

static DUMMY_SPAN: Span = Span { start: 0, end: 0 };
static DUMMY_EMPTY_STATEMENT: EmptyStatement = EmptyStatement { span: DUMMY_SPAN };
static DUMMY_BOXED_EMPTY_STATEMENT: ImmutableBox<'static, EmptyStatement> = ImmutableBox::new(&DUMMY_EMPTY_STATEMENT);

fn move_statement<'a>(stmt: &mut Statement<'a>) -> Statement<'a> {
    // No allocations!
    std::mem::replace( Statement::EmptyStatement(DUMMY_BOXED_EMPTY_STATEMENT) )
}

The advantages of ImmutableBox over the no-box option are:

If EmptyStatement increases in size (e.g. when we add AstNodeId to all AST nodes), no-box version would increase size of Statement by 8 bytes. No good!
We probably want AstNodeId field to always be in same memory position in all AST types, so getting the AstNodeId of any Expression, Statement etc can be a very cheap branchless lookup. That relies on all Expression / Statement variants having identical layout - all being Box<T>.

This would work for Expression and Statement as we have simple replacements. But not for e.g. AssignmentTarget and Declaration where there are no such simple replacements which only contain a Span.

`replace` API

Maybe can come up with something good in the end for this. Best I managed so far is:

impl<'a> AstBuilder<'a> {
    pub fn replace<T, R>(&self, value: &mut T, mut replacer: R)
    where R: FnMut(T) -> T,
    {
        let ptr = value as *mut T;
        let owned = unsafe { ptr.read() };
        let replacement = replacer(owned);
        unsafe { ptr.write(replacement) };
    }
}

let expr: &mut Expression = get_mut_expr_somehow();
ast.replace(expr, |owned_expr: Expression| {
    let replacement_expr: Expression = transform_somehow(owned_expr);
    replacement_expr
});

This is not ideal for 2 reasons:

Probably unsound (but maybe it can be made sound).
It gives no ability to decide not to replace the node.

NB: Article which is relevant to replacing nodes: Re: replacing nodes: https://smallcultfollowing.com/babysteps/blog/2018/11/10/after-nll-moving-from-borrowed-data-and-the-sentinel-pattern/

oxc-project / backlog