Open overlookmotel opened 2 months ago
Boshen replied:
These will propagate confusion to all other tools.
I think what I intended with "replace with null
" and empty statements was that we can do a quick removal at the end instead of constantly shuffling the statements array.
It would be a none issue once this is coupled with the RemoveSyntax
pass in the minifier https://github.com/oxc-project/oxc/blob/main/crates/oxc_minifier/src/ast_passes/remove_syntax.rs
.
I replied:
Yes, I can see there are downsides to this.
I will try again to come up with a replace
API.
We can also solve the problems to a degree by other means:
Vec<Statement>
with a chunked linked list (https://github.com/oxc-project/backlog/issues/35).Traverse
.Tricky, but probably doable, and would have other benefits (lower memory usage, ability to keep AST and scopes tree in sync at all times in transformer, ability to also insert statements from a child visitor).
move_*
cheaperEither:
Box
from Expression::NullLiteral
and Statement::EmptyStatement
enum Expression<'a> {
// Not `EmptyStatement(Box<'a, EmptyStatement>)`
EmptyStatement(EmptyStatement),
// ...
}
Box
with ImmutableBox
in Expression::NullLiteral
and Statement::EmptyStatement
enum Expression<'a> {
// Not `EmptyStatement(Box<'a, EmptyStatement>)`
EmptyStatement(ImmutableBox<'a, EmptyStatement>),
// ...
}
ImmutableBox
would be like Box
, except with no ability to get a &mut
ref to its contents.
So parser can create an EmptyStatement
with specified span
, but thereafter the span
cannot be changed. I think that's fine - Span
s refer to position in source text, and source text is immutable, so Span
s should never be changed anyway.
The advantage of ImmutableBox
is that (unlike Box
), you can have multiple ImmutableBox
es pointing to same data, and ImmutableBox
can be Copy
. i.e. Box<T>
is like &mut T
, ImmutableBox
is like &T
.
So then:
static DUMMY_SPAN: Span = Span { start: 0, end: 0 };
static DUMMY_EMPTY_STATEMENT: EmptyStatement = EmptyStatement { span: DUMMY_SPAN };
static DUMMY_BOXED_EMPTY_STATEMENT: ImmutableBox<'static, EmptyStatement> = ImmutableBox::new(&DUMMY_EMPTY_STATEMENT);
fn move_statement<'a>(stmt: &mut Statement<'a>) -> Statement<'a> {
// No allocations!
std::mem::replace( Statement::EmptyStatement(DUMMY_BOXED_EMPTY_STATEMENT) )
}
The advantages of ImmutableBox
over the no-box option are:
EmptyStatement
increases in size (e.g. when we add AstNodeId
to all AST nodes), no-box version would increase size of Statement
by 8 bytes. No good!AstNodeId
field to always be in same memory position in all AST types, so getting the AstNodeId
of any Expression
, Statement
etc can be a very cheap branchless lookup. That relies on all Expression
/ Statement
variants having identical layout - all being Box<T>
.This would work for Expression
and Statement
as we have simple replacements. But not for e.g. AssignmentTarget
and Declaration
where there are no such simple replacements which only contain a Span
.
replace
APIMaybe can come up with something good in the end for this. Best I managed so far is:
impl<'a> AstBuilder<'a> {
pub fn replace<T, R>(&self, value: &mut T, mut replacer: R)
where R: FnMut(T) -> T,
{
let ptr = value as *mut T;
let owned = unsafe { ptr.read() };
let replacement = replacer(owned);
unsafe { ptr.write(replacement) };
}
}
let expr: &mut Expression = get_mut_expr_somehow();
ast.replace(expr, |owned_expr: Expression| {
let replacement_expr: Expression = transform_somehow(owned_expr);
replacement_expr
});
This is not ideal for 2 reasons:
NB: Article which is relevant to replacing nodes: Re: replacing nodes: https://smallcultfollowing.com/babysteps/blog/2018/11/10/after-nll-moving-from-borrowed-data-and-the-sentinel-pattern/
Originally opened as https://github.com/oxc-project/oxc/issues/5359. I tried to transfer that issue to backlog repo, but it's not working for some reason, so copying the content into this new issue instead.
I said:
I wonder if we should add
Expression::None
andStatement::None
variants to AST?Use cases
1. Removing
Statement
sRemoving a
Statement
from aVec<Statement>
(as various transforms do) would be very easy and would not require "shuffling up" theVec
. Just replace the statement withStatement::None
.Make
AstBuilder::move_expression
cheaperCurrently
move_expression
substitutes a dummyBox<NullExpression>
into the AST. This involves allocating space for it in the arena and writing it.In contrast,
Expression::None
would be an enum variant with no "payload" and so would not require allocation. Probably then the compiler will be able to see code like this:and understand that it can skip writing
Expression::None
completely, because it's pointless - it's overwritten again shortly after. So likely it'll be as efficient asAstBuilder::copy
, while being completely safe.Right now it can't do that, because
move_expression
currently has an observable side effect of advancing the bump allocator's pointer.Codegen
Codegen would need to skip over
Statement::None
as if its not there.It would panic on
Expression::None
. If any transforms are malfunctioning and leavingExpression::None
s in the AST, we'd discover that very quickly as we'd get panics in CI.Alternatives
I've tried to dream up other APIs to replace
move_expression
. e.g. some kind ofAstBuilder::replace_expression
which uses unsafe code to avoid writing anything to the AST until it gets the replacement to put back in, and uses the type system to ensure you can't commit UB. But it seems pretty intractable to me. Simple cases like a one-for-one replacement are easy enough, but once you get into pulling out child nodes, combining them in a new struct, and pushing that back into the AST (which transformer does a lot), it gets... hard.So, while I'm not that keen on adding "fake" types into the AST, I can't see a more viable solution to alleviating these pain points at present.