oxc-project / oxc

⚓ A collection of JavaScript tools written in Rust.
https://oxc.rs
MIT License
10.53k stars 387 forks source link

`CloneIn` trait for AST nodes #4284

Closed overlookmotel closed 1 month ago

overlookmotel commented 2 months ago

We need to be able to clone AST nodes. But we cannot implement Clone on most AST types because oxc_allocator::Box is not Clone.

There is a good reason for that - Box stores the value it owns in arena allocator, so there is no Box::new method, only Box::new_in which takes an &Allocator param for the arena to allocate it in.

I propose a new trait CloneIn:

trait CloneIn<'alloc> {
    fn clone_in(&self, allocator: &'alloc Allocator) -> Self;
}

Impl for oxc_allocator::Box:

impl<'alloc, T> CloneIn<'alloc> for Box<'alloc, T>
where
    T: CloneIn<'alloc>,
{
    fn clone_in(&self, allocator: &'alloc Allocator) -> Self {
        Box::new_in(self.deref().clone_in(allocator), allocator)
    }
}

(I'm actually not sure if I have this right - maybe 'alloc lifetime should be on fn clone_in<'alloc> not CloneIn<'alloc>)

Same as Clone, CloneIn can be implemented on any type where all its fields/variants are also CloneIn.

We could make a derive macro for CloneIn so can just #[derive(CloneIn)] on AST types.

NB: clone_in should always clone into the arena of the provided allocator, which may be different from the arena that self is in. i.e. you can use clone_in to copy objects from one arena to another.

Boshen commented 1 month ago

This is required by Rolldown app mode.

overlookmotel commented 1 month ago

This is required by Rolldown app mode.

Urgently required? Or just at some point?

overlookmotel commented 1 month ago

I think maybe this is the correct impl so return value has lifetime of the new allocator, not the old one:

trait CloneIn {
    fn clone_in<'alloc>(&self, allocator: &'alloc Allocator) -> Self<'alloc>;
}

impl<'old_alloc, T> CloneIn for Box<'old_alloc, T>
where
    T: CloneIn,
{
    fn clone_in<'new_alloc>(&self, allocator: &'new_alloc Allocator) -> Self<'new_alloc> {
        Self::new_in(self.deref().clone_in(allocator), allocator)
    }
}
rzvxa commented 1 month ago

I think maybe this is the correct impl so return value has lifetime of the new allocator, not the old one:

trait CloneIn {
    fn clone_in<'alloc>(&self, allocator: &'alloc Allocator) -> Self<'alloc>;
}

This isn't a valid syntax

overlookmotel commented 1 month ago

This isn't a valid syntax

Yes, maybe not. But what is? What I posted originally isn't right either.

rzvxa commented 1 month ago

Just thinking out loud, Maybe we can use a marker trait to ensure the lifetime:

trait Allocated<'alloc> { /* noop */ }

trait CloneIn<'alloc>: Allocated<'alloc> {
    fn clone_in<'old>(&'old self, allocator: &'alloc Allocator) -> Self;
}

But I think using this is enough:

trait CloneIn<'alloc> {
    fn clone_in<'old_alloc>(&'old_alloc self, allocator: &'alloc Allocator) -> Self;
}

If the implemented lifetime differs from the allocator you'd get a different Self than what you were expecting. So it wouldn't compile. Intuitively people would implement the right return lifetime since you only have access to the new allocator here. But if the user implements it differently(you may even say wrong) it is on them. But the compiler would stop you from shooting yourself in the foot and I think people would figure it out relatively quickly.

rzvxa commented 1 month ago

The marker I described in my previous comment won't work either. It has the same allocator lifetime on both ends. I think we should just rely on the end user to implement it correctly!

Maybe we can use this?

/// SAFETY: Cloned type should exist in the `'alloc`
unsafe trait CloneIn<'alloc> {
    fn clone_in(&self, allocator: &'alloc Allocator) -> Self;
}
overlookmotel commented 1 month ago
trait CloneIn<'alloc> {
    fn clone_in<'old_alloc>(&'old_alloc self, allocator: &'alloc Allocator) -> Self;
}

I don't think this is right either. 'old_alloc shouldn't be on &self, but on the lifetime param of Self. i.e. without a trait:

impl<'old_alloc> Foo<'old_alloc> {
    fn clone_in<'new_alloc>(&self, allocator: &'new_alloc Allocator) -> Foo<'new_alloc> { /* ... */ }
}

But how to turn it into a trait?

Maybe we can use this?

Does unsafe save us here? Anyway, it must be possible to do this in safe Rust. I just don't know how. I think will need to ask for help on the Rustlang forum.

NB: I'm imagining codegen-ing the impls for all AST types.

rzvxa commented 1 month ago

I don't think this is right either. 'old_alloc shouldn't be on &self, but on the lifetime param of Self. i.e. without a trait:

Yes, I noticed my mistake a short while after posting it.

Does unsafe save us here?

No, But If there is no way to describe what we want in the type system then that's a reasonable approach. For derive and codegen we can ensure the safety, unsafe here is to force the end users to be careful with their implementation and read the safety docs first.

I think will need to ask for help on the Rustlang forum.

Let's give it a shot.

overlookmotel commented 1 month ago

If necessary, we could codegen clone_in methods for all AST node types, without a CloneIn trait, and no #[derive] macro. But I'm sure there must be a way to do this with a trait, and I'd like to know what it is.

rzvxa commented 1 month ago

I'd very much like to learn how to describe such a thing in the type system as well. So let's give it a shot.

we could codegen clone_in methods for all AST node types

If the point is to prevent people from implementing CloneIn themselves, I think it would be better to make it internal(by trait bounds) while still exposing the CloneIn trait for use.

overlookmotel commented 1 month ago

Got it!

trait CloneIn<'new_alloc> {
    type Cloned;

    fn clone_in(&self, allocator: &'new_alloc Allocator) -> Self::Cloned;
}

impl<'old_alloc, 'new_alloc> CloneIn<'new_alloc> for UnaryExpression<'old_alloc> {
    type Cloned = UnaryExpression<'new_alloc>;

    fn clone_in(&self, allocator: &'new_alloc Allocator) -> Self::Cloned {
        UnaryExpression {
            span: self.span.clone_in(allocator),
            operator: self.operator.clone_in(allocator),
            argument: self.argument.clone_in(allocator),
        }
    }
}

Thanks to: https://github.com/rust-lang/wg-allocators/issues/15#issuecomment-542885461

rzvxa commented 1 month ago

Nice, It doesn't necessarily force the return type to be a variant of Self but at least it works as intended.

overlookmotel commented 1 month ago

It doesn't necessarily force the return type to be a variant of Self

Yes, that's a shortcoming. Probably there's some wacky type system trick to enforce that, but I don't think it's worth it. You can't prevent someone making Expression::clone_in return an instance of MyTrousers instead of Expression, but then you also can't statically prevent them writing a method body which is loaded full of bugs and panics.

Anyway, we'll codegen the impls for AST nodes, so there will be no MyTrousers mistakes. 😆

rzvxa commented 1 month ago

I genuinely burst out laughing when I saw MyTrousers😆

hyf0 commented 1 month ago

What's the blocker here? It seems the final form has been shown and there are no technical issues.

hyf0 commented 1 month ago

Would it be more performant if there something like size_hint in CloneIn trait to infer allocator?

overlookmotel commented 1 month ago

What's the blocker here? It seems the final form has been shown and there are no technical issues.

No blocker. We've just been working on other things. Is your need for it urgent?

Would it be more performant if there something like size_hint in CloneIn trait to infer allocator?

Sorry I don't get you. How do you mean?

hyf0 commented 1 month ago

No blocker. We've just been working on other things. Is your need for it urgent?

Not sure how define urgent. But people are writing code like this

https://github.com/rolldown/rolldown/blob/b7dd0c99650d8770a990efa6f86daaa5bc00d469/crates/rolldown_plugin_dynamic_import_vars/src/clone_expr.rs

Sorry I don't get you. How do you mean?

trait CloneIn<'new_alloc> {
    type Cloned;

    fn clone_in(&self, allocator: &'new_alloc Allocator) -> Self::Cloned;

    fn size_hint(&self) -> usize {
       0
    }
}

impl<'old_alloc, 'new_alloc> CloneIn<'new_alloc> for UnaryExpression<'old_alloc> {
    type Cloned = UnaryExpression<'new_alloc>;

    fn clone_in(&self, allocator: &'new_alloc Allocator) -> Self::Cloned {
        // sadly bumpalo doesn't have this API.
        allocator.reserve_additional_bytes(self::size_hint());
        UnaryExpression {
            span: self.span.clone_in(allocator),
            operator: self.operator.clone_in(allocator),
            argument: self.argument.clone_in(allocator),
        }
    }
}
overlookmotel commented 1 month ago

Not sure how define urgent. But people are writing code like this https://github.com/rolldown/rolldown/blob/b7dd0c99650d8770a990efa6f86daaa5bc00d469/crates/rolldown_plugin_dynamic_import_vars/src/clone_expr.rs

Oh my GOD! That's horrendous. Also, it screws up the spans.

fn size_hint(&self) -> usize { 0 }

Ah now I understand. Yes, that would be good. But problem is how do you work out what the size is? In the case of UnaryExpression, argument is an Expression which could be 1 or could be (() => { let loadsOfComplexLogic = foo(bar, qux); return !loadsOfComplexLogic; })() - basically any size.

Only way to work out the size accurately would be to visit all the children of the AST node. I imagine performing that extra visit would cost more than what you save by having the size hint.

rzvxa commented 1 month ago

@hyf0 I saw you mentioned this issue, Just want to let you know I've picked this up and it should be expected in the next release.

overlookmotel commented 1 month ago

Implemented in #4732. Very nice work from the man like @rzvxa.

Dunqing commented 4 weeks ago

BindingIdentifier, IdentifierReference, and other Identifier have been derived Clone trait. So I want to ask should we use clone_in or clone for that? If we should use clone_in everywhere, wouldn't we have to remove Clone?