Add `gc_arena::reroot!` macro to allow accessing `GcCell` mutably

Aaron1011 commented 1 year ago

Using this macro provides mutable access to the contents of a GcCell, but with a fresh 'a lifetime distinct from the original 'gc lifetime. This prevents moving or allocating pointers with the 'gc lifetime in or out of the GcCell contents. As a result, previously-unreachable data cannot become reachable as a result of this macro, making it sound to avoid a write barrier.

One use case for this macro is setter methods from non-GC data, which will no longer need to take in a MutationContext.

kyren commented 1 year ago

Okay, I haven't thought about this in a few weeks and this is hurting my brain so I'm just going to ask instead, forgive me for asking a potentially stupid question...

Is there a sound way to make this a regular method on GcCell and use the Rootable trait instead? I assume there's not but I wanted to ask to make sure.. the associated type has to live for the 'a specified in the Rootable trait, right? Can you spell it out for me why that doesn't work, if I step away from this for more than a week or so it all falls out of my head again 😭

Oh, also this is a really good idea btw!

Aaron1011 commented 1 year ago

@kyren With Rootable, you could use a different lifetime in the associated type:

struct MyData<'gc>(Gc<'gc, String>) ;

impl<'a ,'gc> Rootable<'a> for MyData<'gc> {
    type Root = MyData<'gc>
}

If reroot then used <MyData<'gc> as Rootable<'a>>::Root, it would end up with MyData<' gc> instead of MyData<'a>. This would be unsound, since you have access to the original ' gc lifetime within the closure, which would allow you to adopt an existing pointer and store it in MyData from within the closure. We have the same issue if we use a GAT.

The problem is that soundness requires that we replace all 'gc lifetimes in the root type with a fresh 'a lifetime. However, a projection is not guaranteed to accomplish this, since the impl of Rootable<'a> might not use the ' a lifetime parameter on the trait. As far as I know, there's no way to express 'this type does not mention a given lifetime` as a bound.

We could make the trait unsafe, or make a new unsafe trait for just for reroot. However, I think it's better to allow users to avoid thinking about subtle lifetime issues, and just have a macro that causes a compilation error if you try to use it incorrectly.

Side note: It's sound to use Rootable with Arena despite the lifetime trick mentioned above. When we call Arena.mutate, it's fine if the type has other lifetimes - the only lifetime that interacts with garbage collection is the one used with MutationContext. If the 'gc lifetime isn't used at all, then you'll just be unable to allocate or modify any garbage-collected types, which is the right behavior. Even if you're nested inside another a different arenas mutate call, each mutate call gives you a unique distinct lifetime, so you cannot copy data from one arena into another.

In reroot, we need to prevent you from using data from the same arena, which means that your root type must have all uses of its original 'gc erased to a new lifetime.

kyren commented 1 year ago

Side note: It's sound to use Rootable with Arena despite the lifetime trick mentioned above. When we call Arena.mutate, it's fine if the type has other lifetimes - the only lifetime that interacts with garbage collection is the one used with MutationContext. If the 'gc lifetime isn't used at all, then you'll just be unable to allocate or modify any garbage-collected types, which is the right behavior. Even if you're nested inside another a different arenas mutate call, each mutate call gives you a unique distinct lifetime, so you cannot copy data from one arena into another.

Yeah this I knew, but yeah I see what you mean, thank you for refreshing my memory!

This is mostly academic, but if the Rootable trait was a GAT instead, could you do something like this without requiring a macro by requiring that <T as Rootable>::Root<'static>: 'static or something like that?

Aaron1011 commented 1 year ago

This is mostly academic, but if the Rootable trait was a GAT instead, could you do something like this without requiring a macro by requiring that ::Root<'static>: 'static or something like that?

That's an interesting idea - I did a quick test, and I think it would work.

So, this comes down to deciding which is more ergonomic:

A reroot! macro which doesn't require the user to define any new types
A reroot function that requires the user to define a new type (or pass in a unique identifier) to implement a GAT. Traits with GATs are not object-safe, so we can't use the Rootable! macro trick to use a trait object as the type.

kyren commented 1 year ago

I know this seems like a lot to avoid a macro, it's not that I'm trying to avoid the macro as much as it feels kinda bad to have the library work only sometimes with arbitrarily complex types. The macro approach falls over if whatever is held in the GcCell is generic, right? I like the Rootable! macro because it's not required, and if you reach a situation where it doesn't work you can just ignore it. Also wait, is it possible to forward macro parameters with this method without somehow allowing the user to smuggle the outer 'gc lifetime in?

I mean even if it only works for non-generic types it's still useful it just feels.. weird.

moulins commented 1 year ago

I can't believe I missed this for so long, but reroot! is actually fundamentally unsound, as it can be used to violate the tricolor invariant.

Consider the following:

#[derive(Copy, Collect)]
#[collect(no_drop)
struct Node<'gc>(GcCell<'gc, Option<Node<'gc>>);

fn oh_no<'gc>(mc: MutationContext<'gc, '_>, a: Node<'gc>) {
  // Let's assume we have the following object graph, with the specified GC colors:
  // a: Black ----> b: Gray ----> c: White

  // Let's modify `a` without a write barrier.
  reroot!(Node, a.0, |link| {
    // Grab a `c` reference with a compatible lifetime...
    let c: Node<'_> = link.unwrap().0.unwrap();
    // ...and make `a` point directly to it.
    *link = Some(c); 
  });

  // We now have a black object (`a`) pointing directly to a white object (`c`), violating
  // the tricolor invariant. Breaking the link between `b` and `c` will cause `c` to be
  // incorrectly freed during the next sweep phase.
  *b.0.write(mc) = None;
}

Fixing this would require reroot! to only rebrand the "first layer" of Gc references, but there is unfortunately no way to do this generically.

kyren / gc-arena

Add `gc_arena::reroot!` macro to allow accessing `GcCell` mutably #39