rust-hosted-langs / book

Writing Interpreters in Rust: a Guide
https://rust-hosted-langs.github.io/book/
Creative Commons Attribution 4.0 International
479 stars 27 forks source link

Rooting objects #60

Open CeleritasCelery opened 7 months ago

CeleritasCelery commented 7 months ago

I was reading over the section about dereferencing safely and noticed that there was not any way to root objects. I don't know if this already planned to be covered in the section on garbage collection, but I think it is a really interesting problem that doesn't have a easy solution. It seems like the two approaches used for Rust thus far are either mutation sessions where you can only access heap objects in a closure or having some macro with drop guards. Each have their own trade-offs and I am curious what rust-hosted-langs recommends.

pliniker commented 7 months ago

In this project we're using mutation sessions. See https://github.com/rust-hosted-langs/book/blob/master/interpreter/src/memory.rs#L142 for the interface and usage. This is essentially just the interior mutability pattern, and thus as a pattern that Rust widely supports, has seemed to me the more likely to be sustainable in a larger codebase.

As for rooting, you're right, I haven't explicitly addressed this yet. I intend to in the WIP section on GC. In short, though, everything referenced by a Thread object should be considered as roots: https://github.com/rust-hosted-langs/book/blob/master/interpreter/src/vm.rs#L178

There are some holes in the safety of the mutation session implementation currently: I think at present you could instantiate two separate VMs each with their own heap, but they do not validate that a Thread and its roots are associated with a particular instance. That's a TODO.

If you see any other gaps/holes I would love to know! Thank you for your interest :)

CeleritasCelery commented 7 months ago

Thank you. Just looking over the code, and had a few questions:

Can roots be stored outside of Memory, or each time you open a mutating session do you need to reaccess your roots? If they are stored outside, how to find them during garbage collection?

With the Mutator trait you can only have a single run method with static input and output types. How would handle a struct that has many methods, like vec or hashmap? Would you have to define you input type as some sort of enum that tells the run method which operation to perform?

Looking at the repl it seems like you would be unable to call GC at all during the execution of a function (which could be arbitrarily long) because the whole thing is evaluated inside a mutation session. Is that the case?

What would prevent me from doing the following?

  1. inside a mutation session clone a CellPtr<T>
  2. change something so that the original T is not accessible (this is not a problem because GC can't run, so it is still safe to access)
  3. pass the cloned CellPtr<T> out of the mutation session via the Output type or Thread-local storage.
  4. run garbage collection (the original T is now collected and freed).
  5. pass the CellPtr<T> back into a mutation session and then convert it to a ScopedPtr via get.
  6. try to access T (triggering UB via UAF)
pliniker commented 1 week ago

Thank you for provoking me to write some things down that have only sat in my own head so far! 😁 Sorry for the months gone by, hopefully this is still interesting!

Can roots be stored outside of Memory, or each time you open a mutating session do you need to reaccess your roots? If they are stored outside, how to find them during garbage collection?

My goal, not yet fully thought through into implementation, is that roots would strictly reside only inside Memory. That said, I can foresee some kind of pinning mechanism to allow references to roots from outside of Memory. This mechanism would probably use reference counting and also prevent the GC from moving objects.

With the Mutator trait you can only have a single run method with static input and output types. How would handle a struct that has many methods, like vec or hashmap? Would you have to define you input type as some sort of enum that tells the run method which operation to perform?

I think you're asking how data structures would be shared between the language runtime and external Rust code? Which in some way is also asking what is the input/output interface between the two domains.

Since native Rust and the hosted language are, in this project, a lot more independent than I think most Rust language implementations, it would be up to the user to implement aa translation layer. The language Memory is not intended to be shared between the hosted language and external Rust data structures (Vec, HashMap etc), so some kind of serialization/deserialization protocol would be needed to make an interface for more complex data structures. The only thing implemented at this time is a basic repl, which provides simply for string input and printing to screen!

Looking at the repl it seems like you would be unable to call GC at all during the execution of a function (which could be arbitrarily long) because the whole thing is evaluated inside a mutation session. Is that the case?

This is a work in progress on my own code branch. The code as-is hasn't been refactored towards answering this situation.

My thought thus far is that if GC is needed, the allocator would signal as such by returning a AllocPressure enum variant. This would then pause bytecode evaluation and tell the Mutator that a GC iteration is needed.

What would prevent me from doing the following?

  1. inside a mutation session clone a CellPtr
  2. change something so that the original T is not accessible (this is not a problem because GC can't run, so it is still safe to access)
  3. pass the cloned CellPtr out of the mutation session via the Output type or Thread-local storage.
  4. run garbage collection (the original T is now collected and freed).
  5. pass the CellPtr back into a mutation session and then convert it to a ScopedPtr via get.
  6. try to access T (triggering UB via UAF)

Right, another form of causing a root to escape.

I am hoping that the answer to be that generalized negative impls will solve this in a future stable Rust release.

If Mutator::Input and Mutator::Output cannot implement or contain anything that implements, a trait, say, Pointer, and if CellPtr implements Pointer, then perhaps the hole can be closed?

I am not sure if this would work or if I have understood all the implications of generalized negative impls.

What do you think?