kyren / gc-arena

Incremental garbage collection from safe Rust
Creative Commons Zero v1.0 Universal
436 stars 36 forks source link

Add methods to `MarkedArena` to help fancier finalization algorithms #92

Closed kyren closed 6 months ago

kyren commented 6 months ago

Technically doesn't allow anything new, but prevents having to resort to unsafety or incurring re-marking of ephemeral state data.

Something like this is very useful for implementing things like ephemeron tables with convergence. For example, say you have a data structure that keeps track of live ephemeron tables within a scripting runtime...

Ephemeron tables (such as in Lua) are weak key, strong value tables with an extra property: that ephemeron entries are kept alive only if their keys are reachable from roots other than other unreachable ephemeron values. In other words, if an ephemeron key is reachable from roots other than ephemeron values, then the entry is kept alive, OR if the key is reachable from an ephemeron value that is now kept alive, the entry is kept alive. One way to implement this with gc-arena (and this is nearly identical to what PUC-Rio Lua does) is to "converge" ephemeron tables during finalization.

At some point in finalization, every known-to-be-live entry in any ephemeron table must be marked and then fully traced. HOWEVER, this cannot be done only once, because you can imagine an ephemeron table like this:

[
  (K1 (live), V1 -> K2),
  (K2 (dead?), V2 -> K3),
  (K3 (dead?), V3),
]

Suppose at the beginning of the ephemeron stage of finalization, only K1 is known to be live. The first step will thus keep the (K1, V1) entry alive, resurrecting V1. If we then clear every ephemeron table entry with dead keys, this will clear both (K2, V2) (K3, V3) entries of the table, which is incorrect. We can re-mark the arena after the first step, but then this will only mark the key K2 as live, we would still incorrectly delete the (K3, V3) entry.

The correct way to handle this is to do this process repeatedly, mark all known-to-be-live entries in ephemeron tables and keep track of any not-known-to-be-live entries in a separate data structure. Mark and fully trace the known-to-be-live entries, then go through the list of previously not-known-to-be-live entries and check if any new entries are now known to be live. If any are newly live, then remove them from the data structure and repeat the process again, until no new changes are detected. By doing this process, we account for all keys that are reachable through the values of reacahble ephemeron entries, and it must be done for all ephemeron tables at one time. This is what PUC-Rio Lua calls "ephemeron convergence".

This algorithm is currently possible to implement in gc-arena, but it is more difficult than it has to be. We need to keep track of potentially a lot of data during convergence, like a list of every ephemeron table with dead key / value pairs to know which tables to converge on the next step, or we may want to keep a list of all of the possible-to-be-marked pairs directly. Keeping track of this data within the root is probably much slower than we'd like, especially because every stage involves completing marking. This data is needed only as part of the process of finalization, and isn't logically a root.

A simpler solution than complicating the API in this way is to have a Finalization::mark_all method, however this has a big footgun. If ANY RefLock is held during the call, Finalization::mark_all will always panic. A safer API is the one provided here, in order to re-mark the object graph, mutation must pause, but we allow for some ephemeral state to persist between steps.

kyren commented 6 months ago

I think this is 99% covered by keeping a GcWeak pointer to state and I was dramatically overthinking this. This entire API surface basically avoids the overhead of a tiny amount of tracing work and maybe a pointer indirection.