mmtk / mmtk-core

Memory Management ToolKit
https://www.mmtk.io

Precise heap traversal API #1186

Open qinsoon opened 1 month ago

qinsoon commented 1 month ago

https://github.com/mmtk/mmtk-core/pull/1174 introduced an API that allows heap traversal. The API enumerates objects based on the VO bits at the time of enumeration. So if a GC or allocation is happening at the same time, the enumeration may visit objects that are being reclaimed, or it may miss objects. We may want a version of the API that blocks allocation and GC so we can get a consistent heap snapshot.
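To make the race concrete, here is a toy model in plain Rust. It is not mmtk-core code: `ToyHeap`, `alloc`, `reclaim`, `enumerate_racy` and `enumerate_consistent` are hypothetical names, and a `Vec<bool>` behind a `Mutex` stands in for the VO bits. The racy variant reads one bit at a time and releases the lock in between, so allocation or reclamation can interleave with the scan. The consistent variant holds the lock for the whole scan, which is one simple way to "block allocation and GC" during enumeration.

```rust
use std::sync::Mutex;

/// Toy heap: one VO (valid-object) bit per slot. Hypothetical, not mmtk-core's API.
struct ToyHeap {
    vo_bits: Mutex<Vec<bool>>,
}

impl ToyHeap {
    fn new(slots: usize) -> Self {
        ToyHeap { vo_bits: Mutex::new(vec![false; slots]) }
    }

    /// "Allocate" by setting the first clear VO bit; returns the slot index.
    fn alloc(&self) -> Option<usize> {
        let mut bits = self.vo_bits.lock().unwrap();
        let slot = bits.iter().position(|b| !*b)?;
        bits[slot] = true;
        Some(slot)
    }

    /// "Reclaim" an object by clearing its VO bit, as a GC would.
    fn reclaim(&self, slot: usize) {
        self.vo_bits.lock().unwrap()[slot] = false;
    }

    /// Racy: the lock is released between reads, so concurrent alloc/reclaim
    /// calls can change bits that have not been scanned yet.
    fn enumerate_racy(&self, mut visitor: impl FnMut(usize)) {
        let len = self.vo_bits.lock().unwrap().len();
        for slot in 0..len {
            let live = self.vo_bits.lock().unwrap()[slot];
            if live {
                visitor(slot);
            }
        }
    }

    /// Consistent: the lock is held for the whole scan, so alloc/reclaim block
    /// until it finishes. The visitor must not call alloc/reclaim (deadlock).
    fn enumerate_consistent(&self, mut visitor: impl FnMut(usize)) {
        let bits = self.vo_bits.lock().unwrap();
        for (slot, live) in bits.iter().enumerate() {
            if *live {
                visitor(slot);
            }
        }
    }
}

fn main() {
    let heap = ToyHeap::new(8);
    let a = heap.alloc().unwrap();
    let _b = heap.alloc().unwrap();
    heap.reclaim(a);
    heap.enumerate_consistent(|slot| println!("consistent: live object in slot {slot}"));
    heap.enumerate_racy(|slot| println!("racy: live object in slot {slot}"));
}
```

Both variants give the same answer in this single-threaded demo; the difference only shows up when another thread allocates or reclaims during the scan.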

wks commented 3 weeks ago

The heap traversal API introduced in #1174 requires that the invocation of MMTK::enumerate_objects does not happen concurrently with allocation or GC; otherwise the behavior is undefined. This means that if the VM binding uses that API, the binding must block allocation and GC, and, as a result, the heap traversal is guaranteed to be consistent.

It is arguable that since mmtk-core can call Collection::stop_all_mutators and Collection::resume_mutators, mmtk-core could provide an API that automatically blocks all mutator activities and enumerates objects at a time when it is guaranteed not to race with allocation or GC. While such an API would be convenient to use, it increases the complexity of the overall MMTk-binding API in some ways:

And the caller of the "precise" traversal API (say MMTK::enumerate_objects_precise) still needs to ensure that the current thread is at a GC safepoint (i.e. has yielded). This is the same requirement as using the API in #1174, which has undefined behavior if there are races.
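For illustration, a "precise" wrapper of the kind discussed above might look like the sketch below. Everything here is hypothetical: `ToyCollection` is a simplified stand-in for the binding's Collection trait (the real stop_all_mutators/resume_mutators take thread and visitor arguments), `enumerate_objects_precise` is the name floated above rather than an existing function, and a slice of addresses stands in for the heap. The one design choice worth noting is the drop guard, which resumes mutators even if the visitor panics.

```rust
use std::cell::RefCell;

/// Hypothetical, simplified stand-in for the binding's Collection trait.
trait ToyCollection {
    fn stop_all_mutators(&self);
    fn resume_mutators(&self);
}

/// Calls resume_mutators when dropped, so mutators are resumed on every exit path.
struct ResumeOnDrop<'a, C: ToyCollection>(&'a C);

impl<C: ToyCollection> Drop for ResumeOnDrop<'_, C> {
    fn drop(&mut self) {
        self.0.resume_mutators();
    }
}

/// Stop mutators, visit every object, then resume (via the guard).
fn enumerate_objects_precise<C: ToyCollection>(
    vm: &C,
    objects: &[usize], // stand-in for "all objects in the heap"
    mut visitor: impl FnMut(usize),
) {
    vm.stop_all_mutators();
    let _resume = ResumeOnDrop(vm);
    for &obj in objects {
        visitor(obj);
    }
}

/// A fake VM that just records what happened, in order.
struct LoggingVm {
    log: RefCell<Vec<String>>,
}

impl ToyCollection for LoggingVm {
    fn stop_all_mutators(&self) {
        self.log.borrow_mut().push("stop".to_string());
    }
    fn resume_mutators(&self) {
        self.log.borrow_mut().push("resume".to_string());
    }
}

fn main() {
    let vm = LoggingVm { log: RefCell::new(Vec::new()) };
    enumerate_objects_precise(&vm, &[0x10, 0x20], |o| {
        vm.log.borrow_mut().push(format!("visit {o:#x}"))
    });
    println!("{:?}", vm.log.borrow());
}
```

Note that the sketch does nothing about the caller's own safepoint state, which is exactly the point made above: the caller still has to be at a safepoint.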

Implementation-wise, mmtk-core would need another "WorkerGoal::HeapTraversal" as a reason to stop the world, and something like schedule_collection to schedule the invocations of Collection::stop_all_mutators and Collection::resume_mutators (which are supposed to be called by GC workers) in the appropriate work packets (which are currently all designed for GC, not heap traversal). By contrast, if the VM already has a good safepoint synchronization mechanism, it can stop all mutators and invoke MMTK::enumerate_objects quite trivially. For example, OpenJDK only needs a new VMOperation run by the "VM thread": the VM automatically stops mutators at safepoints and then lets the VM thread run MMTK::enumerate_objects.
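The binding-side approach can be sketched with nothing but std threads, a mutex and a condition variable. This is a minimal model under stated assumptions, not OpenJDK's VMOperation machinery: `Safepoint`, `poll`, `stop_all`, `resume_all` and `traverse_at_safepoint` are hypothetical names, mutators "allocate" by pushing into a shared `Vec`, and the calling thread plays the VM thread that runs the enumeration once every mutator has parked.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Hypothetical safepoint: (stop requested, number of parked mutators).
#[derive(Default)]
struct Safepoint {
    state: Mutex<(bool, usize)>,
    cv: Condvar,
}

impl Safepoint {
    /// Mutators call this at polling points; it parks them while a stop is requested.
    fn poll(&self) {
        let mut s = self.state.lock().unwrap();
        if s.0 {
            s.1 += 1;
            self.cv.notify_all(); // tell the VM thread one more mutator parked
            while s.0 {
                s = self.cv.wait(s).unwrap();
            }
            s.1 -= 1;
        }
    }

    /// VM thread: request a stop and wait until all `mutators` are parked.
    fn stop_all(&self, mutators: usize) {
        let mut s = self.state.lock().unwrap();
        s.0 = true;
        while s.1 < mutators {
            s = self.cv.wait(s).unwrap();
        }
    }

    fn resume_all(&self) {
        self.state.lock().unwrap().0 = false;
        self.cv.notify_all();
    }
}

/// Runs `n` mutators, stops them, enumerates the toy heap, then resumes.
/// Returns (objects visited, heap size observed right after the enumeration).
fn traverse_at_safepoint(n: usize) -> (usize, usize) {
    let sp = Arc::new(Safepoint::default());
    let heap = Arc::new(Mutex::new(Vec::<usize>::new()));
    let done = Arc::new(AtomicBool::new(false));

    let handles: Vec<_> = (0..n)
        .map(|id| {
            let (sp, heap, done) = (sp.clone(), heap.clone(), done.clone());
            thread::spawn(move || {
                let mut allocated = 0;
                while !done.load(Ordering::Acquire) {
                    if allocated < 1000 {
                        heap.lock().unwrap().push(id); // "allocate" an object
                        allocated += 1;
                    }
                    sp.poll(); // safepoint check
                    thread::yield_now();
                }
            })
        })
        .collect();

    sp.stop_all(n); // every mutator is now parked inside poll()
    let mut visited = 0;
    for _obj in heap.lock().unwrap().iter() {
        visited += 1; // stand-in for calling the enumerate_objects visitor
    }
    let after = heap.lock().unwrap().len();
    sp.resume_all();

    done.store(true, Ordering::Release);
    for h in handles {
        h.join().unwrap();
    }
    (visited, after)
}

fn main() {
    let (visited, after) = traverse_at_safepoint(4);
    println!("visited {visited} objects; heap size after traversal: {after}");
}
```

Because every mutator allocates before its first poll and cannot allocate while parked, the enumeration sees a stable heap: the count visited equals the heap size read immediately afterwards.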

So I don't see the advantage of letting mmtk-core provide a heap-traversal API that stops and resumes mutators on behalf of the binding. The binding can stop and resume its mutators by itself (if it has multiple mutators at all), and doing so is probably easier.

k-sareen commented 3 weeks ago

I think we're bikeshedding the issue, personally. The idea is that this is an expensive operation, and VM developers need to be cognizant of that. We don't need a separate API to pause the mutators, or even a separate function for enumerating objects. We just need a separate GC "type" that contains only one work packet: a heap-traversal work packet that uses the visitor provided to the enumerate_objects API. I think this should be simple and non-controversial to implement.
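A rough model of this proposal: the usual stop-the-world wrapper is shared, and only the set of scheduled packets depends on the goal. This is a sketch in plain Rust, not mmtk-core's scheduler. `ToyGoal`, `WorkPacket`, `GcStage`, `HeapTraversal`, `schedule` and `run_stw` are hypothetical names, the GC stages are empty placeholders, and a slice of addresses stands in for the heap.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Visitor handed in by the caller, as with enumerate_objects.
type Visitor = Box<dyn FnMut(usize)>;

/// Hypothetical reasons to stop the world.
enum ToyGoal {
    Gc,
    HeapTraversal(Visitor),
}

trait WorkPacket {
    fn name(&self) -> &'static str;
    fn run(&mut self, heap: &[usize]);
}

/// Placeholder for the normal GC packets (roots, tracing, release, ...).
struct GcStage(&'static str);

impl WorkPacket for GcStage {
    fn name(&self) -> &'static str {
        self.0
    }
    fn run(&mut self, _heap: &[usize]) {}
}

/// The single packet scheduled for a heap-traversal "GC".
struct HeapTraversal(Visitor);

impl WorkPacket for HeapTraversal {
    fn name(&self) -> &'static str {
        "heap_traversal"
    }
    fn run(&mut self, heap: &[usize]) {
        for &obj in heap {
            (self.0)(obj);
        }
    }
}

/// Choose packets by goal; a traversal schedules exactly one packet.
fn schedule(goal: ToyGoal) -> Vec<Box<dyn WorkPacket>> {
    let mut packets: Vec<Box<dyn WorkPacket>> = Vec::new();
    match goal {
        ToyGoal::Gc => {
            packets.push(Box::new(GcStage("scan_roots")));
            packets.push(Box::new(GcStage("trace")));
            packets.push(Box::new(GcStage("release")));
        }
        ToyGoal::HeapTraversal(visitor) => packets.push(Box::new(HeapTraversal(visitor))),
    }
    packets
}

/// Shared stop-the-world wrapper, identical for every goal.
fn run_stw(goal: ToyGoal, heap: &[usize], log: &mut Vec<&'static str>) {
    log.push("stop_all_mutators");
    for mut packet in schedule(goal) {
        log.push(packet.name());
        packet.run(heap);
    }
    log.push("resume_mutators");
}

fn main() {
    let seen = Rc::new(RefCell::new(Vec::new()));
    let sink = seen.clone();
    let mut log = Vec::new();
    let visitor: Visitor = Box::new(move |obj| sink.borrow_mut().push(obj));
    run_stw(ToyGoal::HeapTraversal(visitor), &[0x10, 0x20], &mut log);
    println!("{:?} visited {:?}", log, seen.borrow());
}
```

In this model the mutator pause comes for free from the existing stop-the-world path, which matches the argument that no separate pause API is needed.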