Open. wks opened this issue 1 year ago.
The `TracePoint` mechanism in Ruby sets up hooks in every `iseq` object. It is implemented by traversing all objects in the heap and selecting the `iseq` instances. Setting hooks is implemented in C and neither calls Ruby code nor allocates heap objects, so we don't need to consider GC being triggered while traversing the heap.
Given our recent work on finding objects from interior pointers using VO bits, we can extend the algorithm to find all objects in a space instead of only the first object before/after a given address.
We just need a generic method for visiting bits in side metadata. Then we could visit the mark bits or VO bits for each object. ART implements something like this and they use it extensively. The current `iterate_meta_bits` function iterates over the metadata addresses, but we actually want a higher-level iteration wherein the metadata address is converted back to the object address. It should be possible to implement this using the current `iterate_meta_bits` function, imo.
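
The higher-level iteration described above could look roughly like the sketch below. It is not MMTk's `iterate_meta_bits` and does not use its signature: `SideBitmap`, `for_each_set_bit` and `collect_addresses` are hypothetical names, and the layout (a contiguous bitmap with one bit per `1 << log_bytes_per_bit` bytes of data) is an assumption made for illustration.

```rust
/// Hypothetical side-metadata bitmap (not an MMTk type). Each bit covers
/// `1 << log_bytes_per_bit` bytes of the data region starting at `data_start`.
pub struct SideBitmap<'a> {
    pub words: &'a [u64],
    pub data_start: usize,
    pub log_bytes_per_bit: u32,
}

impl<'a> SideBitmap<'a> {
    /// Calls `visitor` with the data address that corresponds to every set
    /// bit, in ascending address order. A word with no set bits costs only
    /// one comparison, which is what makes this cheaper than probing the
    /// bit of every aligned address in turn.
    pub fn for_each_set_bit<F: FnMut(usize)>(&self, mut visitor: F) {
        for (word_index, &word) in self.words.iter().enumerate() {
            let mut remaining = word;
            while remaining != 0 {
                // Position of the lowest set bit within this word.
                let bit = remaining.trailing_zeros() as usize;
                let bit_index = word_index * 64 + bit;
                // Convert the metadata bit index back to a data address.
                visitor(self.data_start + (bit_index << self.log_bytes_per_bit));
                // Clear the lowest set bit and continue with the rest.
                remaining &= remaining - 1;
            }
        }
    }
}

/// Convenience helper: gathers all visited addresses into a vector.
pub fn collect_addresses(bitmap: &SideBitmap<'_>) -> Vec<usize> {
    let mut result = Vec::new();
    bitmap.for_each_set_bit(|addr| result.push(addr));
    result
}
```

With `log_bytes_per_bit = 3` the visitor receives candidate object addresses (one VO bit per 8 bytes). With a larger value such as 15 the same walker visits 32 KiB regions, so one routine could serve both an object visitor and a block visitor.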
I've implemented heap walking (i.e. object visitor) in ART by just doing a naive linear scan over the entire heap address space and checking the mark bit. This works of course, but is probably not the best in terms of performance.
We have `ObjectIterator`: https://docs.mmtk.io/api/mmtk/util/linear_scan/struct.ObjectIterator.html. It is used by mark compact at the moment and only works with the VO bit.
That's not the same. That's a linear scan. I'm saying we should be able to iterate through any metadata bitmap and call a visitor (be it a visitor for `ObjectReference`, `Block`, etc.) on it. This will let us implement a heap visitor more efficiently than a linear scan.
Some programming languages allow the user to enumerate all heap objects or a subset of them.
Examples
Ruby
- `ObjectSpace.each_object` (See: https://docs.ruby-lang.org/en/master/ObjectSpace.html#method-c-each_object)
JVM TI
- `FollowReferences` (visit reachable objects from a given object)
- `IterateThroughHeap` (traverse the whole heap)
- `GetObjectsWithTags` (return objects that have a given tag)
Can full-heap traversal visit "dead" objects?
Ruby's `ObjectSpace.each_object`
Ruby's documentation for `ObjectSpace.each_object` says that the block is called for each "living" object. But here "living" seems to mean "not yet collected by the GC", because a mere function call cannot determine whether an object is reachable from any roots. A test program shows that a dead `Foo` instance can still be enumerated until GC is triggered.
JVM TI
`IterateThroughHeap` may reach dead but not yet reclaimed objects, too. The JVM TI documentation for `IterateThroughHeap` states that the iteration includes both reachable and unreachable objects.
What happens if objects are allocated during traversal?
Ruby
The interaction is unpredictable. If any object of the same `type` is created in the block of `ObjectSpace.each_object(type) {|x| ... }`, the newly created objects may or may not be visited. Given the same command line argument, the result may even vary between consecutive executions of the same program.
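JVM TI
JVM TI guarantees that the heap state (including objects and field values) is not changed during traversal. The documentation of both `FollowReferences` and `IterateThroughHeap` states that, during the execution of the function, no objects are allocated, no objects are garbage collected, and the state of objects (including held values) does not change.
What happens if GC is triggered during iteration?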
Ruby
It is undocumented. `ObjectSpace.each_object` does not prevent `GC.start` from being called in the block. But it seems that calling `GC.start` will remove dead objects immediately. As a result, the `ObjectSpace.each_object` method will visit some dead objects but not others. In a test program that triggers GC after visiting 5 objects, 5 to 7 objects are visited, depending on whether the two `Foo` instances held by live roots are visited in the first five iterations.
JVM TI
As previously mentioned, the heap state does not change during iteration.
The callback functions of `FollowReferences` and `IterateThroughHeap` are not allowed to call any JNI functions. The JVM TI function `ForceGarbageCollection` is not "Callback Safe", either. So it is impossible to trigger GC in the callback.
The `GetObjectsWithTags` function does not have any callbacks. It returns an array of object references.
Implementation
One obvious way to implement this feature is using the valid-object (VO) bit to scan each space. However, if a space is sparse, it may be helpful to narrow down the region of scanning by using space-specific metadata. In this way, we only need to scan regions (blocks or lines) actually occupied by objects.
PR https://github.com/mmtk/mmtk-core/pull/1174 implements heap traversal by scanning the VO bits. For block-based spaces, it only scans blocks occupied by objects. For LOS, objects are simply enumerated from the treadmill.
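
As a rough illustration of the approach described above (this is not the code in the PR), the sketch below scans VO bits only inside blocks marked as allocated, then appends the objects of a large object space from a plain list that stands in for the treadmill. The types, the 32 KiB block size and the one-VO-bit-per-8-bytes granularity are all assumptions made for the example.

```rust
// Assumed layout constants, chosen only for this example.
const LOG_BYTES_IN_BLOCK: usize = 15; // 32 KiB blocks
const LOG_BYTES_PER_VO_BIT: usize = 3; // one VO bit per 8 bytes
const VO_WORDS_PER_BLOCK: usize = (1 << (LOG_BYTES_IN_BLOCK - LOG_BYTES_PER_VO_BIT)) / 64;

#[derive(Clone, Copy, PartialEq, Eq)]
pub enum BlockState {
    Unallocated,
    Allocated,
}

/// Hypothetical block-based space: per-block state plus a VO bitmap.
pub struct BlockSpace {
    pub start: usize,
    pub block_states: Vec<BlockState>,
    pub vo_words: Vec<u64>,
}

impl BlockSpace {
    pub fn new(start: usize, num_blocks: usize) -> Self {
        BlockSpace {
            start,
            block_states: vec![BlockState::Unallocated; num_blocks],
            vo_words: vec![0; num_blocks * VO_WORDS_PER_BLOCK],
        }
    }

    /// Toy stand-in for allocation: sets the VO bit for `addr` and marks
    /// the containing block as allocated.
    pub fn set_vo_bit(&mut self, addr: usize) {
        let offset = addr - self.start;
        let bit = offset >> LOG_BYTES_PER_VO_BIT;
        self.block_states[offset >> LOG_BYTES_IN_BLOCK] = BlockState::Allocated;
        self.vo_words[bit / 64] |= 1u64 << (bit % 64);
    }

    /// Visits every object whose VO bit is set, looking only at allocated
    /// blocks. Returns the number of blocks whose VO bits were scanned.
    pub fn enumerate_objects<F: FnMut(usize)>(&self, mut visitor: F) -> usize {
        let mut scanned_blocks = 0;
        for (block, state) in self.block_states.iter().enumerate() {
            if *state != BlockState::Allocated {
                continue; // space-specific metadata lets us skip this block
            }
            scanned_blocks += 1;
            let first_word = block * VO_WORDS_PER_BLOCK;
            let words = &self.vo_words[first_word..first_word + VO_WORDS_PER_BLOCK];
            for (i, &word) in words.iter().enumerate() {
                let mut remaining = word;
                while remaining != 0 {
                    let bit = (first_word + i) * 64 + remaining.trailing_zeros() as usize;
                    visitor(self.start + (bit << LOG_BYTES_PER_VO_BIT));
                    remaining &= remaining - 1; // clear the lowest set bit
                }
            }
        }
        scanned_blocks
    }
}

/// Stand-in for a large object space: objects are kept in a list, so they
/// can be enumerated without scanning any bitmap.
pub struct LargeObjectSpace {
    pub objects: Vec<usize>,
}

/// Visits the objects of the block-based space first, then the large objects.
pub fn enumerate_heap<F: FnMut(usize)>(blocks: &BlockSpace, los: &LargeObjectSpace, mut visitor: F) {
    blocks.enumerate_objects(&mut visitor);
    for &object in &los.objects {
        visitor(object);
    }
}
```

In a sparse space most blocks are unallocated, so the scan touches only the bitmap words belonging to occupied blocks. The same idea could be applied at line granularity if the space tracks line occupancy.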