mmtk / mmtk-core

Memory Management ToolKit
https://www.mmtk.io
Other
375 stars 69 forks source link

API for enumerating objects #795

Open wks opened 1 year ago

wks commented 1 year ago

Some programming languages allow the user to enumerate all heap objects or a subset of them.

Examples

Ruby

# Enumerate all objects and print their classes
ObjectSpace.each_object() {|x| puts x.class }

# Enumerate all strings and print them
ObjectSpace.each_object(String) {|x| puts x }

# Return all reachable objects from obj
ObjectSpace.reachable_objects_from(obj)

# Return all reachable objects from roots
ObjectSpace.reachable_objects_from_root

(See: https://docs.ruby-lang.org/en/master/ObjectSpace.html#method-c-each_object )

JVM TI

Can full-heap traversal visit "dead" objects?

Ruby's ObjectSpace.each_object

Ruby's documentation for ObjectSpace.each_object says

Calls the block once for each living, nonimmediate object in this Ruby process.

But here "living" seems to mean "not yet collected by the GC" because a mere function call is not able to determine whether the object is reachable from any roots. The following program shows that the Foo instance can still be enumerated until GC is triggered.

puts "Hello!"

class Foo
  def initialize(x)
    @x = x
  end
  attr_accessor :x
end

a = Foo.new(42)
puts "Set!"
ObjectSpace.each_object(Foo) {|f| puts f.x}

a = nil
puts "Cleared!"
ObjectSpace.each_object(Foo) {|f| puts f.x}

GC.start
puts "Garbage collected!"
ObjectSpace.each_object(Foo) {|f| puts f.x}

puts "Goodbye!"

result:

Hello!
Set!
42
Cleared!
42
Garbage collected!
Goodbye!

JVM TI

IterateThroughHeap may reach dead but not reclaimed objects, too. JVM TI doc for IterateThroughHeap:

Initiate an iteration over all objects in the heap. This includes both reachable and unreachable objects. Objects are visited in no particular order.

What happens if objects are allocated during traversal?

Ruby

The interaction is mysterious. If any object of the same type is created in the block of ObjectSpace.each_object(type) {|x| ... }, the newly created objects may or may not be visited.

Example:

puts "Hello!"

class Foo
  def initialize(x)
    @x = x
  end
  attr_accessor :x
end

upper_bound = ARGV[0].to_i # command-line argument

(1..upper_bound).each do |i|
  Foo.new(i)
end

puts "Objects created."

ObjectSpace.each_object(Foo) {|f| puts f.x}

puts "Iterate and create new objects..."

i = -1

ObjectSpace.each_object(Foo) do |f|
  puts f.x
  puts "Triggering GC..."
  c = Foo.new(i)
  d = Foo.new(i - 1)
  i = i - 2
end

puts "Goodbye!"

Given the same command line argument, the result may even vary between consecutive executions of the program.

JVM TI

JVM TI guarantees the heap state (including objects and field values) is not changed during traversal. The following paragraph exists in the JVM TI documentation of both FollowReference and IterateThroughHeap.

During the execution of this function the state of the heap does not change: no objects are allocated, no objects are garbage collected, and the state of objects (including held values) does not change. As a result, threads executing Java programming language code, threads attempting to resume the execution of Java programming language code, and threads attempting to execute JNI functions are typically stalled.

What happens if GC is triggered during iteration?

Ruby

It is undocumented. ObjectSpace.each_object does not prevent GC.start to be called in the block. But it seems that calling GC.start will remove dead objects immediately. As a result, the ObjectSpace.each_object method will only visit some dead objects but not others.

puts "Hello!"

class Foo
  def initialize(x)
    @x = x
  end
  attr_accessor :x
end

live_root1 = Foo.new("This one lives")

100.times do |j|
  Foo.new(j)
end

live_root2 = Foo.new("This one lives, too")

i = 1

ObjectSpace.each_object(Foo) do |obj|
  puts "#{i}: #{obj.x}"
  if i % 5 == 0
    GC.start
  end
  i = i + 1
end

puts "Goodbye!"

GC will be triggered after visiting 5 objects. The program will visit 5 to 7 objects, depending on whether the two Foo held by live roots are visited in the first five iterations.

JVM TI

As previously mentioned, the heap state does not change during iteration.

The call-back function of FollowReference and IterateThroughHeap is not allowed to call any JNI functions. The JVM TI function ForceGarbageCollection is not "Callback Safe", either. So it is impossible to trigger GC in the callback.

The GetObjectsWithTags function does not have any call-backs. It returns an array of object references.

Implementation

One obvious way to implement this feature is using the valid-object (VO) bit to scan each space. However, if a space is sparse, it may be helpful to narrow down the region of scanning by using space-specific metadata. In this way, we only need to scan regions (blocks or lines) actually occupied by objects.

PR https://github.com/mmtk/mmtk-core/pull/1174 implements heap traversal by scanning the VO bits. For block-based spaces, it only scans blocks occupied by objects. For LOS, objects are simply enumerated from the treadmill.

wks commented 3 months ago

The TracePoint mechanism in Ruby sets up hooks in every iseq object. It is implemented by traversing all objects in the heap and filtering out iseq instances. Setting hooks is implemented in C, without involving calling Ruby code or allocating heap objects. Therefore, we don't need to consider GC being triggered while traversing the heap.

Given our recent work about searching interior pointer using VO bits, we can extend the algorithm to find all objects in a space instead of the first object before/after a given address.

k-sareen commented 3 months ago

We just need to have a generic method for visiting bits in side-metadata. Then we could visit the mark or VO-bits for each object. ART implements something like this and they use it extensively. The current iterate_meta_bits function iterates over the metadata addresses, but we actually want a higher-level iteration wherein the metadata address is converted back to the object address. It should be possible to implement this using the current iterate_meta_bits function, imo.

I've implemented heap walking (i.e. object visitor) in ART by just doing a naive linear scan over the entire heap address space and checking the mark bit. This works of course, but is probably not the best in terms of performance.

qinsoon commented 3 months ago

We have ObjectIterator: https://docs.mmtk.io/api/mmtk/util/linear_scan/struct.ObjectIterator.html. It is used by mark compact at the moment and only works with VO bit.

k-sareen commented 3 months ago

That's not the same. That's a linear scan. I'm saying we should be able to iterate through any metadata bitmap and call a visitor (be it a visitor for ObjectReference, Block, etc.) on it. This will let us implement a heap visitor more efficiently than a linear scan.