mmtk / ruby

Fork of The Ruby Programming Language [mirror], with added support for MMTk
https://www.ruby-lang.org/

Supporting TracePoint and GC.stat #79

Open wks opened 1 month ago

wks commented 1 month ago

TracePoint is a mechanism to install hooks and count various GC-related events in a code block. GC.stat returns the internal statistics of the GC. Neither is currently well supported when using MMTk, which has caused some test cases to fail.

Failing test cases

Tests related to TracePoint, mainly TestTracepointObj#test_tracks_objspace_events and TestTracepointObj#test_tracks_objspace_count, failed for various reasons.

In Debug mode, the gc_trace_point() call in obj_free attempts to call GET_EC() to find the execution context of the current mutator thread. However, when using MMTk, obj_free is executed by GC worker threads, which do not have execution contexts, so the process crashes with SIGSEGV.

In Release mode, some counts are different from the expected value. Currently, when using MMTk, we do not call gc_event_trace() during object allocation, so the number of newobj is always observed as 0.

The test case TestTracepointObj#test_tracks_objspace_count also reads free_count, gc_start_count, gc_end_mark_count and gc_end_sweep_count. These are not implemented either. It also reads from GC.stat, which does not have the required keys when using MMTk.

Supporting TracePoint

TracePoint is based on gc_event_hook. The GC-related code calls gc_event_hook at various places. Those events are defined in event.h.

It is not hard to add hooks to newobj_of. Checking gc_event_newobj_hook_needed_p(objspace) and calling gc_event_hook_prep(objspace, RUBY_INTERNAL_EVENT_NEWOBJ, obj, newobj_zero_slot(obj)); is sufficient to get the newobj_count number correct.

Other places can be supported similarly, although some things need to be changed.

It is useful to have something like TracePoint for debugging. But it should be adapted to MMTk, or other different GCs, too.

Supporting GC.stat

GC.stat extracts GC-specific statistics. MMTk internally keeps various statistics, too, and the data is used by harness_begin and harness_end. To bridge Ruby's GC.stat with MMTk, we just need to expose the needed API and call into mmtk-core.
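
On the Ruby side, such a bridge would show up as keys in the hash returned by GC.stat. The following is a minimal sketch of how a caller might read statistics without assuming that every key exists. The helper name gc_stat_snapshot and the key list are illustrative assumptions (the three keys are ones CRuby's default GC reports); they are not part of the MMTk binding.

```ruby
# Keys reported by CRuby's default GC. An MMTk build may omit some of them
# (assumption for illustration).
WANTED_GC_STAT_KEYS = %i[count total_allocated_objects total_freed_objects].freeze

# Hypothetical helper: returns { key => Integer or nil } for the wanted keys.
# GC.stat without arguments returns a Hash, so an absent key reads as nil
# rather than raising.
def gc_stat_snapshot(keys = WANTED_GC_STAT_KEYS)
  stats = GC.stat
  keys.to_h { |key| [key, stats[key]] }
end

snapshot = gc_stat_snapshot
missing = snapshot.select { |_key, value| value.nil? }.keys

puts "GC.stat snapshot: #{snapshot.inspect}"
puts "missing keys: #{missing.inspect}"
```

A test could use the list of missing keys to skip assertions that the current GC cannot satisfy, instead of failing on a nil value.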

About testing

Because the statistics collected from TracePoint and GC.stat can be GC-specific, test cases should be written in a way generic to all GCs, or written specifically for each GC implementation.

However, the VM may perform optimizations that change the number of objects allocated. For example, if the VM detects that a function or a code block never changes a String argument, it may reuse the same String instance instead of allocating a new instance each time. So the number of objects the following code snippet (inspired by TestTracepointObj#test_tracks_objspace_count) allocates may vary if the JIT compiler or the interpreter optimizes the code.

100.times { "" }
200.times { puts "Hello world!".length }

If the VM reuses the same "Hello world!" instance, there will be no objects allocated, or only one object (the String "Hello world!" itself) allocated.

So test cases depending on the implementation details of the GC or potential JIT compiler optimizations may fail mysteriously.
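
One way to make such a test less brittle is to assert a bound instead of an exact count. The sketch below measures an allocation delta with GC.stat(:total_allocated_objects), a key that CRuby's default GC provides and that is assumed here to be available. allocations_during is a hypothetical helper, not part of the existing test suite.

```ruby
# Hypothetical helper: returns how many objects GC.stat reports as allocated
# while the block runs. Assumes the :total_allocated_objects key exists.
def allocations_during
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

allocated = allocations_during { 100.times { "" } }

# Check a range rather than an exact number: the VM may reuse the literal,
# and the counter may also include unrelated allocations.
puts "allocated #{allocated} objects"
raise "unexpected negative delta" if allocated.negative?
```

An exact assertion such as allocated == 100 would encode an implementation detail of the VM and the GC, which is the kind of dependency described above.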

peterzhu2118 commented 1 month ago

For TracePoint, I think the priority is to get it working at all. The GC TracePoint events are not available at the Ruby level (because handling the event cannot allocate objects) and are only available to C extensions. They are used fairly rarely (mostly by profilers), so I don't think supporting them is a priority.

For example, consider the following Ruby code:

tp = TracePoint.new(:line) { puts "new line!" }
tp.enable
puts "hello"
puts "world"

On regular Ruby, it outputs:

new line!
hello
new line!
world

Whereas on MMTk, the event does not fire:

hello
world

This is because MMTk does not support heap walking, which is required to turn on tracing for all of the iseq objects.