Closed wks closed 3 weeks ago
There is a very high chance to trigger an error when running the following command with the current master
branch of mmtk-ruby and the dev/mmtk-overrides-default
branch of ruby, and using StickyImmix and without specifying heap size:
MMTK_THREADS=1 taskset -c 0 '/home/wks/projects/mmtk-github/ruby/build-debug/ruby' '--mmtk-plan=StickyImmix' '-r../tool/lib/_tmpdir' '../test/runner.rb' '--ruby=./miniruby -I../lib -I. -I.ext/common ../tool/runruby.rb --extout=.ext -- --mmtk-plan=StickyImmix' '--excludes-dir=../test/.excludes' '--name=!/memory_leak/' '-v' '../test/ruby/test_autoload.rb' --seed=32744
But unlike in the last post (about givtbl or unknown data type), the error manifests as visiting a field pointing to an address that does not have the VO bit set (not allocated, or already recycled).
Destination is not an MMTk object. Src: 0x200ffc06c10 dst: 0x201001d4a78
It won't trigger the bug if I add -n /fork/
to run only test_autoload_fork
. It means it requires some data structures to be initialized in some specific ways in order to trigger the bug. It also seems that the bug can only be triggered if GC happens just at the right moment (when auto-loading).
test_autoload_fork
creates new threads and forks. We need at least two threads to trigger the bug. That is, replacing the 3.times do
in test_autoload_fork
with 2.times do
will still trigger the bug. But 1.times do
won't trigger it.
thrs << Thread.new { AutoloadTest && nil }
can be commented out and still trigger the bug.
At least one child process needs to evaluate AutoloadTest
. Other child processes only needs to exist (by executing something like sleep 2
).
I have currently observed the source object sometimes being iseq and sometimes being T_ICLASS
. Here are two stack traces:
panicked at src/scanning.rs:42:13:
Destination is not an MMTk object. Src: 0x200ffca1e58 dst: 0x2010034d3c0
[2024-08-09T03:46:34Z INFO mmtk::plan::sticky::immix::global] Full heap GC
[2024-08-09T03:46:34Z INFO mmtk::policy::immix::defrag] Defrag: true
0: mmtk_ruby::handle_gc_thread_panic
at /home/wks/projects/mmtk-github/mmtk-ruby/mmtk/src/lib.rs:102:14
1: mmtk_ruby::set_panic_hook::{{closure}}
at /home/wks/projects/mmtk-github/mmtk-ruby/mmtk/src/lib.rs:131:13
2: <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call
at /rustc/aedd173a2c086e558c2b66d3743b344f977621a7/library/alloc/src/boxed.rs:2029:9
3: std::panicking::rust_panic_with_hook
at /rustc/aedd173a2c086e558c2b66d3743b344f977621a7/library/std/src/panicking.rs:785:13
4: std::panicking::begin_panic_handler::{{closure}}
at /rustc/aedd173a2c086e558c2b66d3743b344f977621a7/library/std/src/panicking.rs:659:13
5: std::sys_common::backtrace::__rust_end_short_backtrace
at /rustc/aedd173a2c086e558c2b66d3743b344f977621a7/library/std/src/sys_common/backtrace.rs:171:18
6: rust_begin_unwind
at /rustc/aedd173a2c086e558c2b66d3743b344f977621a7/library/std/src/panicking.rs:647:5
7: core::panicking::panic_fmt
at /rustc/aedd173a2c086e558c2b66d3743b344f977621a7/library/core/src/panicking.rs:72:14
8: <mmtk_ruby::scanning::VMScanning as mmtk::vm::scanning::Scanning<mmtk_ruby::Ruby>>::scan_object_and_trace_edges::{{closure}}
at /home/wks/projects/mmtk-github/mmtk-ruby/mmtk/src/scanning.rs:42:13
9: mmtk_ruby::abi::ObjectClosure::c_function_registered
at /home/wks/projects/mmtk-github/mmtk-ruby/mmtk/src/abi.rs:217:9
10: rb_mmtk_call_object_closure
at ./../mmtk_support.c:503:12
11: rb_mmtk_mark_and_move
at ./../mmtk_support.c:576:40
12: rb_gc_impl_mark_and_move
at ./../gc/default.c:5121:9
13: rb_gc_mark_and_move
at ./../gc.c:2456:5
14: rb_iseq_mark_and_move_each_value
at ./../iseq.c:278:17
15: rb_iseq_mark_and_move
at ./../iseq.c:344:9
16: rb_imemo_mark_and_move
at ./../imemo.c:455:9
17: rb_gc_mark_children
at ./../gc.c:2588:9
18: gc_mark_children
at ./../gc/default.c:5261:5
19: rb_mmtk_mark_children
at ./../gc/default.c:10574:5
20: rb_mmtk_scan_object_ruby_style
at ./../mmtk_support.c:537:5
21: <mmtk_ruby::scanning::VMScanning as mmtk::vm::scanning::Scanning<mmtk_ruby::Ruby>>::scan_object_and_trace_edges::{{closure}}
at /home/wks/projects/mmtk-github/mmtk-ruby/mmtk/src/scanning.rs:59:17
22: mmtk_ruby::abi::ObjectClosure::set_temporarily_and_run_code
at /home/wks/projects/mmtk-github/mmtk-ruby/mmtk/src/abi.rs:201:22
23: <mmtk_ruby::scanning::VMScanning as mmtk::vm::scanning::Scanning<mmtk_ruby::Ruby>>::scan_object_and_trace_edges
at /home/wks/projects/mmtk-github/mmtk-ruby/mmtk/src/scanning.rs:56:9
24: mmtk::scheduler::gc_work::ScanObjectsWork::do_work_common::{{closure}}
at /home/wks/projects/mmtk-github/mmtk-core/src/scheduler/gc_work.rs:873:21
25: <mmtk::scheduler::gc_work::ProcessEdgesWorkTracerContext<E> as mmtk::vm::scanning::ObjectTracerContext<<E as mmtk::scheduler::gc_work::ProcessEdgesWork>::VM>>::with_tracer
at /home/wks/projects/mmtk-github/mmtk-core/src/scheduler/gc_work.rs:303:22
26: mmtk::scheduler::gc_work::ScanObjectsWork::do_work_common
at /home/wks/projects/mmtk-github/mmtk-core/src/scheduler/gc_work.rs:869:13
27: <mmtk::scheduler::gc_work::PlanScanObjects<E,P> as mmtk::scheduler::work::GCWork<<E as mmtk::scheduler::gc_work::ProcessEdgesWork>::VM>>::do_work
at /home/wks/projects/mmtk-github/mmtk-core/src/scheduler/gc_work.rs:1052:9
28: mmtk::scheduler::work::GCWork::do_work_with_stat
at /home/wks/projects/mmtk-github/mmtk-core/src/scheduler/work.rs:45:9
29: mmtk::scheduler::worker::GCWorker<VM>::run
at /home/wks/projects/mmtk-github/mmtk-core/src/scheduler/worker.rs:244:13
30: mmtk::memory_manager::start_worker
at /home/wks/projects/mmtk-github/mmtk-core/src/memory_manager.rs:492:5
31: <mmtk_ruby::collection::VMCollection as mmtk::vm::collection::Collection<mmtk_ruby::Ruby>>::spawn_gc_thread::{{closure}}
at /home/wks/projects/mmtk-github/mmtk-ruby/mmtk/src/collection.rs:54:21
32: std::sys_common::backtrace::__rust_begin_short_backtrace
at /rustc/aedd173a2c086e558c2b66d3743b344f977621a7/library/std/src/sys_common/backtrace.rs:155:18
33: std::thread::Builder::spawn_unchecked_::{{closure}}::{{closure}}
at /rustc/aedd173a2c086e558c2b66d3743b344f977621a7/library/std/src/thread/mod.rs:529:17
34: <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
at /rustc/aedd173a2c086e558c2b66d3743b344f977621a7/library/core/src/panic/unwind_safe.rs:272:9
35: std::panicking::try::do_call
at /rustc/aedd173a2c086e558c2b66d3743b344f977621a7/library/std/src/panicking.rs:554:40
36: __rust_try
37: std::panicking::try
at /rustc/aedd173a2c086e558c2b66d3743b344f977621a7/library/std/src/panicking.rs:518:19
38: std::panic::catch_unwind
at /rustc/aedd173a2c086e558c2b66d3743b344f977621a7/library/std/src/panic.rs:142:14
39: std::thread::Builder::spawn_unchecked_::{{closure}}
at /rustc/aedd173a2c086e558c2b66d3743b344f977621a7/library/std/src/thread/mod.rs:528:30
40: core::ops::function::FnOnce::call_once{{vtable.shim}}
at /rustc/aedd173a2c086e558c2b66d3743b344f977621a7/library/core/src/ops/function.rs:250:5
41: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
at /rustc/aedd173a2c086e558c2b66d3743b344f977621a7/library/alloc/src/boxed.rs:2015:9
42: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
at /rustc/aedd173a2c086e558c2b66d3743b344f977621a7/library/alloc/src/boxed.rs:2015:9
43: std::sys::pal::unix::thread::Thread::new::thread_start
at /rustc/aedd173a2c086e558c2b66d3743b344f977621a7/library/std/src/sys/pal/unix/thread.rs:108:17
44: <unknown>
45: <unknown>
And this (with some frames omitted):
ERROR: An MMTk GC thread panicked. This is a bug. PID: 444536
panicked at src/scanning.rs:51:13:
Destination is not an MMTk object. Src: 0x200ffc06c10 dst: 0x20100219440
0: mmtk_ruby::handle_gc_thread_panic
at /home/wks/projects/mmtk-github/mmtk-ruby/mmtk/src/lib.rs:102:14
1: mmtk_ruby::set_panic_hook::{{closure}}
at /home/wks/projects/mmtk-github/mmtk-ruby/mmtk/src/lib.rs:131:13
2: <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call
at /rustc/aedd173a2c086e558c2b66d3743b344f977621a7/library/alloc/src/boxed.rs:2029:9
3: std::panicking::rust_panic_with_hook
at /rustc/aedd173a2c086e558c2b66d3743b344f977621a7/library/std/src/panicking.rs:785:13
4: std::panicking::begin_panic_handler::{{closure}}
at /rustc/aedd173a2c086e558c2b66d3743b344f977621a7/library/std/src/panicking.rs:659:13
5: std::sys_common::backtrace::__rust_end_short_backtrace
at /rustc/aedd173a2c086e558c2b66d3743b344f977621a7/library/std/src/sys_common/backtrace.rs:171:18
6: rust_begin_unwind
at /rustc/aedd173a2c086e558c2b66d3743b344f977621a7/library/std/src/panicking.rs:647:5
7: core::panicking::panic_fmt
at /rustc/aedd173a2c086e558c2b66d3743b344f977621a7/library/core/src/panicking.rs:72:14
8: <mmtk_ruby::scanning::VMScanning as mmtk::vm::scanning::Scanning<mmtk_ruby::Ruby>>::scan_object_and_trace_edges::{{closure}}
at /home/wks/projects/mmtk-github/mmtk-ruby/mmtk/src/scanning.rs:51:13
9: mmtk_ruby::abi::ObjectClosure::c_function_registered
at /home/wks/projects/mmtk-github/mmtk-ruby/mmtk/src/abi.rs:217:9
10: rb_mmtk_call_object_closure
at ./../mmtk_support.c:503:12
11: rb_mmtk_mark
at ./../mmtk_support.c:519:9
12: rb_mmtk_mark_movable
at ./../mmtk_support.c:561:5
13: gc_mark
at ./../gc/default.c:5051:9
14: rb_gc_impl_mark
at ./../gc/default.c:5147:5
15: mark_method_entry_i
at ./../gc.c:2214:5
16: rb_id_table_foreach_values
at ./../id_table.c:296:53
17: mark_m_tbl
at ./../gc.c:2223:9
18: rb_gc_mark_children
at ./../gc.c:2638:9
19: gc_mark_children
at ./../gc/default.c:5261:5
20: rb_mmtk_mark_children
at ./../gc/default.c:10574:5
21: rb_mmtk_scan_object_ruby_style
at ./../mmtk_support.c:537:5
22: <mmtk_ruby::scanning::VMScanning as mmtk::vm::scanning::Scanning<mmtk_ruby::Ruby>>::scan_object_and_trace_edges::{{closure}}
at /home/wks/projects/mmtk-github/mmtk-ruby/mmtk/src/scanning.rs:68:17
Some memory dump of the destination object (of the second stack trace):
dest - 8: 28 00 00 00 00 00 00 00 # The hidden size field. The obj size is 40
dest : 1a 60 1e 00 00 00 00 00 # 1a: imemo, 6: imemo_ment
dest + 8: 10 6c c0 ff 00 02 00 00
dest +16: 40 16 78 48 12 60 00 00
dest +24: 31 41 00 00 00 00 00 00
dest +32: e0 6a c0 ff 00 02 00 00
Since the header looks like a Ruby flags field, it should be an object that has been considered as dead by the GC, and its space has not been reused.
When forking, only the current pthread is copied to the child process. That means other threads will die at a random place. mmtk-ruby already informs mmtk-core to let all GC threads to exit, so the problem can be from the other Ruby thread. It could be possible that the other Ruby thread inserted some dangling pointers into the global state while the current thread forked.
The Linux man page states that "in a multi-threaded program, the child can safely call only async-signal-safe functions (see signal-safety(7)) until such time as it calls execve(2)".
The "Destination is not an MMTk object" bug is likely caused by not flushing the mod buffers of the threads disappeared at fork()
. Since fork()
only copies the current thread to the child process, other threads simply vanished, dying at wherever place they are currently running. That gives those threads no chance to flush their mutator-local mod buffers. In a generational GC, the mod buffer records old objects that have young children. Failing to flush mod buffers before the next nursery GC will cause young objects to be erroneously identified as dead while they are actually pointed by old objects.
The following snippet is guaranteed to cause an error when using StickyImmix and automatic heap size. The child process will see "Destination is not an MMTk object" errors in the next full-heap GC.
puts "[p0, t0] Hello world!"
puts "[p0, t0] Creating array..."
ar = Array::new(10, nil)
puts "[p0, t0] Forcing GC so that ar becomes old..."
GC::start
queue = Thread::Queue.new
puts "[p0, t0] Creating thread..."
Thread::new do
puts "[p0, t1] Adding objects..."
10.times do |i|
ar[i] = "Foo #{i}" # This will add ar to the modbuf
end
puts "[p0, t1] Telling the main thread that we have done..."
queue << :done
puts "[p0, t1] Waiting almost forever..."
sleep 1000000
end
puts "[p0, t0] Waiting for thread1 to add elements..."
queue.pop
if false # Change this to true and it will not crash
puts "[p0, t0] Do another GC to make the children of ar old"
GC::start
end
puts "[p0, t0] Forking..."
p1 = fork do
puts "[p1, t0] Allocating enough live objects to trigger a full-heap GC..."
ar2 = []
10000.times do |i|
ar2 << "Bar #{i}"
end
puts "[p1, t0] I should have crashed!"
p ar
exit 0
end
puts "[p0, t0] Joining p1 = #{p1}..."
_, status = Process.waitpid2(p1)
puts "[p0, t0] #{status}"
puts "[p0, t0] Goodbye world!"
The parent process creates a thread t1
to allocate new objects and add references to those objects into a mature array. This will add the array to the modbuf. The thread t1
then waits forever. When the main thread t0
calls fork
, t1
simply disappears in the child process. That means the modbuf of t1
will simply be abandoned. The child process then allocate lots of live objects. MMTk will first run a nursery GC and then a full-heap GC when it fails to free up enough memory. The nursery GC will recycle the children of the array, and the full-heap GC will try to reach the dead children from the array and cause "Destination is not MMTk object" error.
If the parent process does a GC before forking (changing if false
to if true
), the thread t1
will be recognized when GC starts, and its modbuf will be flushed. That will correctly trace the children of the array, keep them alive, and make them old. Then it will not crash after forking.
Error happens when running the
TestAutoload#test_autoload_fork
test case intest-all
. This is not always reproducible, but is quite often reproducible on thedev/mmtk-overrides-default
branch, with the StickyImmix plan.The bug manifests in multiple ways.