mmtk / ruby

Fork of The Ruby Programming Language [mirror], with added support for MMTk
https://www.ruby-lang.org/
Other
0 stars 1 forks source link

Random error about finalizer job still exists on exit #81

Closed wks closed 3 weeks ago

wks commented 1 month ago

There is a random failure when running make test-all in the dev/mmtk-overrides-default branch. It prints the message "[BUG] Finalize job still exists after obj_free on exit has started.", but the stack trace shows it is in obj_free_object_id. I think the stack trace is inaccurate because it is the release build. It is random. I observed this error when running make test-all (using whitelist as before) on my laptop, but I am not able to reproduce this crash.

[8447/8840] TestProcess#test_wait_and_sigchild[BUG] Finalize job still exists after obj_free on exit has started.
ruby 3.4.0dev (2024-07-30T02:41:51Z dev/mmtk-overrides.. 22e7a807e3) +MMTk(StickyImmix) [x86_64-linux]

-- Control frame information -----------------------------------------------
c:0033 p:---- s:0188 e:000187 CFUNC  :fork
c:0032 p:0004 s:0184 e:000183 BLOCK  /home/wks/projects/mmtk-github/ruby/test/ruby/test_process.rb:1736 [FINISH]
c:0031 p:---- s:0179 e:000178 CFUNC  :pipe
c:0030 p:0016 s:0175 E:000c88 BLOCK  /home/wks/projects/mmtk-github/ruby/test/ruby/test_process.rb:1735 [FINISH]
c:0029 p:---- s:0169 e:000168 CFUNC  :pipe
c:0028 p:0009 s:0165 E:001938 METHOD /home/wks/projects/mmtk-github/ruby/test/ruby/test_process.rb:1729
c:0027 p:0037 s:0160 e:000159 METHOD /home/wks/projects/mmtk-github/ruby/tool/lib/test/unit/testcase.rb:202
c:0026 p:0052 s:0154 e:000153 METHOD /home/wks/projects/mmtk-github/ruby/tool/lib/test/unit/testcase.rb:170

...

-- C level backtrace information -------------------------------------------
/home/wks/projects/mmtk-github/ruby/build-release/ruby(rb_print_backtrace+0x14) [0x55ad06173203] ../vm_dump.c:824
/home/wks/projects/mmtk-github/ruby/build-release/ruby(rb_vm_bugreport) ../vm_dump.c:1155
/home/wks/projects/mmtk-github/ruby/build-release/ruby(bug_report_end+0x0) [0x55ad0612b1d0] ../error.c:1095
/home/wks/projects/mmtk-github/ruby/build-release/ruby(rb_bug_without_die) ../error.c:1095
/home/wks/projects/mmtk-github/ruby/build-release/ruby(die+0x0) [0x55ad05d64a18] ../error.c:1103
/home/wks/projects/mmtk-github/ruby/build-release/ruby(rb_bug) ../error.c:1105
/home/wks/projects/mmtk-github/ruby/build-release/ruby(obj_free_object_id+0x0) [0x55ad05d5582f] ../gc/default.c:3372
/home/wks/projects/mmtk-github/ruby/build-release/ruby(finalize_list) ../gc/default.c:3343
/home/wks/projects/mmtk-github/ruby/build-release/ruby(finalize_deferred_heap_pages) ../gc/default.c:3431
/home/wks/projects/mmtk-github/ruby/build-release/ruby(finalize_deferred_heap_pages) ../gc/default.c:3420
/home/wks/projects/mmtk-github/ruby/build-release/ruby(rb_gc_impl_shutdown_call_finalizer+0x4c2) [0x55ad05d8f532] ../gc/default.c:3606
/home/wks/projects/mmtk-github/ruby/build-release/ruby(rb_ec_finalize+0x16) [0x55ad05d69fa6] ../eval.c:164
/home/wks/projects/mmtk-github/ruby/build-release/ruby(rb_ec_cleanup) ../eval.c:255
/home/wks/projects/mmtk-github/ruby/build-release/ruby(ruby_cleanup+0x9) [0x55ad05d6a5f6] ../eval.c:178
/home/wks/projects/mmtk-github/ruby/build-release/ruby(ruby_stop) ../eval.c:290
/home/wks/projects/mmtk-github/ruby/build-release/ruby(rb_f_fork+0x1f) [0x55ad05e38ef4] ../process.c:4401
/home/wks/projects/mmtk-github/ruby/build-release/ruby(rb_f_fork) ../process.c:4391
/home/wks/projects/mmtk-github/ruby/build-release/ruby(vm_call_cfunc_with_frame_+0xc3) [0x55ad05f2c34b] ../vm_insnhelper.c:3786
/home/wks/projects/mmtk-github/ruby/build-release/ruby(vm_call_cfunc_with_frame) ../vm_insnhelper.c:3832
/home/wks/projects/mmtk-github/ruby/build-release/ruby(vm_sendish+0xcf) [0x55ad05f3321f] ../vm_insnhelper.c:5954
/home/wks/projects/mmtk-github/ruby/build-release/ruby(vm_exec_core+0x8a5) [0x55ad05f4bdb5] ../insns.def:871
/home/wks/projects/mmtk-github/ruby/build-release/ruby(rb_vm_exec+0x255) [0x55ad05f3c415] ../vm.c:2575
/home/wks/projects/mmtk-github/ruby/build-release/ruby(vm_yield_with_cref+0x40) [0x55ad05f40308] ../vm.c:1649
/home/wks/projects/mmtk-github/ruby/build-release/ruby(vm_yield) ../vm.c:1657
/home/wks/projects/mmtk-github/ruby/build-release/ruby(rb_yield_0) ../vm_eval.c:1351
/home/wks/projects/mmtk-github/ruby/build-release/ruby(rb_yield) ../vm_eval.c:1367
/home/wks/projects/mmtk-github/ruby/build-release/ruby(rb_ensure+0xe6) [0x55ad05d6b646] ../eval.c:1059
/home/wks/projects/mmtk-github/ruby/build-release/ruby(rb_io_s_pipe+0x394) [0x55ad05dc35f4] ../io.c:11866
/home/wks/projects/mmtk-github/ruby/build-release/ruby(vm_call_cfunc_with_frame_+0xc3) [0x55ad05f2c34b] ../vm_insnhelper.c:3786
/home/wks/projects/mmtk-github/ruby/build-release/ruby(vm_call_cfunc_with_frame) ../vm_insnhelper.c:3832
/home/wks/projects/mmtk-github/ruby/build-release/ruby(vm_sendish+0xcf) [0x55ad05f3321f] ../vm_insnhelper.c:5954
/home/wks/projects/mmtk-github/ruby/build-release/ruby(vm_exec_core+0x8a5) [0x55ad05f4bdb5] ../insns.def:871
/home/wks/projects/mmtk-github/ruby/build-release/ruby(rb_vm_exec+0x255) [0x55ad05f3c415] ../vm.c:2575
/home/wks/projects/mmtk-github/ruby/build-release/ruby(vm_yield_with_cref+0x40) [0x55ad05f40308] ../vm.c:1649
/home/wks/projects/mmtk-github/ruby/build-release/ruby(vm_yield) ../vm.c:1657
/home/wks/projects/mmtk-github/ruby/build-release/ruby(rb_yield_0) ../vm_eval.c:1351
/home/wks/projects/mmtk-github/ruby/build-release/ruby(rb_yield) ../vm_eval.c:1367
/home/wks/projects/mmtk-github/ruby/build-release/ruby(rb_ensure+0xe6) [0x55ad05d6b646] ../eval.c:1059
/home/wks/projects/mmtk-github/ruby/build-release/ruby(rb_io_s_pipe+0x394) [0x55ad05dc35f4] ../io.c:11866
/home/wks/projects/mmtk-github/ruby/build-release/ruby(vm_call_cfunc_with_frame_+0xc3) [0x55ad05f2c34b] ../vm_insnhelper.c:3786
/home/wks/projects/mmtk-github/ruby/build-release/ruby(vm_call_cfunc_with_frame) ../vm_insnhelper.c:3832
/home/wks/projects/mmtk-github/ruby/build-release/ruby(vm_sendish+0xcf) [0x55ad05f3321f] ../vm_insnhelper.c:5954
/home/wks/projects/mmtk-github/ruby/build-release/ruby(vm_exec_core+0x8a5) [0x55ad05f4bdb5] ../insns.def:871
/home/wks/projects/mmtk-github/ruby/build-release/ruby(rb_vm_exec+0x255) [0x55ad05f3c415] ../vm.c:2575
/home/wks/projects/mmtk-github/ruby/build-release/ruby(vm_yield_with_cref+0x40) [0x55ad05f40308] ../vm.c:1649
/home/wks/projects/mmtk-github/ruby/build-release/ruby(vm_yield) ../vm.c:1657
/home/wks/projects/mmtk-github/ruby/build-release/ruby(rb_yield_0) ../vm_eval.c:1351
/home/wks/projects/mmtk-github/ruby/build-release/ruby(rb_yield) ../vm_eval.c:1367
/home/wks/projects/mmtk-github/ruby/build-release/ruby(rb_ary_collect+0x64) [0x55ad06062424] ../array.c:3973
/home/wks/projects/mmtk-github/ruby/build-release/ruby(vm_call_cfunc_with_frame_+0xc3) [0x55ad05f2c34b] ../vm_insnhelper.c:3786
/home/wks/projects/mmtk-github/ruby/build-release/ruby(vm_call_cfunc_with_frame) ../vm_insnhelper.c:3832
/home/wks/projects/mmtk-github/ruby/build-release/ruby(vm_sendish+0xcf) [0x55ad05f3321f] ../vm_insnhelper.c:5954
/home/wks/projects/mmtk-github/ruby/build-release/ruby(vm_exec_core+0x8a5) [0x55ad05f4bdb5] ../insns.def:871
/home/wks/projects/mmtk-github/ruby/build-release/ruby(vm_exec_loop+0x33) [0x55ad05f3c503] ../vm.c:2602
/home/wks/projects/mmtk-github/ruby/build-release/ruby(rb_vm_exec) ../vm.c:2581
/home/wks/projects/mmtk-github/ruby/build-release/ruby(load_iseq_eval+0x3d) [0x55ad05de5de9] ../load.c:781
/home/wks/projects/mmtk-github/ruby/build-release/ruby(require_internal) ../load.c:1287
/home/wks/projects/mmtk-github/ruby/build-release/ruby(rb_require_string_internal+0x2c) [0x55ad05de6246] ../load.c:1386
/home/wks/projects/mmtk-github/ruby/build-release/ruby(rb_f_require_relative) ../load.c:1041
/home/wks/projects/mmtk-github/ruby/build-release/ruby(vm_call_cfunc_with_frame_+0xc3) [0x55ad05f2c34b] ../vm_insnhelper.c:3786
/home/wks/projects/mmtk-github/ruby/build-release/ruby(vm_call_cfunc_with_frame) ../vm_insnhelper.c:3832
/home/wks/projects/mmtk-github/ruby/build-release/ruby(vm_sendish+0xb9) [0x55ad05f4b64a] ../vm_insnhelper.c:5954
/home/wks/projects/mmtk-github/ruby/build-release/ruby(vm_exec_core) ../insns.def:918
/home/wks/projects/mmtk-github/ruby/build-release/ruby(rb_vm_exec+0x255) [0x55ad05f3c415] ../vm.c:2575
/home/wks/projects/mmtk-github/ruby/build-release/ruby(rb_ec_exec_node+0x9b) [0x55ad05d6561b] ../eval.c:281
/home/wks/projects/mmtk-github/ruby/build-release/ruby(ruby_run_node+0x51) [0x55ad05d6a6c1] ../eval.c:319
/home/wks/projects/mmtk-github/ruby/build-release/ruby(rb_main+0x2b) [0x55ad05d6508a] ../main.c:55
/home/wks/projects/mmtk-github/ruby/build-release/ruby(main) ../main.c:74
/usr/lib/libc.so.6(0x7ccc67745e08) [0x7ccc67745e08]
/usr/lib/libc.so.6(__libc_start_main+0x8c) [0x7ccc67745ecc]
[0x55ad05d650d5]

...

                mmtk_plan_name: StickyImmix
               mmtk_free_bytes: 728391680
              mmtk_total_bytes: 1841745920
               mmtk_used_bytes: 1113354240
    mmtk_starting_heap_address: 0x20000000000
        mmtk_last_heap_address: 0x220000000000

 = 5.23 s
[8448/8840] TestProcess#test_clock_getres_MACH_ABSOLUTE_TIME_BASED_CLOCK_MONOTONIC = 0.00 s
[8449/8840] TestProcess#test_getpgid = 0.00 s
[8450/8840] TestProcess#test_initgroups = 0.00 s
wks commented 3 weeks ago

It can also be observed while running test_autoload_fork. Similar to TestProcess#test_wait_and_sigchild, the stack trace is similar above rb_f_fork(). It's likely to be caused by the same bug, and is related to forking.

/home/wks/projects/mmtk-github/ruby/build-release/ruby(rb_print_backtrace+0x14) [0x5608c27e5243] ../vm_dump.c:824
/home/wks/projects/mmtk-github/ruby/build-release/ruby(rb_vm_bugreport) ../vm_dump.c:1155
/home/wks/projects/mmtk-github/ruby/build-release/ruby(bug_report_end+0x0) [0x5608c279d210] ../error.c:1095
/home/wks/projects/mmtk-github/ruby/build-release/ruby(rb_bug_without_die) ../error.c:1095
/home/wks/projects/mmtk-github/ruby/build-release/ruby(die+0x0) [0x5608c23d6a18] ../error.c:1103
/home/wks/projects/mmtk-github/ruby/build-release/ruby(rb_bug) ../error.c:1105
/home/wks/projects/mmtk-github/ruby/build-release/ruby(obj_free_object_id+0x0) [0x5608c23c782f] ../gc/default.c:3372
/home/wks/projects/mmtk-github/ruby/build-release/ruby(finalize_list) ../gc/default.c:3343
/home/wks/projects/mmtk-github/ruby/build-release/ruby(finalize_deferred_heap_pages) ../gc/default.c:3431
/home/wks/projects/mmtk-github/ruby/build-release/ruby(finalize_deferred_heap_pages) ../gc/default.c:3420
/home/wks/projects/mmtk-github/ruby/build-release/ruby(rb_gc_impl_shutdown_call_finalizer+0x4c2) [0x5608c2401532] ../gc/default.c:3606
/home/wks/projects/mmtk-github/ruby/build-release/ruby(rb_ec_finalize+0x16) [0x5608c23dbfa6] ../eval.c:164
/home/wks/projects/mmtk-github/ruby/build-release/ruby(rb_ec_cleanup) ../eval.c:255
/home/wks/projects/mmtk-github/ruby/build-release/ruby(ruby_cleanup+0x9) [0x5608c23dc5f6] ../eval.c:178
/home/wks/projects/mmtk-github/ruby/build-release/ruby(ruby_stop) ../eval.c:290
/home/wks/projects/mmtk-github/ruby/build-release/ruby(rb_f_fork+0x1f) [0x5608c24aaf34] ../process.c:4401
/home/wks/projects/mmtk-github/ruby/build-release/ruby(rb_f_fork) ../process.c:4391
/home/wks/projects/mmtk-github/ruby/build-release/ruby(vm_call_cfunc_with_frame_+0xb3) [0x5608c259e6fe] ../vm_insnhelper.c:3786
/home/wks/projects/mmtk-github/ruby/build-release/ruby(vm_call_cfunc_with_frame) ../vm_insnhelper.c:3832
/home/wks/projects/mmtk-github/ruby/build-release/ruby(vm_call_cfunc_other) ../vm_insnhelper.c:3858
/home/wks/projects/mmtk-github/ruby/build-release/ruby(vm_sendish+0xcf) [0x5608c25a525f] ../vm_insnhelper.c:5954
/home/wks/projects/mmtk-github/ruby/build-release/ruby(vm_exec_core+0x8a5) [0x5608c25bddf5] ../insns.def:871
/home/wks/projects/mmtk-github/ruby/build-release/ruby(rb_vm_exec+0x255) [0x5608c25ae455] ../vm.c:2575
/home/wks/projects/mmtk-github/ruby/build-release/ruby(rb_vm_invoke_proc+0x5e) [0x5608c25b295e] ../vm.c:1743
/home/wks/projects/mmtk-github/ruby/build-release/ruby(thread_do_start_proc+0x13d) [0x5608c25599fd] ../thread.c:628
/home/wks/projects/mmtk-github/ruby/build-release/ruby(thread_do_start+0x12) [0x5608c255be0a] ../thread.c:645
/home/wks/projects/mmtk-github/ruby/build-release/ruby(thread_start_func_2) ../thread.c:708
/home/wks/projects/mmtk-github/ruby/build-release/ruby(call_thread_start_func_2+0x0) [0x5608c255cc05] ../thread_pthread.c:2236
/home/wks/projects/mmtk-github/ruby/build-release/ruby(nt_start) ../thread_pthread.c:2281
wks commented 3 weeks ago

When I ran the tests, I was still using the whitelist, so test_gc.rb and test_objectspace.rb were not enabled. Therefore, all the final jobs added during tests are instances of Tempfile. Currently, test_argf.rb, test_syntax.rb and test_io_buffer.rb use Tempfile during test.

The crash happens in the child process after forking. It needs to be reproduced by

  1. first running test cases that create Tempfile instances,
  2. and then run test cases that forks.

The following command is able to reproduce the bug with a high probability:

make test-all TESTS='../test/ruby/test_io_buffer.rb ../test/ruby/test_syntax.rb ../test/ruby/test_autoload.rb' RUN_OPTS="--mmtk-plan=StickyImmix" TESTOPTS="-v -n /ascii\\|line\\|read\\|write\\|fork/"

Here is a stacktrace collected in debug mode:

/home/wks/projects/mmtk-github/ruby/build-debug/ruby(rb_print_backtrace+0x20) [0x562c8ee67301] ../vm_dump.c:824
/home/wks/projects/mmtk-github/ruby/build-debug/ruby(rb_vm_bugreport+0x28c) [0x562c8ee67a18] ../vm_dump.c:1155
/home/wks/projects/mmtk-github/ruby/build-debug/ruby(rb_bug_without_die+0xb8) [0x562c8ec33d21] ../error.c:1095
/home/wks/projects/mmtk-github/ruby/build-debug/ruby(rb_bug+0xb1) [0x562c8ec33dfa] ../error.c:1103
/home/wks/projects/mmtk-github/ruby/build-debug/ruby(rb_mmtk_run_final_job+0x53) [0x562c8ec58db9] ../gc/default.c:3382
/home/wks/projects/mmtk-github/ruby/build-debug/ruby(rb_mmtk_run_final_jobs+0x1a) [0x562c8ec58e8f] ../gc/default.c:3424
/home/wks/projects/mmtk-github/ruby/build-debug/ruby(finalize_deferred_heap_pages+0x21) [0x562c8ec58ecb] ../gc/default.c:3434
/home/wks/projects/mmtk-github/ruby/build-debug/ruby(rb_gc_impl_shutdown_call_finalizer+0xcb) [0x562c8ec59412] ../gc/default.c:3622
/home/wks/projects/mmtk-github/ruby/build-debug/ruby(rb_objspace_call_finalizer+0x11) [0x562c8ec69dd0] ../gc.c:1610
/home/wks/projects/mmtk-github/ruby/build-debug/ruby(rb_ec_finalize+0x22) [0x562c8ec3ed46] ../eval.c:164
/home/wks/projects/mmtk-github/ruby/build-debug/ruby(rb_ec_cleanup+0x45b) [0x562c8ec3f1fe] ../eval.c:258
/home/wks/projects/mmtk-github/ruby/build-debug/ruby(ruby_cleanup+0x23) [0x562c8ec3ed9d] ../eval.c:178
/home/wks/projects/mmtk-github/ruby/build-debug/ruby(ruby_stop+0x15) [0x562c8ec3f41a] ../eval.c:294
/home/wks/projects/mmtk-github/ruby/build-debug/ruby(rb_f_fork+0x57) [0x562c8ed1b432] ../process.c:4401
/home/wks/projects/mmtk-github/ruby/build-debug/ruby(ractor_safe_call_cfunc_0+0x2c) [0x562c8ee2abc2] ../vm_insnhelper.c:3602
/home/wks/projects/mmtk-github/ruby/build-debug/ruby(vm_call_cfunc_with_frame_+0x262) [0x562c8ee2b7d0] ../vm_insnhelper.c:3786
/home/wks/projects/mmtk-github/ruby/build-debug/ruby(vm_call_cfunc_with_frame+0x72) [0x562c8ee2bade] ../vm_insnhelper.c:3832
/home/wks/projects/mmtk-github/ruby/build-debug/ruby(vm_call_cfunc_other+0x22f) [0x562c8ee2bd0f] ../vm_insnhelper.c:3858
/home/wks/projects/mmtk-github/ruby/build-debug/ruby(vm_call_cfunc+0x143) [0x562c8ee2c14a] ../vm_insnhelper.c:3940
/home/wks/projects/mmtk-github/ruby/build-debug/ruby(vm_call_method_each_type+0x1bd) [0x562c8ee2f1e4] ../vm_insnhelper.c:4765
/home/wks/projects/mmtk-github/ruby/build-debug/ruby(vm_call_method+0x17f) [0x562c8ee2fd66] ../vm_insnhelper.c:4902
/home/wks/projects/mmtk-github/ruby/build-debug/ruby(vm_call_general+0x2b) [0x562c8ee2ff11] ../vm_insnhelper.c:4935
/home/wks/projects/mmtk-github/ruby/build-debug/ruby(vm_sendish+0x2d3) [0x562c8ee328d3] ../vm_insnhelper.c:5954
/home/wks/projects/mmtk-github/ruby/build-debug/ruby(vm_exec_core+0x7585) [0x562c8ee3e19c] ../insns.def:871
/home/wks/projects/mmtk-github/ruby/build-debug/ruby(rb_vm_exec+0x12a) [0x562c8ee5a0ad] ../vm.c:2575
/home/wks/projects/mmtk-github/ruby/build-debug/ruby(invoke_block+0xda) [0x562c8ee56958] ../vm.c:1524
/home/wks/projects/mmtk-github/ruby/build-debug/ruby(invoke_iseq_block_from_c+0x20f) [0x562c8ee57473] ../vm.c:1594
/home/wks/projects/mmtk-github/ruby/build-debug/ruby(invoke_block_from_c_proc) ../vm.c:1692
/home/wks/projects/mmtk-github/ruby/build-debug/ruby(vm_invoke_proc) ../vm.c:1722
/home/wks/projects/mmtk-github/ruby/build-debug/ruby(rb_vm_invoke_proc+0xa6) [0x562c8ee57c54] ../vm.c:1743
/home/wks/projects/mmtk-github/ruby/build-debug/ruby(thread_do_start_proc+0x2ae) [0x562c8eddc48f] ../thread.c:628
/home/wks/projects/mmtk-github/ruby/build-debug/ruby(thread_do_start+0x57) [0x562c8eddc500] ../thread.c:645
/home/wks/projects/mmtk-github/ruby/build-debug/ruby(thread_start_func_2+0x3d4) [0x562c8eddc93e] ../thread.c:708
/home/wks/projects/mmtk-github/ruby/build-debug/ruby(call_thread_start_func_2+0x64) [0x562c8edd359d] ../thread_pthread.c:2236
/home/wks/projects/mmtk-github/ruby/build-debug/ruby(nt_start+0x1d9) [0x562c8edd378d] ../thread_pthread.c:2281
/usr/lib/libc.so.6(0x714b8190e39d) [0x714b8190e39d]
/usr/lib/libc.so.6(0x714b8199349c) [0x714b8199349c]
wks commented 3 weeks ago

Actually it is not related to forking, but the child process executes the shutting-down sequence which triggers the error.

The bug is triggered if both of the following are true on exit

  1. there are pending finalize jobs, and
  2. the finalizer_table is empty

Here is a program that can trigger the bug (single-threaded, no forking)

puts "Hello world!"

def make_finalizer(msg)
  proc do
    puts msg
  end
end

puts "Make a finalizable garbage"
ObjectSpace.define_finalizer(Object.new, make_finalizer("Finalizing..."))

puts "Trigger a GC by allocating a large object..."
"x"*1000000000

puts "Goodbye world!"

The so-called "final jobs" are the counterpart of "zombies" in the vanilla CRuby, but they are allocated using malloc. They are created during GC. Some "final jobs" run finalizers (i.e. "finalizer jobs") and others call dfree for T_DATA instances (i.e. "dfree jobs"). They are created from entries of the finalizer_table that point to dead objects, and are enqueued onto heap_pages_deferred_final.

Currently, if a GC is triggered by GC.start, then the thread that calls GC.start will "volunteer" to be the thread to run gc_finalize_deferred. It will drain the linked list heap_pages_deferred_final. However, if a GC is triggered by allocation, it will not run any "final jobs". This is a flaw, but since there is no guarantee how soon finalizers will be executed, those finalizers will eventually be executed when the VM quits. So far so good.

However, during shutting down, the current code asserts that after calling obj_free for all objects (actually only some important types including files), there are no "final jobs" for finalizers, only those for "dfree". The problematic segment is (from rb_gc_impl_shutdown_call_finalizer):

        while (finalizer_table->num_entries) { // REPEAT UNTIL finalizer_table IS EMPTY
            st_foreach(finalizer_table, rb_mmtk_evacuate_finalizer_table_on_exit_i, 0);
            finalize_deferred_heap_pages(objspace);
        }

        rb_gc_set_obj_free_on_exit_started();
        rb_mmtk_call_obj_free_on_exit();

        finalize_deferred_heap_pages(objspace); // ASSERTION FAILURE HERE

It first drains the finalizer_table so that all finalizers are executed before any obj_free is executed. It drains the finalizer_table repeatedly because one finalizer may define another finalizer.

The bug is, if finalizer_table is empty from the beginning, it will not call finalize_deferred_heap_pages at all, assuming there is no pending "final jobs". However, we don't drain the linked list heap_pages_deferred_final if the last GC is triggered by allocation. That means there may be pending final jobs.

Why don't we run final jobs after allocation failure?

I decided not to run any final jobs after allocation failure because that would require a check after every allocation, which hurts the performance of allocation fast paths. It may be possible to run "final jobs" in the end of block_for_gc, but block_for_gc is called during allocation (mmtk::memory_manager::alloc). We never expected alloc to be reentrant in a single thread. Obviously, if we do so, the inner alloc will overwrite the bump pointer of the outer alloc. So for simplicity, I simply omitted draining heap_pages_deferred_final after allocation.

What does CRuby's default GC do?

It sets an interrupt trigger. The call chain is: newobj_slowpath -> newobj_alloc -> newobj_cache_miss -> heap_next_free_page -> heap_prepare -> gc_continue -> gc_sweep_continue -> gc_sweep_rest -> gc_sweep_step -> gc_sweep_page -> gc_finalize_deferred_register -> rb_postponed_job_trigger(objspace->finalize_deferred_pjob)

The interrupt is checked (RUBY_VM_CHECK_INTS) in many places in the VM, and will execute finalize_deferred_pjob which is bound to gc_finalize_deferred. One such place is vm_pop_frame which is called when a function returns.

Solution?

When using MMTk, we can also call rb_postponed_job_trigger(objspace->finalize_deferred_pjob) to schedule the execution of "final jobs".

But I think this still doesn't guarantee "final jobs" will be executed before exiting. So in rb_gc_impl_shutdown_call_finalizer, we should call finalize_deferred_heap_pages unconditionally at least once to ensure pending final jobs on heap_pages_deferred_final are drained regardless whether the finalizer_table is empty or not.

wks commented 3 weeks ago

Fixed in https://github.com/mmtk/ruby/commit/1c4102e7bcee03f66ca72f916cde0d2bd7a8638b (force pushed to fix a typo)

I think things will still go wrong if one thread forks while another thread is running finalizers. The child process will see that finalization is still in progress (objspace->atomic_flags.finalizing == 1) and skip the finalization on exit, and may result in files not flushed on exit, and, more seriously, seeing some finalizers halfway executed. As long as tests are passing, I will not prioritize fixing that because it is an error to fork while having multiple threads.