jhawthorn / vernier

📏 next generation CRuby profiler
https://vernier.prof/
MIT License
837 stars 20 forks source link

[BUG] `objspace/memsize_of(): unknown data type 0x0(0x00007bc7c2488450)` #100

Closed casperisfine closed 3 weeks ago

casperisfine commented 1 month ago

We started seeing crashes on our nightly CI since last friday night.

The first commit to fail was https://github.com/ruby/ruby/commit/cb39283cbfcfeb920fff66d3ef77e4ed7f2f8d93 (but could be a close parent).

/tmp/bundle/ruby/3.4.0+0/gems/vernier-1.2.1/lib/vernier/collector.rb:75: [BUG] objspace/memsize_of(): unknown data type 0x0(0x00007bc7c2488450)
ruby 3.4.0dev (2024-10-17T23:02:36Z shopify cb39283cbf) +MN +PRISM [x86_64-linux]
-- Control frame information -----------------------------------------------
c:0054 p:---- s:0347 e:000346 CFUNC  :finish
c:0053 p:0003 s:0343 e:000342 METHOD /tmp/bundle/ruby/3.4.0+0/gems/vernier-1.2.1/lib/vernier/collector.rb:75
c:0052 p:0020 s:0335 e:000334 METHOD /tmp/bundle/ruby/3.4.0+0/gems/app_profiler-0.2.5/lib/app_profiler/backend/vernier_backend.rb:64
c:0051 p:0031 s:0331 e:000330 METHOD /tmp/bundle/ruby/3.4.0+0/gems/app_profiler-0.2.5/lib/app_profiler/backend/vernier_backend.rb:31
....
-- Ruby level backtrace information ----------------------------------------
....
/app/components/platform/essentials/lib/middleware/profiling_middleware.rb:79:in 'call'
/tmp/bundle/ruby/3.4.0+0/gems/app_profiler-0.2.5/lib/app_profiler/middleware.rb:19:in 'call'
/tmp/bundle/ruby/3.4.0+0/gems/app_profiler-0.2.5/lib/app_profiler/middleware.rb:35:in 'profile'
/tmp/bundle/ruby/3.4.0+0/gems/app_profiler-0.2.5/lib/app_profiler.rb:75:in 'run'
/tmp/bundle/ruby/3.4.0+0/gems/app_profiler-0.2.5/lib/app_profiler/backend/vernier_backend.rb:31:in 'run'
/tmp/bundle/ruby/3.4.0+0/gems/app_profiler-0.2.5/lib/app_profiler/backend/vernier_backend.rb:64:in 'stop'
/tmp/bundle/ruby/3.4.0+0/gems/vernier-1.2.1/lib/vernier/collector.rb:75:in 'stop'
/tmp/bundle/ruby/3.4.0+0/gems/vernier-1.2.1/lib/vernier/collector.rb:75:in 'finish'
-- Threading information ---------------------------------------------------
Total ractor count: 1
Ruby thread count for this ractor: 10
-- C level backtrace information -------------------------------------------
/usr/local/ruby/bin/ruby(rb_print_backtrace+0x14) [0x598fc25f4a41] /tmp/ruby-build/ruby-3.4.0-cb39283cbfcfeb920fff66d3ef77e4ed7f2f8d93/vm_dump.c:822
/usr/local/ruby/bin/ruby(rb_vm_bugreport) /tmp/ruby-build/ruby-3.4.0-cb39283cbfcfeb920fff66d3ef77e4ed7f2f8d93/vm_dump.c:1151
/usr/local/ruby/bin/ruby(bug_report_end+0x0) [0x598fc25ac586] /tmp/ruby-build/ruby-3.4.0-cb39283cbfcfeb920fff66d3ef77e4ed7f2f8d93/error.c:1084
/usr/local/ruby/bin/ruby(rb_bug_without_die) /tmp/ruby-build/ruby-3.4.0-cb39283cbfcfeb920fff66d3ef77e4ed7f2f8d93/error.c:1084
/usr/local/ruby/bin/ruby(die+0x0) [0x598fc21cd1b1] /tmp/ruby-build/ruby-3.4.0-cb39283cbfcfeb920fff66d3ef77e4ed7f2f8d93/error.c:1092
/usr/local/ruby/bin/ruby(rb_bug) /tmp/ruby-build/ruby-3.4.0-cb39283cbfcfeb920fff66d3ef77e4ed7f2f8d93/error.c:1094
/usr/local/ruby/bin/ruby(obj_type_name+0x0) [0x598fc21bd48c] /tmp/ruby-build/ruby-3.4.0-cb39283cbfcfeb920fff66d3ef77e4ed7f2f8d93/gc.c:1870
/usr/local/ruby/bin/ruby(obj_info) /tmp/ruby-build/ruby-3.4.0-cb39283cbfcfeb920fff66d3ef77e4ed7f2f8d93/gc.c:4016
/usr/local/ruby/bin/ruby(rb_obj_info) /tmp/ruby-build/ruby-3.4.0-cb39283cbfcfeb920fff66d3ef77e4ed7f2f8d93/gc.c:4325
/usr/local/ruby/bin/ruby(gc_move) gc/default.c:6950
/tmp/bundle/ruby/3.4.0+0/gems/vernier-1.2.1/lib/vernier/vernier.so(RB_INT2FIX+0x0) [0x7bc7fb9fc688] /tmp/bundle/ruby/3.4.0+0/gems/vernier-1.2.1/ext/vernier/vernier.cc:1532
/tmp/bundle/ruby/3.4.0+0/gems/vernier-1.2.1/lib/vernier/vernier.so(rb_int2num_inline) /usr/local/ruby/include/ruby-3.4.0+0/ruby/internal/arithmetic/int.h:242
/tmp/bundle/ruby/3.4.0+0/gems/vernier-1.2.1/lib/vernier/vernier.so(_ZN17RetainedCollector22build_collector_resultEv) (null):0
/tmp/bundle/ruby/3.4.0+0/gems/vernier-1.2.1/lib/vernier/vernier.so(0x7bc7fb9fd9d1) [0x7bc7fb9fd9d1]
/tmp/bundle/ruby/3.4.0+0/gems/vernier-1.2.1/lib/vernier/vernier.so(collector_stop+0x16) [0x7bc7fb9f6941] /tmp/bundle/ruby/3.4.0+0/gems/vernier-1.2.1/ext/vernier/vernier.cc:2002
/usr/local/ruby/bin/ruby(vm_call_cfunc_with_frame_+0x117) [0x598fc238f847] /tmp/ruby-build/ruby-3.4.0-cb39283cbfcfeb920fff66d3ef77e4ed7f2f8d93/vm_insnhelper.c:3794
/usr/local/ruby/bin/ruby(vm_sendish+0xfa) [0x598fc23a45a3] /tmp/ruby-build/ruby-3.4.0-cb39283cbfcfeb920fff66d3ef77e4ed7f2f8d93/vm_insnhelper.c:5961
/usr/local/ruby/bin/ruby(vm_exec_core) /tmp/ruby-build/ruby-3.4.0-cb39283cbfcfeb920fff66d3ef77e4ed7f2f8d93/insns.def:898
/usr/local/ruby/bin/ruby(vm_exec_loop+0xa) [0x598fc2393fd7] /tmp/ruby-build/ruby-3.4.0-cb39283cbfcfeb920fff66d3ef77e4ed7f2f8d93/vm.c:2591
/usr/local/ruby/bin/ruby(rb_vm_exec) /tmp/ruby-build/ruby-3.4.0-cb39283cbfcfeb920fff66d3ef77e4ed7f2f8d93/vm.c:2570
/usr/local/ruby/bin/ruby(rb_yield_values2+0x56) [0x598fc23986b6] /tmp/ruby-build/ruby-3.4.0-cb39283cbfcfeb920fff66d3ef77e4ed7f2f8d93/vm.c:1646
/usr/local/ruby/bin/ruby(each_pair_i_fast+0x37) [0x598fc21fef77] /tmp/ruby-build/ruby-3.4.0-cb39283cbfcfeb920fff66d3ef77e4ed7f2f8d93/hash.c:3102
/usr/local/ruby/bin/ruby(hash_iter_status_check+0x0) [0x598fc22035e0] /tmp/ruby-build/ruby-3.4.0-cb39283cbfcfeb920fff66d3ef77e4ed7f2f8d93/hash.c:1302
/usr/local/ruby/bin/ruby(hash_ar_foreach_iter) /tmp/ruby-build/ruby-3.4.0-cb39283cbfcfeb920fff66d3ef77e4ed7f2f8d93/hash.c:1305
/usr/local/ruby/bin/ruby(ar_foreach_check) /tmp/ruby-build/ruby-3.4.0-cb39283cbfcfeb920fff66d3ef77e4ed7f2f8d93/hash.c:944
/usr/local/ruby/bin/ruby(hash_foreach_call) /tmp/ruby-build/ruby-3.4.0-cb39283cbfcfeb920fff66d3ef77e4ed7f2f8d93/hash.c:1435
/usr/local/ruby/bin/ruby(rb_hash_foreach+0x55) [0x598fc2209d15] /tmp/ruby-build/ruby-3.4.0-cb39283cbfcfeb920fff66d3ef77e4ed7f2f8d93/hash.c:1459
/usr/local/ruby/bin/ruby(rb_hash_each_pair+0x47) [0x598fc220b3b7] /tmp/ruby-build/ruby-3.4.0-cb39283cbfcfeb920fff66d3ef77e4ed7f2f8d93/hash.c:3137
/usr/local/ruby/bin/ruby(vm_call_cfunc_with_frame_+0x117) [0x598fc238f847] /tmp/ruby-build/ruby-3.4.0-cb39283cbfcfeb920fff66d3ef77e4ed7f2f8d93/vm_insnhelper.c:3794
/usr/local/ruby/bin/ruby(vm_sendish+0x10a) [0x598fc2387f9a] /tmp/ruby-build/ruby-3.4.0-cb39283cbfcfeb920fff66d3ef77e4ed7f2f8d93/vm_insnhelper.c:5961
/usr/local/ruby/bin/ruby(vm_exec_core+0x2588) [0x598fc23a69b8] /tmp/ruby-build/ruby-3.4.0-cb39283cbfcfeb920fff66d3ef77e4ed7f2f8d93/insns.def:851
/usr/local/ruby/bin/ruby(vm_exec_loop+0xa) [0x598fc2393fd7] /tmp/ruby-build/ruby-3.4.0-cb39283cbfcfeb920fff66d3ef77e4ed7f2f8d93/vm.c:2591
/usr/local/ruby/bin/ruby(rb_vm_exec) /tmp/ruby-build/ruby-3.4.0-cb39283cbfcfeb920fff66d3ef77e4ed7f2f8d93/vm.c:2570
/usr/local/ruby/bin/ruby(rb_vm_invoke_proc+0x5f) [0x598fc2394caf] /tmp/ruby-build/ruby-3.4.0-cb39283cbfcfeb920fff66d3ef77e4ed7f2f8d93/vm.c:1740
/usr/local/ruby/bin/ruby(rb_proc_call_kw+0x2a) [0x598fc2288b3e] /tmp/ruby-build/ruby-3.4.0-cb39283cbfcfeb920fff66d3ef77e4ed7f2f8d93/proc.c:962
/usr/local/ruby/bin/ruby(rb_proc_call) /tmp/ruby-build/ruby-3.4.0-cb39283cbfcfeb920fff66d3ef77e4ed7f2f8d93/proc.c:972
/usr/local/ruby/bin/ruby(exec_end_procs_chain+0x44) [0x598fc21d1ffd] /tmp/ruby-build/ruby-3.4.0-cb39283cbfcfeb920fff66d3ef77e4ed7f2f8d93/eval_jump.c:105
/usr/local/ruby/bin/ruby(rb_ec_exec_end_proc) /tmp/ruby-build/ruby-3.4.0-cb39283cbfcfeb920fff66d3ef77e4ed7f2f8d93/eval_jump.c:121
/usr/local/ruby/bin/ruby(rb_ec_teardown+0xc7) [0x598fc21d21f7] /tmp/ruby-build/ruby-3.4.0-cb39283cbfcfeb920fff66d3ef77e4ed7f2f8d93/eval.c:155
/usr/local/ruby/bin/ruby(rb_ec_cleanup+0x140) [0x598fc21d2400] /tmp/ruby-build/ruby-3.4.0-cb39283cbfcfeb920fff66d3ef77e4ed7f2f8d93/eval.c:207
/usr/local/ruby/bin/ruby(rb_main+0x21) [0x598fc21cd897] ./main.c:43
/usr/local/ruby/bin/ruby(main) ./main.c:62
/lib/x86_64-linux-gnu/libc.so.6(0x7bc848453d90) [0x7bc848453d90]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x7bc848453e40]
[0x598fc21cd8e5]

I'll see if I can figure out what it is, but filing it here just in case.

casperisfine commented 1 month ago

Alright, I'm able to reproduce it locally with GC.auto_compact = true.

casperisfine commented 1 month ago

The mark function is very fishy:

    void mark() {
        // We don't mark the objects, but we MUST mark the frames, otherwise they
        // can be garbage collected.
        // When we stop collection we will stringify the remaining frames, and then
        // clear them from the set, allowing them to be removed from out output.
        stack_table->mark_frames();
        rb_gc_mark(stack_table_value);

        rb_gc_mark(tp_newobj);
        rb_gc_mark(tp_freeobj);
    }

It says it doesn't mark the objects, but we crash because while iterating over for (auto& obj: collector->object_list) { some objects are clearly garbage.

So I suspect we should mark them so they are pinned?