ruby / json

JSON implementation for Ruby
https://ruby.github.io/json
Other
704 stars 333 forks source link

New segfault in >= 2.7.3 at the intersection of bootsnap, oj, activesupport, GC.verify_compaction_references #672

Closed gmalette closed 3 weeks ago

gmalette commented 3 weeks ago

I managed to reduce our segfault to this below. It happens with JSON 2.7.3, 2.7.4, 2.7.5.

require "bundler/inline"

gemfile do
  source "https://rubygems.org"
  gem "json", "2.7.2"
  gem "oj", "3.13.23"
  gem "bootsnap"
  gem "bigdecimal"
  gem "activesupport"
end

require "bootsnap"

Bootsnap.setup(
    cache_dir:            'tmp/cache',          # Path to your cache
    ignore_directories:   [],     # Directory names to skip.
    development_mode:     true, # Current working environment, e.g. RACK_ENV, RAILS_ENV, etc
    load_path_cache:      true,                 # Optimize the LOAD_PATH with a cache
    compile_cache_iseq:   true,                 # Compile Ruby code into ISeq cache, breaks coverage reporting.
    compile_cache_yaml:   true,                 # Compile YAML into a cache
    compile_cache_json:   true,                 # Compile JSON into a cache
    readonly:             true,                 # Use the caches but don't update them on miss or stale entries.
)

require "active_support"
require "oj"
require "json"

Oj.optimize_rails

if GC.respond_to?(:verify_compaction_references)
  # This method was added in Ruby 3.0.0. Calling it this way asks the GC to
  # move objects around, helping to find object movement bugs.
  GC.verify_compaction_references(expand_heap: true, toward: :empty)
end

JSON.parse("")
byroot commented 3 weeks ago

The crash seem to be in Oj:

-- C level backtrace information -------------------------------------------
/opt/rubies/3.3.4/lib/libruby.3.3.dylib(rb_vm_bugreport+0xb4c) [0x1011c5404]
/opt/rubies/3.3.4/lib/libruby.3.3.dylib(rb_bug_for_fatal_signal+0x100) [0x101006fa4]
/opt/rubies/3.3.4/lib/libruby.3.3.dylib(sig_do_nothing+0x0) [0x10112c5ac]
/usr/lib/system/libsystem_platform.dylib(_sigtramp+0x38) [0x194083584]
/opt/rubies/3.3.4/lib/libruby.3.3.dylib(callable_method_entry_or_negative+0xc0) [0x1011a3ab8]
/opt/rubies/3.3.4/lib/libruby.3.3.dylib(gccct_method_search_slowpath+0x54) [0x1011b2354]
/opt/rubies/3.3.4/lib/libruby.3.3.dylib(rb_funcallv+0x144) [0x1011a519c]
/opt/rubies/3.3.4/lib/libruby.3.3.dylib(rb_funcall+0x84) [0x1011a88c0]
/opt/rubies/3.3.4/lib/libruby.3.3.dylib(rb_obj_as_string+0x44) [0x10113dbe8]
/opt/rubies/3.3.4/lib/libruby.3.3.dylib(ruby__sfvextra+0xe0) [0x1011325bc]
/opt/rubies/3.3.4/lib/libruby.3.3.dylib(BSD_vfprintf+0x60c) [0x1011306b4]
/opt/rubies/3.3.4/lib/libruby.3.3.dylib(ruby_vsprintf0+0xac) [0x10112fda0]
/opt/rubies/3.3.4/lib/libruby.3.3.dylib(rb_sprintf+0x5c) [0x10112ff08]
/opt/rubies/3.3.4/lib/libruby.3.3.dylib(unexpected_type+0x64) [0x1012bfb18]
/opt/rubies/3.3.4/lib/libruby.3.3.dylib(rb_unexpected_type+0x30) [0x1012bfb60]
/opt/rubies/3.3.4/lib/libruby.3.3.dylib(rb_class_superclass+0x0) [0x101099c8c]
/opt/rubies/3.3.4/lib/libruby.3.3.dylib(rb_exc_new_str+0x40) [0x1010077b8]
/opt/rubies/3.3.4/lib/libruby.3.3.dylib(rb_vraise+0x28) [0x10100abdc]
/opt/rubies/3.3.4/lib/libruby.3.3.dylib(rb_warning_category_update+0x0) [0x101005f88]
/Users/byroot/.gem/rubies/3.3.4/gems/oj-3.13.23/lib/oj/oj.bundle(oj_pi_parse+0x878) [0x11c0116d4]
/Users/byroot/.gem/rubies/3.3.4/gems/oj-3.13.23/lib/oj/oj.bundle(mimic_parse_core+0x228) [0x11c006374]
/opt/rubies/3.3.4/lib/libruby.3.3.dylib(vm_call_cfunc_with_frame_+0xf0) [0x1011b7ffc]
/opt/rubies/3.3.4/lib/libruby.3.3.dylib(vm_exec_core+0x2054) [0x10119da5c]
/opt/rubies/3.3.4/lib/libruby.3.3.dylib(rb_vm_exec+0x3d8) [0x10119aa80]
/opt/rubies/3.3.4/lib/libruby.3.3.dylib(rb_ec_exec_node+0xa0) [0x10101272c]
/opt/rubies/3.3.4/lib/libruby.3.3.dylib(ruby_run_node+0x60) [0x101012624]
gmalette commented 3 weeks ago

Interesting... using json = 2.7.2 does not exhibit a segfault though.

byroot commented 3 weeks ago

Yeah, I'm trying to get a debugger with symbols to understand what Oj is doing, but I suspect it holds onto a constant that used to be pinned and no longer is in 2.7.3.

byroot commented 3 weeks ago
frame #16: 0x000000011bb41970 oj.bundle`oj_pi_parse(argc=1, argv=0x000000016fdfc358, pi=0x000000016fdfc360, json=0x0000000000000000, len=0, yieldOk=0) at parse.c:1000:17
   997      } else if (T_STRING == rb_type(input)) {
   998          if (CompatMode == pi->options.mode) {
   999              if (No == pi->options.nilnil && 0 == RSTRING_LEN(input)) {
-> 1000                 rb_raise(oj_json_parser_error_class, "An empty string is not a valid JSON string.");
   1001             }
   1002         }
   1003         oj_pi_set_input_str(pi, &input);
    if (rb_const_defined_at(json, rb_intern("ParserError"))) {
        oj_json_parser_error_class = rb_const_get(json, rb_intern("ParserError"));
    } else {
        oj_json_parser_error_class = rb_define_class_under(json, "ParserError", json_error);
    }

So yes, Oj is not declaring its global variables correctly.

byroot commented 3 weeks ago

PR opened upstream: https://github.com/ohler55/oj/pull/941

gmalette commented 3 weeks ago

As usual, you are the very best there is.