mmtk / ruby

Fork of The Ruby Programming Language [mirror], with added support for MMTk
https://www.ruby-lang.org/
Other
0 stars 1 forks source link

Ruby assertion causing panic with shared strings #85

Closed eileencodes closed 1 month ago

eileencodes commented 2 months ago

I've been debugging a panic that I can easily reproduce on my macos machine. I've whittled the reproduction down and what appears to be happening is that in rb_mmtk_gc_ref_update_string the original string is moved but shared string object stays put. In this scenario we've noticed that while the original object has it's underlying buffer the shared object enters rb_mmtk_gc_ref_update_string with no buffer.

After some more debugging and testing we also found that in other cases (unrelated to this particular test) that very often strings are missing their underlying buffer. So we're wondering, if a string is not embedded should it always have an underlying buffer? IE should this assertion always pass:

GC_ASSERT(RSTRING_EXT(str)->strbuf);

We're seeing this not true but aren't sure that the string should always have an underlying buffer (if it's not embedded).

Here's the test I've been running. Note I set the max heap to 1gb because it reproduces more often with a smaller amount of memory. This also only happens on Immix because the object is moving.

You can use this branch which has the test stripped down to the minimum https://github.com/mmtk/ruby/compare/mmtk...eileencodes:ruby:reproduction-for-str-size?expand=1

make test-all RUBY_TESTOPTS="--excludes-dir=test/.excludes-mmtk --seed=39446 -v" RUN_OPTS="--mmtk --mmtk-plan=Immix --mmtk-max-heap=1GiB" RUST_BACKTRACE=1 RUST_LOG=none TESTS="test/rubygems/test_gem_remote_fetcher.rb"

The assertion that's failing is here which expects the difference between orig and shared to be equal to or greater than 0 but instead we have a negative number (because orig moved and shared didn't follow:

https://github.com/mmtk/ruby/blob/8082532b9f720a1e7508ac32641dcfbbf51dd518/string.c#L1775-L1783

Here's some of the stacktrace with the debugging info (notes added by me):

marking 0x20100cb1780
done 0x20100cb1780 old_strbuf: 0x20100cb1770 new_strbuf: 0x20100cb1770
[1/2] TestGemRemoteFetcher#test_implicit_upper_case_proxy = 0.01 s
marking 0x201007af028
done 0x201007af028 old_strbuf: 0x201007af018 new_strbuf: 0x201007af018
marking 0x600ffc05018
done 0x600ffc05018 old_strbuf: 0x600ffc05008 new_strbuf: 0x600ffc05008
marking 0x20100cb1780 <==== this is orig
done 0x20100c61c08 old_strbuf: 0x20100cb1770 new_strbuf: 0x20100c61bf8 <==== underlying strbuf moves for orig
marking 0x20100cb1780 <===== this is shared
done 0x20100cb1780 old_strbuf: 0x0 new_strbuf: 0x0 <===== not only does shared not move, it doesn't have a strbuf
[2/2] TestGemRemoteFetcher#test_implicit_proxystrbuf_ptr 0x20100c61c08, orig_ptr 0x20100c61c08, shared_ptr 0x20100cb1780
str addr orig: 0x20100c61c08 shared: 0x20100cb1780 orig_strbuf 0x20100c61bf8 shared_strbuf 0x0
Assertion Failed: string.c:1805:str_new_frozen_buffer:ofs >= 0
ruby 3.4.0dev (2024-08-09T19:42:38Z reproduction-for-s.. a5b29b5277) +MMTk(Immix) [arm64-darwin23]

-- Crash Report log information --------------------------------------------
   See Crash Report log file in one of the following locations:             
     * ~/Library/Logs/DiagnosticReports                                     
     * /Library/Logs/DiagnosticReports                                      
   for more details.                                                        
Don't forget to include the above Crash Report log file in bug reports.     

-- Control frame information -----------------------------------------------
c:0030 p:---- s:0172 e:000171 CFUNC  :match
c:0029 p:---- s:0169 e:000168 CFUNC  :match
c:0028 p:0010 s:0164 e:000163 METHOD /Users/eileencodes/src/github.com/open_source/ruby/test/rubygems/test_gem_remote_fetcher.rb:61
c:0027 p:0037 s:0160 e:000159 METHOD /Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit/testcase.rb:202
c:0026 p:0052 s:0154 E:001088 METHOD /Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit/testcase.rb:170
c:0025 p:0083 s:0145 e:000144 BLOCK  /Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:1756 [FINISH]
c:0024 p:---- s:0138 e:000137 CFUNC  :map
c:0023 p:0103 s:0134 e:000133 METHOD /Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:1743
c:0022 p:0035 s:0122 e:000121 METHOD /Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:1360
c:0021 p:0010 s:0115 e:000113 BLOCK  /Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:821
c:0020 p:0027 s:0109 e:000108 METHOD <internal:array>:54
c:0019 p:0047 s:0103 e:000102 METHOD /Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:819
c:0018 p:0008 s:0096 e:000095 METHOD /Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:856
c:0017 p:0127 s:0089 e:000088 METHOD /Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:1689
c:0016 p:0016 s:0076 e:000075 METHOD /Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:1297
c:0015 p:0005 s:0071 e:000070 METHOD /Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:1854
c:0014 p:0006 s:0067 e:000066 BLOCK  /Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:1841
c:0013 p:0027 s:0063 e:000062 METHOD <internal:array>:54
c:0012 p:0042 s:0057 e:000056 METHOD /Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:1840
c:0011 p:0013 s:0052 e:000051 METHOD /Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:1884
c:0010 p:0008 s:0047 e:000046 METHOD /Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:1044
c:0009 p:0008 s:0041 e:000040 METHOD /Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:889
c:0008 p:0017 s:0035 e:000034 METHOD /Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:145
c:0007 p:0008 s:0030 e:000029 METHOD /Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:1281
c:0006 p:0032 s:0025 e:000024 METHOD /Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:1967
c:0005 p:0009 s:0021 e:000020 METHOD /Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:1971
c:0004 p:0065 s:0016 e:000014 TOP    /Users/eileencodes/src/github.com/open_source/ruby/tool/test/runner.rb:14 [FINISH]
c:0003 p:---- s:0011 e:000010 CFUNC  :require_relative
c:0002 p:0005 s:0006 e:000005 EVAL   ./test/runner.rb:5 [FINISH]
c:0001 p:0000 s:0003 E:001d10 DUMMY  [FINISH]

-- Ruby level backtrace information ----------------------------------------
./test/runner.rb:5:in '<main>'
./test/runner.rb:5:in 'require_relative'
/Users/eileencodes/src/github.com/open_source/ruby/tool/test/runner.rb:14:in '<top (required)>'
/Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:1971:in 'run'
/Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:1967:in 'run'
/Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:1281:in 'run'
/Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:145:in 'run'
/Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:889:in 'run'
/Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:1044:in 'run'
/Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:1884:in 'run'
/Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:1840:in '_run'
<internal:array>:54:in 'each'
/Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:1841:in 'block in _run'
/Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:1854:in 'run_tests'
/Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:1297:in '_run_anything'
/Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:1689:in '_run_anything'
/Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:856:in '_run_suites'
/Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:819:in '_run_suites'
<internal:array>:54:in 'each'
/Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:821:in 'block in _run_suites'
/Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:1360:in '_run_suite'
/Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:1743:in '_run_suite'
/Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:1743:in 'map'
/Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:1756:in 'block in _run_suite'
/Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit/testcase.rb:170:in 'run'
/Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit/testcase.rb:202:in 'run_test'
/Users/eileencodes/src/github.com/open_source/ruby/test/rubygems/test_gem_remote_fetcher.rb:61:in 'test_implicit_proxy'
/Users/eileencodes/src/github.com/open_source/ruby/test/rubygems/test_gem_remote_fetcher.rb:61:in 'match'
/Users/eileencodes/src/github.com/open_source/ruby/test/rubygems/test_gem_remote_fetcher.rb:61:in 'match'
Screenshot 2024-08-09 at 3 53 08 PM

cc/ @tenderlove @wks

wks commented 2 months ago

In the past, I thought a non-embedded string must have a non-NULL strbuf. But that may be a mistake.

The old documentation said that STR_SHARED_ROOT implies RSTRING_NO_EMBED, but https://github.com/ruby/ruby/commit/c6b391214c13aa89bddad8aa2ba334fab98ab03c updated the documentation and it no longer says so.

So the policy of the RSTRING_EXT(str)->strbuf field is that it is non-null if and only if RSTRING(str)->as.heap.ptr points into an imemo_strbuf instance. Specifically,

So the only case where RSTRING_ext(str)->strbuf can be NULL is that str is a shared string, and the shared root is embedded.

wks commented 2 months ago

marking 0x20100cb1780 <===== this is shared done 0x20100cb1780 old_strbuf: 0x0 new_strbuf: 0x0 <===== not only does shared not move, it doesn't have a strbuf

This shouldn't happen. If the non-embedded string is not shared, or if it is shared but the shared root is not embedded, then it must have a non-null strbuf. This must be a bug.

wks commented 2 months ago

By the way, by default Immix does not move all objects. It only does "opportunistic" defragmentation, i.e. (1) it only does defragmentation if the heap is very full, and (2) even when doing defragmentation, it only moves the most fragmented blocks, and stop defragmenting if it has moved 2% of the full heap size.

However, you can override this behavior and force Immix to do defragmentation in every GC and move as many objects as possible by hacking mmtk-core itself. See the instructions in https://github.com/mmtk/mmtk-core/blob/160b7702fccda133c9407234821ad35103623179/src/policy/immix/mod.rs#L20-L43

tenderlove commented 2 months ago

@wks this all makes sense. So indeed we're seeing a bug. @eileencodes it sounds like we should be able to add that assertion in the "update references" code. IOW, when we're updating references, if the string isn't embedded or it is pointing to a non-embedded root, it must have a reference to a tmpbuf reference.

My guess is that somewhere we're making shared strings that don't have the tmpbuf set. Adding assertions like this should help us track it down.

wks commented 1 month ago

I found a special case. If a string is created using str_new_static, it will not have an RSTRING_EXT(str)->strbuf because the string contents are not in the MMTk heap. It's not even in the malloc heap. Such string instances have the STR_NOFREE flag.

And if a string is shared, and the shared root has STR_NOFREE, the shared string will also not have RSTRING_EXT(str)->strbuf, either.