Closed eileencodes closed 1 month ago
In the past, I thought a non-embedded string must have a non-NULL strbuf. But that may be a mistake.
The old documentation said that STR_SHARED_ROOT implies RSTRING_NO_EMBED, but https://github.com/ruby/ruby/commit/c6b391214c13aa89bddad8aa2ba334fab98ab03c updated the documentation and it no longer says so.
So the policy of the RSTRING_EXT(str)->strbuf field is that it is non-NULL if and only if RSTRING(str)->as.heap.ptr points into an imemo_strbuf instance. Specifically,
- If the string is embedded, the strbuf field does not exist (the space is used for string payload).
- If the string is not embedded and not shared, strbuf must be non-NULL.
- If the string is a SHARED_ROOT and it is not embedded, strbuf must be non-NULL. If it is embedded, there is no strbuf field.
- If the string is SHARED, strbuf is non-NULL if the shared root is not embedded.

So the only case where RSTRING_EXT(str)->strbuf can be NULL is that str is a shared string and the shared root is embedded.
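The policy above can be written down as a small predicate. The following is only a sketch over a simplified stand-in struct: `model_str`, `model_strbuf_required`, and `model_strbuf_ok` are hypothetical names made up for illustration, and the struct does not reflect the real RString or RSTRING_EXT layout.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of a string object. This is NOT the real
 * RString layout; it only carries the facts the policy talks about. */
struct model_str {
    bool embedded;                /* payload is stored inline in the object */
    bool shared;                  /* borrows the buffer of a shared root */
    const struct model_str *root; /* the shared root (only used when shared) */
    const void *strbuf;           /* stand-in for RSTRING_EXT(str)->strbuf */
};

/* Returns true when the stated policy requires a non-NULL strbuf. */
static bool
model_strbuf_required(const struct model_str *s)
{
    if (s->embedded) return false;      /* no strbuf field at all */
    if (!s->shared) return true;        /* owns a heap buffer (incl. non-embedded roots) */
    return s->root != NULL && !s->root->embedded; /* shared: depends on the root */
}

/* Returns true when the string's strbuf agrees with the policy. */
static bool
model_strbuf_ok(const struct model_str *s)
{
    if (s->embedded) return true;       /* nothing to check */
    return (s->strbuf != NULL) == model_strbuf_required(s);
}
```

Under this model, a shared string whose root is embedded is the only non-embedded case where a NULL strbuf passes the check.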
marking 0x20100cb1780 <===== this is shared
done 0x20100cb1780 old_strbuf: 0x0 new_strbuf: 0x0 <===== not only does shared not move, it doesn't have a strbuf
This shouldn't happen. If the non-embedded string is not shared, or if it is shared but the shared root is not embedded, then it must have a non-NULL strbuf. This must be a bug.
By the way, Immix does not move all objects by default. It only does "opportunistic" defragmentation: (1) it only defragments if the heap is very full, and (2) even when defragmenting, it only moves the most fragmented blocks, and it stops defragmenting once it has moved 2% of the full heap size.
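To make the two conditions concrete, here is a rough sketch of that kind of opportunistic selection. It is not mmtk-core's code: `model_block`, `model_should_defrag`, and `model_select_defrag_blocks` are hypothetical names, the 90% "very full" threshold is an assumption chosen for illustration, and only the 2% budget comes from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-block summary; not an mmtk-core type. */
struct model_block {
    size_t live_bytes; /* bytes that would have to be copied */
    size_t holes;      /* fragmentation measure: more holes = more fragmented */
};

#define MODEL_FULL_THRESHOLD_PERCENT 90 /* assumption: what "very full" means */
#define MODEL_DEFRAG_BUDGET_PERCENT   2 /* from the description: 2% of the heap */

/* Condition (1): only defragment when the heap is very full. */
static bool
model_should_defrag(size_t used_bytes, size_t heap_bytes)
{
    return used_bytes * 100 >= heap_bytes * MODEL_FULL_THRESHOLD_PERCENT;
}

/* Sort helper: most fragmented blocks first. */
static int
model_by_holes_desc(const void *a, const void *b)
{
    const struct model_block *x = a, *y = b;
    return (y->holes > x->holes) - (y->holes < x->holes);
}

/* Condition (2): pick the most fragmented blocks until the copy budget is
 * used up. Sorts `blocks` in place and returns how many leading blocks
 * were selected for evacuation. */
static size_t
model_select_defrag_blocks(struct model_block *blocks, size_t n, size_t heap_bytes)
{
    assert(blocks != NULL || n == 0);
    size_t budget = heap_bytes / 100 * MODEL_DEFRAG_BUDGET_PERCENT;
    size_t moved = 0, picked = 0;

    qsort(blocks, n, sizeof *blocks, model_by_holes_desc);
    while (picked < n && moved + blocks[picked].live_bytes <= budget) {
        moved += blocks[picked].live_bytes;
        picked++;
    }
    return picked;
}
```

With a budget this small, most objects stay where they are in any given GC, which is why a moved original next to an unmoved shared string is an expected combination.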
However, you can override this behavior and force Immix to do defragmentation in every GC and move as many objects as possible by hacking mmtk-core itself. See the instructions in https://github.com/mmtk/mmtk-core/blob/160b7702fccda133c9407234821ad35103623179/src/policy/immix/mod.rs#L20-L43
@wks this all makes sense. So indeed we're seeing a bug. @eileencodes it sounds like we should be able to add that assertion in the "update references" code. IOW, when we're updating references, if the string isn't embedded or it is pointing to a non-embedded root, it must have a reference to a tmpbuf.
My guess is that somewhere we're making shared strings that don't have the tmpbuf set. Adding assertions like this should help us track it down.
I found a special case. If a string is created using str_new_static, it will not have an RSTRING_EXT(str)->strbuf because the string contents are not in the MMTk heap. They are not even in the malloc heap. Such string instances have the STR_NOFREE flag.
And if a string is shared, and the shared root has STR_NOFREE, the shared string will not have an RSTRING_EXT(str)->strbuf either.
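Taking this special case into account, an assertion in the reference-update path might look roughly like the sketch below. Everything here is a hypothetical model: `flagged_str`, the `MODEL_*` flags, `model_expects_strbuf`, and `model_check_before_update` are illustrative names, not the real Ruby flags or functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits standing in for the embedded, shared, and
 * STR_NOFREE states; the real flag values are different. */
enum {
    MODEL_EMBEDDED = 1 << 0,
    MODEL_SHARED   = 1 << 1,
    MODEL_NOFREE   = 1 << 2
};

/* Hypothetical, simplified string object. */
struct flagged_str {
    unsigned flags;
    const struct flagged_str *root; /* shared root, when MODEL_SHARED is set */
    const void *strbuf;             /* stand-in for RSTRING_EXT(str)->strbuf */
};

/* True when the string is expected to carry a non-NULL strbuf. */
static bool
model_expects_strbuf(const struct flagged_str *s)
{
    if (s->flags & MODEL_EMBEDDED) return false; /* no strbuf field */
    if (s->flags & MODEL_NOFREE) return false;   /* contents live outside the heap */
    if (s->flags & MODEL_SHARED) {
        const struct flagged_str *r = s->root;
        /* A shared string follows its root: no strbuf if the root is
         * embedded or points at static (NOFREE) contents. */
        return r != NULL && !(r->flags & (MODEL_EMBEDDED | MODEL_NOFREE));
    }
    return true; /* plain heap string */
}

/* The check one might run before updating a string's references. */
static void
model_check_before_update(const struct flagged_str *s)
{
    if (model_expects_strbuf(s)) {
        assert(s->strbuf != NULL);
    }
}
```

A check like this would stay quiet for static strings and their shared views, and would fire for a shared string whose heap-allocated root has a buffer that the shared string never recorded.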
I've been debugging a panic that I can easily reproduce on my macOS machine. I've whittled the reproduction down, and what appears to be happening is that in rb_mmtk_gc_ref_update_string the original string is moved but the shared string object stays put. In this scenario we've noticed that while the original object has its underlying buffer, the shared object enters rb_mmtk_gc_ref_update_string with no buffer.

After some more debugging and testing we also found that in other cases (unrelated to this particular test) strings are very often missing their underlying buffer. So we're wondering: if a string is not embedded, should it always have an underlying buffer? I.e., should an assertion that a non-embedded string has a buffer always pass?
We're seeing that this is not true, but we aren't sure that the string should always have an underlying buffer (if it's not embedded).
Here's the test I've been running. Note that I set the max heap to 1 GB because it reproduces more often with a smaller amount of memory. This also only happens on Immix because the object is moving.
You can use this branch which has the test stripped down to the minimum https://github.com/mmtk/ruby/compare/mmtk...eileencodes:ruby:reproduction-for-str-size?expand=1
The assertion that's failing is here. It expects the difference between orig and shared to be equal to or greater than 0, but instead we have a negative number (because orig moved and shared didn't follow): https://github.com/mmtk/ruby/blob/8082532b9f720a1e7508ac32641dcfbbf51dd518/string.c#L1775-L1783
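As a rough illustration of why that difference goes wrong, the sketch below models the two data pointers as plain addresses. The names `model_offset_is_valid` and `model_rebase` are hypothetical, the addresses are made up, and the rebase step is only one possible way a reference update could keep the two pointers consistent, not a description of what the GC actually does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True when view_ptr lies inside the buffer [base_ptr, base_ptr + len],
 * i.e. the offset between the two pointers is non-negative and in range. */
static bool
model_offset_is_valid(uintptr_t base_ptr, size_t len, uintptr_t view_ptr)
{
    return view_ptr >= base_ptr && view_ptr - base_ptr <= len;
}

/* Hypothetical fix-up: when the buffer moves from old_base to new_base,
 * carry the view pointer along by preserving its offset into the buffer. */
static uintptr_t
model_rebase(uintptr_t old_base, uintptr_t new_base, uintptr_t view_ptr)
{
    assert(view_ptr >= old_base); /* the offset must have been valid before the move */
    return new_base + (view_ptr - old_base);
}
```

For example, with a 64-byte buffer at 0x1000 and a view at 0x1010 the check passes. If the buffer moves to 0x9000 while the view still says 0x1010, the check fails, and it passes again once the view is rebased to 0x9010.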
Here's some of the stacktrace with the debugging info (notes added by me):
cc/ @tenderlove @wks