CRuby's default GC has a mechanism to convert a non-embedded Array or String back to an embedded Array or String during GC. When the GC finds that the object size (i.e. the space allocated by the GC) is big enough to hold the Array or String elements, it copies the elements from the malloc-ed off-heap buffer back into the object itself and makes the Array or String instance embedded. This capability became more important once Variable Width Allocation (VWA) was introduced, because an embedded object can now be as large as 640 bytes, giving it enough space to hold non-trivial strings or arrays.
MMTk, in theory, has greater capability to do this kind of re-embedding. When using an evacuating collector (such as SemiSpace, GenCopy, and Immix-based collectors), MMTk allows an object to be resized when copied (in ObjectModel::copy). JikesRVM already takes advantage of this to implement address-based hashing: when copying an object whose hash code has been taken, it adds one extra word in front of the object to hold the hash code.
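To make the resize-on-copy idea concrete, here is a minimal sketch of how a copy routine could decide the size of the new copy under address-based hashing. It is a toy model, not JikesRVM's or MMTk's actual code: the `HashState` enum, the `copy_size_and_state` function, and the assumption that the extra word is counted in `size` after the first move are all made up for illustration.

```rust
/// Hypothetical hash state kept in an object's header.
#[derive(Clone, Copy, Debug, PartialEq)]
enum HashState {
    /// The hash code was never requested.
    Unhashed,
    /// The hash code was requested; the current address serves as the hash.
    Hashed,
    /// The object has moved since; the hash lives in an extra word.
    HashedAndMoved,
}

/// Size of one machine word in bytes.
const WORD: usize = std::mem::size_of::<usize>();

/// Returns (bytes to allocate for the copy, hash state of the copy).
fn copy_size_and_state(size: usize, state: HashState) -> (usize, HashState) {
    match state {
        // Nothing to preserve: the copy has the same size.
        HashState::Unhashed => (size, HashState::Unhashed),
        // First move after hashing: grow by one word to store the old address.
        HashState::Hashed => (size + WORD, HashState::HashedAndMoved),
        // Assumption: `size` already includes the extra word from an earlier move.
        HashState::HashedAndMoved => (size, HashState::HashedAndMoved),
    }
}
```

The point is only that the copy may legitimately be bigger than the original, which is the same freedom re-embedding would rely on.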
For Ruby, during a copying GC, when copying an Array or String, we can always allocate an embedded Array or String that is big enough to hold all of its elements. (Note that MMTk imposes no limit on object size when allocating.) Then we can copy the elements from the imemo:mmtk_objbuf or imemo:mmtk_strbuf into the newly allocated embedded Array or String, and abandon the objbuf or strbuf.
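The copy step described above could be sketched roughly as follows. This is a self-contained toy model rather than mmtk-ruby code: `Payload`, `HEADER_BYTES`, `ELEM_BYTES`, `embedded_size` and `copy_and_reembed` are hypothetical names, and the 16-byte header and 8-byte element size are assumptions for a 64-bit build.

```rust
/// Toy model of an Array's storage (names are hypothetical).
#[derive(Debug, PartialEq)]
enum Payload {
    /// Elements stored inside the object itself.
    Embedded(Vec<u64>),
    /// Elements stored in a separate buffer (stands in for imemo:mmtk_objbuf);
    /// only the first `len` slots of `buf` are live.
    OffHeap { buf: Vec<u64>, len: usize },
}

/// Assumed header size (flags word + class word on a 64-bit build).
const HEADER_BYTES: usize = 16;
/// Assumed size of one element slot.
const ELEM_BYTES: usize = 8;

/// Bytes the to-space copy needs in order to hold `len` elements inline.
fn embedded_size(len: usize) -> usize {
    HEADER_BYTES + len * ELEM_BYTES
}

/// Copies `old`, always producing an embedded payload, and returns it
/// together with the number of bytes the collector would allocate.
/// Assumes `len <= buf.len()` for off-heap payloads.
fn copy_and_reembed(old: &Payload) -> (Payload, usize) {
    let elems: &[u64] = match old {
        Payload::Embedded(v) => v.as_slice(),
        Payload::OffHeap { buf, len } => &buf[..*len],
    };
    // The old buffer is simply not copied; it becomes garbage after this GC.
    (Payload::Embedded(elems.to_vec()), embedded_size(elems.len()))
}
```

For example, an off-heap array with capacity 8 and length 3 would be copied into a 40-byte embedded object under these assumptions, and the unused capacity is dropped along with the buffer.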
Of course, we can only do this if the Array or String is not shared, a shared root, frozen or nofree (i.e. it satisfies rb_ary_embeddable_p or rb_str_reembeddable_p). Even when it is eligible, we should probably only re-embed arrays or strings up to a certain size. Otherwise memory would be wasted if the Array or String shrinks soon after the re-embedding.
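Putting the two conditions together, the decision could look something like the sketch below. It is only an illustration: the `Flags` struct and `should_reembed` are hypothetical, the four flags are lumped together exactly as in the sentence above rather than mirroring the separate CRuby predicates, and the 640-byte cap is an arbitrary placeholder borrowed from the largest embedded size mentioned earlier, not a tuned value.

```rust
/// Hypothetical summary of the flags that rule out re-embedding.
#[derive(Clone, Copy, Debug, Default)]
struct Flags {
    shared: bool,
    shared_root: bool,
    frozen: bool,
    nofree: bool,
}

/// Placeholder cap on the embedded size; the right value is an open question.
const MAX_REEMBED_BYTES: usize = 640;

/// Returns true if an object with these flags, whose embedded form would
/// occupy `embedded_bytes`, should be re-embedded when it is copied.
fn should_reembed(flags: Flags, embedded_bytes: usize) -> bool {
    // Any one of these flags means the buffer must stay where it is.
    let eligible = !(flags.shared || flags.shared_root || flags.frozen || flags.nofree);
    // Large payloads stay off-heap so that a later shrink does not strand space.
    eligible && embedded_bytes <= MAX_REEMBED_BYTES
}
```

A plain 128-byte array would pass this check, while the same array would be left alone if it were frozen or if its embedded form exceeded the cap.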
Related issues
https://github.com/mmtk/mmtk-ruby/issues/91#issuecomment-2378852837 mentioned a bug where the existing re-embedding code for Array is erroneously executed when using MMTk, without forwarding the members. I'll disable array re-embedding for now and re-enable it later (and do it right).