ruby / stringio

Pseudo `IO` class from/to `String`.
BSD 2-Clause "Simplified" License
33 stars 26 forks source link

Fix ascii_only? flag in strio_write #77

Closed tompng closed 8 months ago

tompng commented 8 months ago

~Fixes https://bugs.ruby-lang.org/issues/20185~ Followup of #79

Looks like rb_str_resize is changed in some ruby version

rb_str_resize(string, shorter) // clear ENC_CODERANGE in some case
rb_str_resize(string, longer) // does not clear ENC_CODERANGE anymore
// rb_str_resize in string.c
if (slen > len && ENC_CODERANGE(str) != ENC_CODERANGE_7BIT) {
  ENC_CODERANGE_CLEAR(str);
}

I think this change is based on an assumption that appending null bytes will not change flag ascii_only?.

strio_extend will make the string longer if needed, and update the flags correctly for appending null bytes. Before memmove, we need to rb_str_modify because updated flags are not updated for memmove.

kou commented 8 months ago

Wow. https://github.com/ruby/ruby/commit/b0b9f7201acab05c2a3ad92c3043a1f01df3e17f may be related.

kou commented 8 months ago

Oh, @nobu worked on this: #79

tompng commented 8 months ago

I add a test case

    s = StringIO.new("\u{3042}")
    s.rewind
    assert_not_predicate(s.string, :ascii_only?)
    s.write('aaaa')
    assert_predicate(s.string, :ascii_only?)

that fails in master (after #79)

Writing string in arbitary position(when ptr->pos != olen) can change coderange in many variation, and I think rb_str_modify is always needed.

Other variation: writing "\x00".force_encoding('utf-16be') twice can make string invalid to valid s.write "あ"; s.rewind; s.write('a'); s.string #=> "a\x81\x82" will break encoding.

kou commented 8 months ago

Thanks!