opal / opal

Ruby ♥︎ JavaScript
https://opalrb.com
MIT License

Bug: String methods returning wrong results for Strings containing UTF-16 surrogates, further String issues due to JavaScript UCS2/UTF-16 mixup #2709

Open janbiedermann opened 1 week ago

janbiedermann commented 1 week ago

The JavaScript String implementation in browsers is "funky" in that it displays UTF-16 correctly and mostly performs transformations correctly in UTF-16, yet some methods treat strings as UCS-2, that is, as sequences of 16-bit code units in which a surrogate pair counts as two characters. Because the Opal String is a bridged class that relies on the JavaScript String, these issues leak into Ruby space. For example:

Opal:

>> 'a𝌆'[2]
=> "�"
>> 'a𝌆'[1]
=> "�"
>> 'a𝌆'.size
=> 3

Ruby:

:002 > 'a𝌆'[2]
=> nil 
:003 > 'a𝌆'[1]
=> "𝌆"
:004 > 'a𝌆'.size
=> 2
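The difference between the two transcripts can be reproduced in plain JavaScript. The following is a minimal sketch, not Opal's implementation: the helpers `codePointLength` and `codePointAt` are hypothetical names introduced here for illustration, and they assume that iterating by code point (which the built-in string iterator does) is the behavior Ruby users expect. Negative indices are not handled Ruby-style.

```javascript
// Hypothetical helpers (not part of Opal): treat a string as a sequence of
// code points instead of UTF-16 code units.
function codePointLength(str) {
  // Array.from uses the string iterator, which keeps surrogate pairs together.
  return Array.from(str).length;
}

function codePointAt(str, index) {
  const chars = Array.from(str);
  // Return null for out-of-range indices, similar to Ruby returning nil.
  return index >= 0 && index < chars.length ? chars[index] : null;
}

// 'a' followed by U+1D306, which UTF-16 stores as the pair 0xD834 0xDF06.
const s = 'a\u{1D306}';

console.log(s.length);                      // 3 (code units)
console.log(s.charCodeAt(1).toString(16));  // d834 (lone high surrogate)
console.log(s.charCodeAt(2).toString(16));  // df06 (lone low surrogate)
console.log(codePointLength(s));            // 2 (code points)
console.log(codePointAt(s, 1) === '\u{1D306}'); // true
console.log(codePointAt(s, 2));             // null
```

Indexing with `s[1]` or `s[2]` returns one half of the surrogate pair, which is why the Opal session above prints a replacement character. Note that building an array on every access is linear in the string length, so this sketch only illustrates the semantics and says nothing about how a fix should be implemented.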

For reference:

This should be considered for #2231

janbiedermann commented 1 week ago

The issue applies to Regexp too.
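A small JavaScript sketch of the same effect in regular expressions, assuming the Regexp problem has the same root cause as the String one: without the `u` flag, `.` matches a single UTF-16 code unit, so a character outside the Basic Multilingual Plane counts as two matches. The helper `countDotMatches` is a hypothetical name used only for this illustration.

```javascript
// Hypothetical helper: count how many times `.` matches in a string,
// with or without the unicode (`u`) flag.
function countDotMatches(str, unicode) {
  const re = new RegExp('.', unicode ? 'gu' : 'g');
  const matches = str.match(re);
  return matches ? matches.length : 0;
}

const s = 'a\u{1D306}'; // 'a' followed by U+1D306

console.log(countDotMatches(s, false)); // 3 (each surrogate half matches separately)
console.log(countDotMatches(s, true));  // 2 (the pair is matched as one character)

console.log(/^.$/.test('\u{1D306}'));   // false
console.log(/^.$/u.test('\u{1D306}'));  // true
```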