Closed: nicholaides closed this 2 years ago.
This PR updates the example in examples/exports_memory.rb to properly convert the bytes pulled from the Wasmer::Memory to UTF-8.
Reproduce
If you change the string used in the example from "Hello, World" to "😀", returned_string becomes "ð\u009F\u0098\u0080".
Explanation
pack("U*") was treating the bytes as codepoints. That happens to work for ASCII, where every character is a single byte whose value equals its codepoint, but not for characters that take more than one byte in UTF-8.
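The bug can be reproduced without running any Wasm, which also shows why "Hello, World" hid it. This is a minimal sketch in core Ruby; the byte arrays below are stand-ins for what the example reads out of Wasmer::Memory (an assumption: they are modeled as plain Arrays of Integers rather than a real memory view).

```ruby
# Stand-ins for bytes read out of Wasmer::Memory (assumption: plain Arrays of
# Integers instead of a real Wasm memory view).
ascii_bytes = "Hello, World".bytes
emoji_bytes = "😀".bytes # 4 bytes for a single character

# ASCII: each byte equals its character's codepoint, so pack("U*") round-trips.
ascii_result = ascii_bytes.pack("U*")

# Multi-byte UTF-8: each of the 4 bytes becomes its own (wrong) codepoint.
emoji_result = emoji_bytes.pack("U*")

puts ascii_result == "Hello, World" # prints true
puts emoji_result == "😀"           # prints false
puts emoji_result.length            # prints 4
```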
The 😀 character is 4 bytes long in UTF-8, but only 1 codepoint:
"😀".bytes      # => [240, 159, 152, 128]
"😀".codepoints # => [128512]
U* packs each integer as a codepoint (and UTF-8 encodes it), so it expects codepoints, not bytes:
"😀".bytes.pack("U*")      # => "ð\u009F\u0098\u0080"
"😀".codepoints.pack("U*") # => "😀"
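C* packs each integer as a single raw byte. The packed string comes back as binary (ASCII-8BIT), so it then has to be relabeled as UTF-8 with force_encoding: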
"😀".bytes.pack("C*").force_encoding('utf-8') # => "😀"
Reference: https://ruby-doc.org/core-3.0.2/Array.html#method-i-pack
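Putting the fix together, a helper along these lines could read a string out of a byte view and decode it with C* plus force_encoding. This is a sketch under assumptions: read_utf8_string is a hypothetical name, the string is assumed to be NUL-terminated, and `view` stands in for the memory view used in examples/exports_memory.rb (here just an Array of Integers indexed with `[]`); it does not use or describe the actual Wasmer API.

```ruby
# Hypothetical helper: reads bytes from `view` starting at `pointer` until a
# NUL (0) byte or the end of the view, then decodes them as UTF-8.
# `view` only needs `[]` returning an Integer (or nil past the end).
def read_utf8_string(view, pointer)
  bytes = []
  index = pointer
  while (byte = view[index]) && byte != 0
    bytes << byte
    index += 1
  end

  # C* packs each Integer as one raw byte; the result is ASCII-8BIT (binary),
  # so relabel it as UTF-8 without changing the underlying bytes.
  string = bytes.pack("C*").force_encoding(Encoding::UTF_8)
  raise ArgumentError, "bytes are not valid UTF-8" unless string.valid_encoding?

  string
end

# Simulated memory: two padding bytes, then "😀" and a NUL terminator,
# followed by unrelated data that must not be read.
memory = [0, 0] + "😀".bytes + [0] + [72, 105]
pointer = 2

returned_string = read_utf8_string(memory, pointer)
```

Raising on invalid input is one choice; if lossy decoding is acceptable, String#scrub could replace invalid sequences instead.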
Great. Thanks for the fix!