wasmerio / wasmer-ruby

💎🕸 WebAssembly runtime for Ruby
https://wasmer.io
MIT License

Properly convert bytes to UTF-8 in examples/exports_memory.rb #61

Closed (closed by nicholaides 2 years ago)

nicholaides commented 2 years ago

This PR updates the example in examples/exports_memory.rb so that the bytes read from the Wasmer::Memory are correctly converted to a UTF-8 string.

Reproduce

If you change the string used in the example from "Hello, World" to "😀", returned_string will be "ð\u009F\u0098\u0080" instead of "😀".
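The symptom can also be reproduced in plain Ruby, without Wasmer. The following is only a sketch: bytes_from_memory and decode_with_u are hypothetical names introduced here, and the array of byte values stands in for whatever the example reads out of the Wasmer::Memory.

```ruby
# Hypothetical stand-in for the bytes read out of Wasm memory; no Wasmer needed.
def bytes_from_memory(string)
  string.bytes
end

# Mirrors the old behaviour: each byte is packed as if it were a codepoint.
def decode_with_u(bytes)
  bytes.pack("U*")
end

ascii = decode_with_u(bytes_from_memory("Hello, World"))
emoji = decode_with_u(bytes_from_memory("😀"))

puts ascii == "Hello, World"   # ASCII survives the round trip
puts emoji == "😀"             # multi-byte text does not
puts emoji.codepoints.inspect  # one codepoint per original byte
```

The ASCII string compares equal to the original, while the emoji comes back as four separate codepoints, one for each byte of its UTF-8 encoding.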

Explanation

pack("U*") was treating each byte as a codepoint. That happens to work for ASCII, where every character is a single byte whose value equals its codepoint, but not for characters that take more than 1 byte in UTF-8.

Encoded as UTF-8, the 😀 character is 4 bytes long, but it is only 1 codepoint:

"😀".bytes      # => [240, 159, 152, 128]
"😀".codepoints # => [128512]

U* is for codepoints, not bytes:

"😀".bytes.pack("U*")      # => "ð\u009F\u0098\u0080"
"😀".codepoints.pack("U*") # => "😀"

C* is for bytes:

"😀".bytes.pack("C*").force_encoding('utf-8') # => "😀"
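As a minimal sketch, the C* conversion can be wrapped in a small helper. utf8_from_bytes is a hypothetical name, not part of wasmer-ruby, and the valid_encoding? check is an extra assumption added here: force_encoding only relabels the bytes and does not validate them, so a truncated sequence would otherwise pass through silently.

```ruby
# Hypothetical helper: turn raw byte values (0..255) into a UTF-8 string.
def utf8_from_bytes(bytes)
  # pack("C*") builds a binary string; force_encoding relabels it as UTF-8
  # without changing the underlying bytes.
  string = bytes.pack("C*").force_encoding(Encoding::UTF_8)
  raise ArgumentError, "bytes are not valid UTF-8" unless string.valid_encoding?
  string
end

puts utf8_from_bytes("😀".bytes)
puts utf8_from_bytes("Hello, World".bytes)

begin
  # The first two bytes of 😀 alone are an incomplete UTF-8 sequence.
  utf8_from_bytes([0xF0, 0x9F])
rescue ArgumentError => e
  puts "rejected: #{e.message}"
end
```

Whether to raise on invalid input or substitute a replacement character (for example with String#scrub) depends on how the calling code wants to handle corrupt data.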

Reference: https://ruby-doc.org/core-3.0.2/Array.html#method-i-pack

syrusakbary commented 2 years ago

Great. Thanks for the fix!