near / borsh-js

TypeScript/JavaScript implementation of Binary Object Representation Serializer for Hashing
Apache License 2.0
112 stars 38 forks source link

fix: ut8-string encode/decode #76

Closed gagdiez closed 10 months ago

gagdiez commented 10 months ago

There was a bug by which we were storing the strings as unicode bytes instead of utf8 bytes. This was a bug since the specification clearly says that the encoding must be utf8.

This commit fixes such bug using the TextEncode / TextDecode tools, which are widely supported by modern browsers and node versions.

gagdiez commented 10 months ago

Using TextEncode and TextDecode will not work with borsh-js... will keep checking alternatives

gagdiez commented 10 months ago

Ended up using some library as guide [1] to manually create the encoder. To make sure it works correctly, I added tests encoding/decoding characters from a wide range of the utf8 spectrum [2], and used online tools [3] to check my results against.

[1] https://github.com/mathiasbynens/utf8.js/blob/master/utf8.js [2] https://www.fileformat.info/info/charset/UTF-8/list.htm [3] https://mothereff.in/utf-8 + https://cryptii.com/pipes/integer-encoder

ailisp commented 10 months ago

Thank you! This is a very important fix.

Using TextEncode and TextDecode will not work with borsh-js... will keep checking alternatives

Do you mean they do not work in near-sdk-js? If so, we have some custom method for utf8 encode or decode, we can add on sdk-js side to support TextEncoder/TextDecoder (not full TextEncoder/TextDecoder class as browser does, but would be sufficient for this PR), but before that your solution is the best possible one.