GuidoDipietro closed this issue 10 months ago
When we serialize a string containing non-ASCII characters, only the low 8 bits of each character's `charCodeAt` value are written into the buffer, instead of the character's multi-byte UTF-8 representation.
The precise code where this happens is the `encode_string` method:
    encode_string(value: unknown): void {
        this.checkTypes && utils.expect_type(value, 'string', this.fieldPath);
        const _value = value as string;

        // 4 bytes for length
        this.encoded.store_value(_value.length, 'u32');

        // string bytes
        for (let i = 0; i < _value.length; i++) {
            this.encoded.store_value(_value.charCodeAt(i), 'u8');
        }
    }
As a result, the output buffer is not compatible with standard deserialization of UTF-8 strings.
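To make the mismatch concrete, here is a minimal sketch that compares the truncating behavior with real UTF-8 output. The helper `encodeLow8Bits` is hypothetical: it is a stand-in for the loop above and assumes that storing a char code as `'u8'` keeps only its low 8 bits, as described in this issue. `TextEncoder` and `TextDecoder` are the standard globals available in browsers and recent Node.js versions.

```typescript
// Hypothetical stand-in for the buggy loop: one byte per UTF-16 code unit,
// keeping only the low 8 bits of each char code (assumed 'u8' behavior).
function encodeLow8Bits(value: string): Uint8Array {
  const out = new Uint8Array(value.length);
  for (let i = 0; i < value.length; i++) {
    out[i] = value.charCodeAt(i) & 0xff;
  }
  return out;
}

const text = "é€"; // U+00E9 and U+20AC, both non-ASCII

const buggy = encodeLow8Bits(text);          // 2 bytes: 0xE9, 0xAC
const utf8 = new TextEncoder().encode(text); // 5 bytes: C3 A9 E2 82 AC

console.log(buggy.length, utf8.length); // 2 5

// A standard UTF-8 decoder cannot recover the original string from the
// truncated bytes, because they are not the UTF-8 encoding of `text`.
console.log(new TextDecoder().decode(buggy) === text); // false
```

Note that the truncated output is also shorter than the UTF-8 output, so a reader that expects a byte count in the length prefix would disagree with the writer about where the string ends.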
Solution: actually encode each character as its multi-byte UTF-8 sequence.
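One way this could look is sketched below. This is not the code from the actual fix; the function names `encodeStringUtf8` and `decodeStringUtf8` are hypothetical, and the sketch assumes the wire format is a little-endian `u32` length counted in bytes followed by the UTF-8 bytes. It uses the standard `TextEncoder`/`TextDecoder` globals rather than the library's own buffer class.

```typescript
// Hypothetical helper: length-prefixed UTF-8 string encoding.
// Assumption: the prefix is a little-endian u32 holding the BYTE length,
// not the number of UTF-16 code units (value.length).
function encodeStringUtf8(value: string): Uint8Array {
  const bytes = new TextEncoder().encode(value); // multi-byte UTF-8
  const out = new Uint8Array(4 + bytes.length);
  new DataView(out.buffer).setUint32(0, bytes.length, true); // 4 bytes for length
  out.set(bytes, 4); // string bytes
  return out;
}

// Hypothetical inverse, used here only to check the round trip.
function decodeStringUtf8(buf: Uint8Array): string {
  const view = new DataView(buf.buffer, buf.byteOffset, buf.byteLength);
  const byteLength = view.getUint32(0, true);
  return new TextDecoder().decode(buf.subarray(4, 4 + byteLength));
}

const encoded = encodeStringUtf8("é€");
console.log(encoded.length);                      // 9 (4-byte prefix + 5 UTF-8 bytes)
console.log(decodeStringUtf8(encoded) === "é€");  // true
```

Writing the byte length instead of `_value.length` matters because the two differ as soon as the string contains a non-ASCII character.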
Hi @GuidoDipietro, a PR has now been opened to fix the error: see #76.
cc @ailisp
Fixed in release v2.0.