near / borsh-js

TypeScript/JavaScript implementation of Binary Object Representation Serializer for Hashing
Apache License 2.0

Non-ASCII characters not being properly serialized #72

Closed: GuidoDipietro closed this issue 10 months ago

GuidoDipietro commented 11 months ago

When we serialize a string containing non-ASCII characters, only the last 8 bits of each character's code (the UTF-16 code unit returned by `charCodeAt`) are written into the buffer, instead of the character's multi-byte UTF-8 representation.

The exact code where this happens is in `encode_string`:

    encode_string(value: unknown): void {
        this.checkTypes && utils.expect_type(value, 'string', this.fieldPath);
        const _value = value as string;

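        // 4 bytes for length (note: _value.length counts UTF-16 code units, not UTF-8 bytes)
        this.encoded.store_value(_value.length, 'u32');

        // string bytes: charCodeAt returns a code unit in 0..0xFFFF,
        // but only its low 8 bits survive the 'u8' store
        for (let i = 0; i < _value.length; i++) {
            this.encoded.store_value(_value.charCodeAt(i), 'u8');
        }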
    }

As a result, the output buffer is not compatible with the standard Borsh deserialization of strings, which expects a `u32` UTF-8 byte length followed by the UTF-8 bytes.
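A small standalone TypeScript sketch that mimics the quoted loop makes the mismatch concrete. `buggyEncodeString` is a hypothetical helper, not borsh-js code, and it assumes the `u8` store keeps only the low 8 bits of each value, as reported above.

```typescript
// Hypothetical standalone mimic of the encode_string loop quoted above (not borsh-js code).
// Assumes a u8 store keeps only the low 8 bits of the value, as the issue reports.
function buggyEncodeString(s: string): number[] {
  const n = s.length; // counts UTF-16 code units, not UTF-8 bytes
  // u32 length prefix, little-endian
  const out = [n & 0xff, (n >>> 8) & 0xff, (n >>> 16) & 0xff, (n >>> 24) & 0xff];
  for (let i = 0; i < s.length; i++) {
    out.push(s.charCodeAt(i) & 0xff); // low 8 bits of each UTF-16 code unit
  }
  return out;
}

// buggyEncodeString("\u00e9") → [1, 0, 0, 0, 0xe9]   (UTF-8 would be [2, 0, 0, 0, 0xc3, 0xa9])
// buggyEncodeString("\u20ac") → [1, 0, 0, 0, 0xac]   (UTF-8 would be [3, 0, 0, 0, 0xe2, 0x82, 0xac])
// buggyEncodeString("\u0129") → [1, 0, 0, 0, 0x29]   (0x29 is ASCII ")")
```

The single bytes 0xe9 and 0xac are not valid UTF-8 on their own, so a strict decoder rejects them, while U+0129 ("ĩ") silently turns into `)`. ASCII-only strings are unaffected, which is why the bug only shows up with non-ASCII input.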

Solution

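Encode each character using its multi-byte UTF-8 representation, and write the resulting UTF-8 byte count (rather than `_value.length`) as the length prefix.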
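A minimal sketch of that fix, written as standalone functions rather than as a patch to borsh-js. `utf8Bytes` and `encodeString` are hypothetical names, and this is not necessarily how the eventual fix implements it. The sketch combines UTF-16 surrogate pairs into code points, emits 1 to 4 UTF-8 bytes per code point, and prefixes the UTF-8 byte count as a little-endian `u32`.

```typescript
// Hypothetical helper (not the borsh-js API): UTF-8 encode a JS (UTF-16) string.
function utf8Bytes(s: string): number[] {
  const out: number[] = [];
  for (let i = 0; i < s.length; i++) {
    let cp = s.charCodeAt(i);
    // Combine a high surrogate with a following low surrogate into one code point.
    if (cp >= 0xd800 && cp <= 0xdbff && i + 1 < s.length) {
      const lo = s.charCodeAt(i + 1);
      if (lo >= 0xdc00 && lo <= 0xdfff) {
        cp = 0x10000 + ((cp - 0xd800) << 10) + (lo - 0xdc00);
        i++; // the low surrogate has been consumed as well
      }
    }
    if (cp >= 0xd800 && cp <= 0xdfff) {
      cp = 0xfffd; // unpaired surrogate: substitute U+FFFD
    }
    if (cp < 0x80) {
      out.push(cp); // 1 byte (ASCII)
    } else if (cp < 0x800) {
      out.push(0xc0 | (cp >> 6), 0x80 | (cp & 0x3f)); // 2 bytes
    } else if (cp < 0x10000) {
      out.push(0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f)); // 3 bytes
    } else {
      out.push(
        0xf0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3f),
        0x80 | ((cp >> 6) & 0x3f),
        0x80 | (cp & 0x3f)
      ); // 4 bytes
    }
  }
  return out;
}

// Hypothetical fixed encoder: u32 little-endian UTF-8 byte length, then the UTF-8 bytes.
function encodeString(s: string): number[] {
  const bytes = utf8Bytes(s);
  const n = bytes.length; // byte count, not s.length
  const prefix = [n & 0xff, (n >>> 8) & 0xff, (n >>> 16) & 0xff, (n >>> 24) & 0xff];
  return prefix.concat(bytes);
}

// encodeString("\u00e9") → [2, 0, 0, 0, 0xc3, 0xa9]
// encodeString("\u20ac") → [3, 0, 0, 0, 0xe2, 0x82, 0xac]
```

Where `TextEncoder` is available (browsers and modern Node.js), `new TextEncoder().encode(s)` yields the same UTF-8 bytes, including U+FFFD for unpaired surrogates; the manual loop above just avoids depending on it.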

gagdiez commented 10 months ago

Hi @GuidoDipietro, a PR has now been opened to fix the error: see #76.

cc @ailisp

gagdiez commented 10 months ago

Fixed in release v2.0.