sinclairzx81 / typebox

Json Schema Type Builder with Static Type Resolution for TypeScript
Other
4.77k stars 152 forks source link

Value.Hash doesn't work with Unicode strings #795

Closed matlin closed 5 months ago

matlin commented 5 months ago

It appears that Hash assumes that each character of a String is a single ascii byte while unicode characters (e.g. emiojis) can be larger.

Value.Hash('💣') // This will break
Screenshot 2024-03-19 at 9 31 24 AM
function String(value) {
        FNV1A64(ByteMarker.String);
        for (let i = 0; i < value.length; i++) {
            FNV1A64(value.charCodeAt(i)); // This can be larger than a byte
        }
    }
sinclairzx81 commented 5 months ago

@matlin Hi, Thanks for reporting.

This issue has been resolved in more recent versions of TypeBox. The current implementation as of 0.32.15 (current) is specified as:

Hash-NumberToBytes: https://github.com/sinclairzx81/typebox/blob/master/src/value/hash/hash.ts#L68 Hash-StringType: https://github.com/sinclairzx81/typebox/blob/master/src/value/hash/hash.ts#L115

function* NumberToBytes(value: number): IterableIterator<number> {
  const byteCount = value === 0 ? 1 : Math.ceil(Math.floor(Math.log2(value) + 1) / 8)
  for (let i = 0; i < byteCount; i++) {
    yield (value >> (8 * (byteCount - 1 - i))) & 0xff
  }
}
// ...
function StringType(value: string) {
  FNV1A64(ByteMarker.String)
  for (let i = 0; i < value.length; i++) {
    for (const byte of NumberToBytes(value.charCodeAt(i))) {
      FNV1A64(byte)
    }
  }
}

Looking back, this issue appears to have been resolved on 0.31.15 (https://github.com/sinclairzx81/typebox/pull/594) as the nearest version. If you upgrade to at least 0.31.15, things should be fixed.

Hope this helps! S

sinclairzx81 commented 5 months ago

Will close off as this issue as Unicode support for Hash has been implemented.

Cheers!

matlin commented 5 months ago

Thanks! Apologies - thought we were on the latest version!