jhthorsen / json-validator

:cop: Validate data against a JSON schema
https://metacpan.org/release/JSON-Validator

Allow unicode strings in the JSON schema #268

Closed: iamb closed this 1 year ago

iamb commented 1 year ago

Summary

Allow JSON::Validator to validate schemas that contain Unicode strings.

Motivation

I ran into this problem with a third-party JSON schema over which I have no control.

References

This is the same issue described in #261, although I ran into it with an enum rather than with uniqueItems.

A user of JSON::Validator has no control over whether data_checksum is given bytes or character strings. When a schema comes from JSON, its strings are decoded into Perl character strings. The YAML modules decode to Perl character strings as well.
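To illustrate why decoded character strings are a problem for a checksum, here is a minimal sketch. It assumes the checksum is ultimately an MD5 digest and uses the core Digest::MD5 module as a stand-in; it is not the actual data_checksum code, and the sample string is made up.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use Encode qw(encode);

# A decoded character string containing a code point above 0xFF,
# like one that could come out of a JSON or YAML schema.
my $chars = "snow\x{2603}man";

# Digest functions work on bytes, so Digest::MD5 croaks on a wide character.
my $ok = eval { md5_hex($chars); 1 };
print "character string: ", ($ok ? "accepted\n" : "died: $@");

# Encoding to UTF-8 first gives the digest the bytes it expects.
my $sum = md5_hex(encode('UTF-8', $chars));
print "encoded bytes: $sum\n";
```

Because the caller cannot choose what the decoder returns, any encoding step has to happen on the checksum side.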

My fix is similar to the one mentioned in that issue, with a couple of exceptions:

I did not copy the argument unless necessary because I wasn't sure if that was intentionally avoided in the original code.

I unconditionally encode the stringified scalar that is fed into the digest. This is intentional: it avoids incorrectly accepting doubly-encoded UTF-8 as valid, as demonstrated in the included tests.
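The difference between conditional and unconditional encoding can be sketched as follows. Both helpers are hypothetical and exist only for this illustration (they are not the module's code), and Digest::MD5 is again assumed as the digest. The "bad" value stands for text whose UTF-8 bytes are held as individual characters, which is what doubly-encoded input looks like after one round of decoding.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use Encode qw(encode);

# Hypothetical variant: encode only when Perl's internal UTF-8 flag is set.
sub checksum_conditional {
  my $str = shift;
  $str = encode('UTF-8', $str) if utf8::is_utf8($str);
  return md5_hex($str);
}

# Hypothetical variant: always encode before hashing.
sub checksum_unconditional {
  my $str = shift;
  return md5_hex(encode('UTF-8', $str));
}

my $schema_value = "\x{2603}";                      # one character
my $bad_data     = encode('UTF-8', $schema_value);  # three characters, each in 0x80..0xFF

my $cond_match   = checksum_conditional($schema_value) eq checksum_conditional($bad_data);
my $uncond_match = checksum_unconditional($schema_value) eq checksum_unconditional($bad_data);

print "conditional:   ", ($cond_match   ? "same checksum (false match)" : "different"), "\n";
print "unconditional: ", ($uncond_match ? "same checksum (false match)" : "different"), "\n";
```

In the conditional variant both values reduce to the same three bytes, so the two distinct strings compare as equal. Encoding every input keeps them distinct.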

I ran t/benchmark.t with n=200, n=500, and n=1000 against both the previous code and the updated code, and could not measure any difference beyond noise.