jhthorsen / json-validator

:cop: Validate data against a JSON schema
https://metacpan.org/release/JSON-Validator

Wide character in subroutine entry #261

Closed xurc closed 1 year ago

xurc commented 2 years ago

Steps to reproduce the behavior

Add an array of unique string items to the specification ("uniqueItems": true):

           "test": {
             "type": "array",
             "uniqueItems": true,
...
           }

and validate non-ASCII values:

          "test" => [
            "\x{422}\x{435}\x{441}\x{442}1",
            "Test1"
          ],

Expected behavior

No errors

Actual behavior

Wide character in subroutine entry at /home/crux/workspace/qq/core/lib/JSON/Validator/Util.pm line 27.

Apparently the error occurs because Mojo::Util::md5_sum expects bytes here:

package JSON::Validator::Util;
...
sub data_checksum {
  return Mojo::Util::md5_sum(ref $_[0] ? $serializer->($_[0]) : defined $_[0] ? qq('$_[0]') : 'undef');
}
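
The failure can be reproduced without the validator. This is a minimal sketch, assuming that Digest::MD5::md5_hex (a core module) behaves like Mojo::Util::md5_sum for this purpose; the variable names are only illustrative.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# A character string with code points above 255 (Cyrillic letters plus "1")
my $chars = "\x{422}\x{435}\x{441}\x{442}1";

# MD5 works on bytes, so hashing wide characters dies
my $wide_ok  = eval { md5_hex("'$chars'"); 1 } ? 1 : 0;
my $wide_err = $wide_ok ? '' : $@;

# A plain ASCII item hashes without complaint
my $ascii_ok = eval { md5_hex("'Test1'"); 1 } ? 1 : 0;

print "wide string ok:  $wide_ok\n";
print "wide error:      $wide_err";
print "ascii string ok: $ascii_ok\n";
```

Only the item with characters outside the byte range triggers the error, which matches the behaviour reported above.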

For now I'm stuck with a dirty local hotfix:

sub data_checksum {
  my $t = $_[0];
  # Encode a copy to UTF-8 bytes so that md5_sum never sees wide characters
  utf8::encode($t) if defined $t && !ref $t && utf8::is_utf8($t);
  return Mojo::Util::md5_sum(ref $_[0] ? $serializer->($_[0]) : defined $_[0] ? qq('$t') : 'undef');
}
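
A variation on the hotfix, as a sketch only: encode a copy of every plain scalar unconditionally instead of checking utf8::is_utf8. With the is_utf8 guard, two strings that compare equal with `eq` but differ in their internal UTF8 flag could get different checksums. The helper name scalar_checksum is hypothetical and not part of JSON::Validator, Digest::MD5::md5_hex stands in for Mojo::Util::md5_sum, and references (the $serializer branch) are deliberately left out.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: checksum of a plain scalar, always computed on UTF-8 bytes
sub scalar_checksum {
  my ($value) = @_;
  return md5_hex('undef') unless defined $value;
  my $bytes = "'$value'";    # work on a copy, the caller's data stays untouched
  utf8::encode($bytes);      # characters -> UTF-8 bytes, in place
  return md5_hex($bytes);
}

# The items from the report: no error, and no false duplicates
my @items = ("\x{422}\x{435}\x{441}\x{442}1", "Test1");
my %seen;
my @dupes = grep { $seen{scalar_checksum($_)}++ } @items;

# Same string, different internal representation: checksums still agree
my $plain    = "\xe9";
my $upgraded = "\xe9";
utf8::upgrade($upgraded);
my $same = scalar_checksum($plain) eq scalar_checksum($upgraded) ? 1 : 0;

print "duplicates found: ", scalar(@dupes), "\n";
print "representation-independent: $same\n";
```

Because only the copy is encoded, the data handed to the validator stays a character string.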

jhthorsen commented 2 years ago

I’m pretty sure you have to make sure you have bytes everywhere, meaning you have to encode your data before passing it to the validator.

xurc commented 2 years ago

> I’m pretty sure you have to make sure you have bytes everywhere, meaning you have to encode your data before passing it to the validator.

This is counterintuitive, since UTF-8 text comes from the database and UTF-8 text goes out in the response. Applications generally work with characters, not bytes.

In this case, bytes are required only because the validator uses md5 internally to check uniqueness, and md5 needs bytes. It is not logical to convert all the data because of an internal implementation detail of the library.

In addition, this conversion can lead to undefined behavior and significantly affect performance.

It can also break pattern-based string validation and string lengths elsewhere.
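
To illustrate the point about lengths and patterns, here is a small sketch using the string from the report. The pattern /^.{5}$/ is only an assumed example of a schema-style constraint ("exactly five characters"), not something taken from a real schema.

```perl
use strict;
use warnings;

my $chars = "\x{422}\x{435}\x{441}\x{442}1";   # 5 characters
my $bytes = $chars;
utf8::encode($bytes);                           # 9 UTF-8 bytes (4 x 2 + 1)

my $char_len = length $chars;
my $byte_len = length $bytes;

# "Exactly five characters" holds for the character string only
my $char_match = $chars =~ /^.{5}$/ ? 1 : 0;
my $byte_match = $bytes =~ /^.{5}$/ ? 1 : 0;

printf "characters: length=%d match=%d\n", $char_len, $char_match;
printf "bytes:      length=%d match=%d\n", $byte_len, $byte_match;
```

After encoding, the same value is 9 units long instead of 5, so a length or pattern rule written for characters no longer matches.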

jhthorsen commented 1 year ago

I'm going to close this issue, since there haven't been any updates for a year.