leadpony / justify

Justify is a JSON validator based on JSON Schema Specification and Jakarta JSON Processing API (JSON-P).
Apache License 2.0

Regexp not recognized #47

Closed marsangr closed 4 years ago

marsangr commented 4 years ago

Release 2.0 rejects the following pattern (the whole schema is available at https://openid.net/schemas/verified_claims-09.json):

    "time_type": {
      "type": "string",
      "pattern": "^(?:[\\+-]?\\d{4}(?!\\d{2}\\b))(?:(-?)(?:(?:0[1-9]|1[0-2])(?:\\1(?:[12]\\d|0[1-9]|3[01]))?|W(?:[0-4]\\d|5[0-2])(?:-?[1-7])?|(?:00[1-9]|0[1-9]\\d|[12]\\d{2}|3(?:[0-5]\\d|6[1-6])))(?:[T\\s](?:(?:(?:[01]\\d|2[0-3])(?:(:?)[0-5]\\d)?|24\\:?00)(?:[\\.,]\\d+(?!:))?)?(?:\\2[0-5]\\d(?:[\\.,]\\d+)?)?(?:[zZ]|(?:[\\+-])(?:[01]\\d|2[0-3]):?(?:[0-5]\\d)?)?)?)?$"
    }

Validating a document against this schema produces the following error:

    Validating the schema "./verified_claims-09.json"...
    [11,368][/definitions/time_type/pattern] The value must be a valid regular expression.
    At least 1 problem(s) were found in the schema "./verified_claims-09.json".
    Program terminated due to schema failure.

Other tools (regex101.com, regexpal.com) accept the regular expression as valid ECMA-262.

leadpony commented 4 years ago

Hi @marsangr, thank you for reporting the problem. I reproduced it with release 2.0, but I cannot reproduce it with the current snapshot. I will investigate what causes the difference.

leadpony commented 4 years ago

Your regex pattern is too complex for me, but it produces a syntax error when it is parsed as a Unicode pattern. I tested this with Node.js as follows:

> new RegExp("^(?:[\\+-]?\\d{4}(?!\\d{2}\\b))(?:(-?)(?:(?:0[1-9]|1[0-2])(?:\\1(?:[12]\\d|0[1-9]|3[01]))?|W(?:[0-4]\\d|5[0-2])(?:-?[1-7])?|(?:00[1-9]|0[1-9]\\d|[12]\\d{2}|3(?:[0-5]\\d|6[1-6])))(?:[T\\s](?:(?:(?:[01]\\d|2[0-3])(?:(:?)[0-5]\\d)?|24\\:?00)(?:[\\.,]\\d+(?!:))?)?(?:\\2[0-5]\\d(?:[\\.,]\\d+)?)?(?:[zZ]|(?:[\\+-])(?:[01]\\d|2[0-3]):?(?:[0-5]\\d)?)?)?)?$", "u");
Thrown:
SyntaxError: Invalid regular expression: /^(?:[\+-]?\d{4}(?!\d{2}\b))(?:(-?)(?:(?:0[1-9]|1[0-2])(?:\1(?:[12]\d|0[1-9]|3[01]))?|W(?:[0-4]\d|5[0-2])(?:-?[1-7])?|(?:00[1-9]|0[1-9]\d|[12]\d{2}|3(?:[0-5]\d|6[1-6])))(?:[T\s](?:(?:(?:[01]\d|2[0-3])(?:(:?)[0-5]\d)?|24\:?00)(?:[\.,]\d+(?!:))?)?(?:\2[0-5]\d(?:[\.,]\d+)?)?(?:[zZ]|(?:[\+-])(?:[01]\d|2[0-3]):?(?:[0-5]\d)?)?)?)?$/: Invalid escape

Please note that the u flag is specified as the second parameter of the constructor.

In Justify version 2.0 and older, every regex pattern is handled as a Unicode pattern, which is one of the dialects defined in the ECMAScript Language Specification. In the next release I will change this to BMP pattern, another dialect defined in the spec, so your pattern will be parsed as valid.
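
For illustration, the difference between the two dialects can be reproduced in Node.js with a much smaller pattern. This is only a sketch and not Justify's own code. It assumes that the `\:` escape in `24\:?00` is the construct behind the "Invalid escape" message: in Unicode mode an identity escape is only allowed for syntax characters and `/`, and `:` is not one of them. The helper name `compiles` is made up for this example.

```javascript
// Hypothetical helper: reports whether a pattern compiles with the given flags.
function compiles(source, flags) {
  try {
    new RegExp(source, flags);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e; // anything else is unexpected
  }
}

// Reduced fragment of the time_type pattern that contains the `\:` escape.
const fragment = "24\\:?00";

console.log(compiles(fragment, "u")); // false: rejected as a Unicode pattern
console.log(compiles(fragment, ""));  // true: accepted without the u flag
console.log(compiles("24:?00", "u")); // true: same fragment without the escape
```

If that assumption holds, writing `24:?00` instead of `24\:?00` in the schema would keep the same meaning and should be accepted in either dialect.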

marsangr commented 4 years ago

Admittedly, it's not a trivial regexp. It is, however, the product of an SDO trying to model an ISO8601 Timestamp, so a real use case worth supporting. I'll move to the current snapshot in the meantime. Thank you very much!

leadpony commented 4 years ago

@marsangr I have released 2.1.0 to Maven Central. You can try the new version.

marsangr commented 4 years ago

Works like a charm, thanks.