python-jsonschema / check-jsonschema

A CLI and set of pre-commit hooks for jsonschema validation with built-in support for GitHub Workflows, Renovate, Azure Pipelines, and more!
https://check-jsonschema.readthedocs.io/en/stable

Support ECMAScript unicode-mode RegExp usage for 'pattern' and 'patternProperties' #353

Open djgoku opened 1 year ago

djgoku commented 1 year ago

Instead of duplicating everything, here is a link to my problem.

If you want me to copy anything into this issue, I can.

https://github.com/awslabs/amazon-ecs-intellisense-schema/pull/8

I am trying to find out how Unicode regex syntax can be supported on the Python side so that I can use this JSON schema.

sirosen commented 1 year ago

If I have understood the issue correctly, this is a matter of a schema using an ECMA regex syntax which Python (the language) does not support.

However, luckily, we're using regress here to provide ECMA-compatible regex support for format. That means I need to know what type of match is failing, and then work through instrumenting the right support.

djgoku commented 1 year ago

Thanks for looking into this. I am open to helping if I can. I'll try to check out the errors again tomorrow.

sirosen commented 11 months ago

To put all the information for this in one place (with some fun emojis):

Rojax commented 4 weeks ago

I'm having a similar issue described here: https://github.com/usnistgov/metaschema/issues/770.

In short: Having this

"pattern": "^(\\p{L}|_)(\\p{L}|\\p{N}|[.\\-_])*$" 

results in this

Error: schemafile was not valid: '^(\\p{L}|_)(\\p{L}|\\p{N}|[.\\-_])*$' is not a 'regex'
Failed validating 'format' in metaschema['properties']['definitions']['additionalProperties']['properties']['pattern']:
    {'type': 'string', 'format': 'regex'}
On schema['definitions']['TokenDatatype']['pattern']:
    '^(\\p{L}|_)(\\p{L}|\\p{N}|[.\\-_])*$'
SchemaError: '^(\\p{L}|_)(\\p{L}|\\p{N}|[.\\-_])*$' is not a 'regex'
Failed validating 'format' in metaschema['properties']['definitions']['additionalProperties']['properties']['pattern']:
    {'type': 'string', 'format': 'regex'}
On schema['definitions']['TokenDatatype']['pattern']:
    '^(\\p{L}|_)(\\p{L}|\\p{N}|[.\\-_])*$'
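
As a small illustration of the point made earlier in the thread, that Python's own regex syntax does not cover this ECMAScript feature, the sketch below tries to compile the same pattern with the standard library `re` module. It does not reproduce check-jsonschema's internal format check; the helper name `compiles_with_python_re` is hypothetical and exists only for this example.

```python
import re

# The pattern from the schema above, after JSON unescaping.
# \p{L} and \p{N} are ECMAScript unicode-mode property escapes.
ECMA_PATTERN = r"^(\p{L}|_)(\p{L}|\p{N}|[.\-_])*$"


def compiles_with_python_re(pattern):
    """Return True if Python's built-in `re` module can compile `pattern`."""
    try:
        re.compile(pattern)
    except re.error:
        # `re` has no \p{...} property escapes, so it rejects the pattern.
        return False
    return True


python_accepts = compiles_with_python_re(ECMA_PATTERN)
print(python_accepts)  # False
```

This is why an ECMAScript-compatible engine is needed for such schemas, rather than a direct handoff to `re`.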

sirosen commented 3 weeks ago

I took some time today to do the internal restructuring I've been meaning to do, in order to make this possible. It's most of the way there, but I've hit a bit of a strange case with custom validators which I need to sort out. And I still need to put together a good test case to verify my new work.

I think this will not work with arbitrary custom validators, at least for the initial version. In order to attach the alternate pattern validation to a validator class, I'm using the extend API. In theory, a custom validator class could have changes which that API will not preserve. I need to work out how to document this, since it's subtle.

And the other notable thing here is that this is a change to pattern but not patternProperties, at least at the moment. The two are different and each requires its own implementation, though they can share some bits.
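
The idea of attaching an alternate pattern implementation through the extend API might look roughly like the sketch below. It is not the check-jsonschema implementation. It assumes the `jsonschema` package is installed, the names `make_pattern_keyword` and `AltPatternValidator` are hypothetical, and Python's `re.search` is used purely as a stand-in for an ECMAScript-compatible engine such as regress.

```python
import re

from jsonschema import Draft7Validator, ValidationError, validators


def make_pattern_keyword(search):
    """Build a `pattern` keyword implementation around a search callable.

    `search(pattern, text)` must return a truthy value when `text` matches.
    """

    def pattern_keyword(validator, pattern, instance, schema):
        # `pattern` only constrains strings; other types pass untouched.
        if not validator.is_type(instance, "string"):
            return
        if not search(pattern, instance):
            yield ValidationError(f"{instance!r} does not match {pattern!r}")

    return pattern_keyword


# Stand-in engine (assumption): `re.search`. An ECMAScript-compatible
# matcher would be passed here instead.
AltPatternValidator = validators.extend(
    Draft7Validator,
    validators={"pattern": make_pattern_keyword(re.search)},
)

schema = {"type": "string", "pattern": "^[a-z_][a-z0-9._-]*$"}
validator = AltPatternValidator(schema)

ok = validator.is_valid("token_1")
errors = [e.message for e in validator.iter_errors("9lives")]
print(ok, errors)
```

Only the `pattern` keyword is overridden here; `patternProperties` would need a separate keyword function, which matches the distinction drawn above.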