Open jemand771 opened 1 year ago
sorry if I'm getting lost in super-fine details, here's a minimal example:
import jsonschema

jsonschema.validate(
    dict(
        foo="aa"
    ),
    dict(
        type="object",
        properties=dict(
            foo=dict(
                type="string",
                pattern=r"a?+a"
            )
        )
    )
)
works in python 3.11 (returns `None` as expected), but crashes in 3.9. stacktrace:
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/jsonschema/_format.py", line 137, in check
    result = func(instance)
  File "/usr/local/lib/python3.9/site-packages/jsonschema/_format.py", line 388, in is_regex
    return bool(re.compile(instance))
  File "/usr/local/lib/python3.9/re.py", line 252, in compile
    return _compile(pattern, flags)
  File "/usr/local/lib/python3.9/re.py", line 304, in _compile
    p = sre_compile.compile(pattern, flags)
  File "/usr/local/lib/python3.9/sre_compile.py", line 788, in compile
    p = sre_parse.parse(p, flags)
  File "/usr/local/lib/python3.9/sre_parse.py", line 955, in parse
    p = _parse_sub(source, state, flags & SRE_FLAG_VERBOSE, 0)
  File "/usr/local/lib/python3.9/sre_parse.py", line 444, in _parse_sub
    itemsappend(_parse(source, state, verbose, nested + 1,
  File "/usr/local/lib/python3.9/sre_parse.py", line 672, in _parse
    raise source.error("multiple repeat",
re.error: multiple repeat at position 2

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.9/site-packages/jsonschema/validators.py", line 1298, in validate
    cls.check_schema(schema)
  File "/usr/local/lib/python3.9/site-packages/jsonschema/validators.py", line 297, in check_schema
    raise exceptions.SchemaError.create_from(error)
jsonschema.exceptions.SchemaError: 'a?+a' is not a 'regex'

Failed validating 'format' in metaschema['allOf'][1]['properties']['properties']['additionalProperties']['$dynamicRef']['allOf'][3]['properties']['pattern']:
    {'format': 'regex', 'type': 'string'}

On schema['properties']['foo']['pattern']:
    'a?+a'
this isn't a library issue, again, python's `re` module just doesn't support possessive quantifiers before 3.11. `re.compile("a?+a")` in 3.9 gives a similar error, but `regex.compile("a?+a")` works
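For reference, a quick way to tell which behavior a given interpreter has is to try compiling the pattern from the example. This is a minimal sketch using only the standard library; the helper name `supports_possessive_quantifiers` is made up for illustration.

```python
import re


def supports_possessive_quantifiers() -> bool:
    """Return True if the builtin `re` module accepts possessive quantifiers."""
    try:
        re.compile(r"a?+a")
    except re.error:
        # Older interpreters reject the pattern, e.g. with "multiple repeat"
        # as in the 3.9 traceback above.
        return False
    return True
```

A check like this could be used to decide at import time whether a fallback regex engine is needed at all.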
Hi there -- today no, but in the future yes (so happy to leave this open). The spec says that implementations SHOULD use a dialect of JavaScript regexes -- which we've never been able to do because no implementation of them was available to Python, but now there is one, so yes it's definitely planned to allow you to inject your own regular expression implementation (at which point sure you'd be able to inject this one too).
Though to do so will involve implementing a protocol most likely, since the libraries all have subtly different APIs.
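To make the "protocol" idea concrete, here is one possible shape it could take. This is only a sketch and not jsonschema's actual or planned API: `RegexEngine`, `StdlibEngine`, `CallableEngine` and `matches` are all hypothetical names, and the assumption is that validation only needs a yes/no "does this pattern match somewhere in this string" operation.

```python
import re
from typing import Callable, Protocol


class RegexEngine(Protocol):
    """Hypothetical interface: the one operation a validator would need."""

    def search(self, pattern: str, string: str) -> bool:
        """Return True if `pattern` matches anywhere in `string`."""
        ...


class StdlibEngine:
    """Adapter for the builtin `re` module."""

    def search(self, pattern: str, string: str) -> bool:
        return re.search(pattern, string) is not None


class CallableEngine:
    """Adapter for any callable shaped like `re.search(pattern, string)`.

    Assumption: a drop-in library such as `regex` could be wrapped by
    passing its `search` function here.
    """

    def __init__(self, search_func: Callable[[str, str], object]) -> None:
        self._search = search_func

    def search(self, pattern: str, string: str) -> bool:
        return self._search(pattern, string) is not None


def matches(engine: RegexEngine, pattern: str, string: str) -> bool:
    """What a keyword implementation would call instead of `re.search`."""
    return engine.search(pattern, string)
```

A library whose API differs more substantially would get its own small adapter class exposing the same `search` method.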
oh, that sounds amazing! let's just leave this open for now then, I'll look for a different short-term workaround instead. thanks :)
@Julian, I just saw this and wanted to let you know that I'd be super-interested in trying out the python-ized `regress` in `check-jsonschema` at some point.
Right now there's a super gross hack I put in to let some common JS regexes pass the `format` check. That seems like a reasonable place to slot this in and start kicking the tires. I'll queue that up, though I'm not sure when I'll get to it.
`patternProperties` is the harder thing to manage. I'm not sure how best to slot that in.
Although it would be nice to have a single slot where you give `jsonschema` your regex implementation, it would be manageable to have `format` remain its own piece since it's already pretty pluggable.
On that last note! For the OP:
If you construct your own format checker, you should be able to slot in a customized regex check by applying the `checks` decorator to your desired method.
Here's the usage site in `check-jsonschema` where I use this exact approach.
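A minimal sketch of that approach, not the actual `check-jsonschema` code: it builds a separate `FormatChecker` and registers a replacement `regex` check on it. `compile_pattern` is a hypothetical stand-in; it points at `re.compile` here so the sketch has no extra dependencies, and the assumed use case is swapping in another engine's compile function.

```python
import re

from jsonschema import FormatChecker

# Stand-in compile function (assumption: another engine's `compile`
# would be substituted here).
compile_pattern = re.compile

# A separate checker instance, so the override is not registered on the
# FormatChecker class itself.
format_checker = FormatChecker()


# `raises` must name the exception type of whichever engine is used.
@format_checker.checks("regex", raises=re.error)
def is_regex(instance: object) -> bool:
    # Format checks only apply to strings; other types pass through.
    if not isinstance(instance, str):
        return True
    compile_pattern(instance)  # raises re.error on an invalid pattern
    return True
```

`format_checker.conforms("a+b", "regex")` can be used to try it out. Note that this only changes the `format` check; `pattern` and `patternProperties` matching are untouched.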
(As usual helpful comments and as usual I'm responding just to a bit to start hah, but...)
1) yes if you do try this out 1000% helpful
2) yeah exactly, really this is needed for `pattern` (and `patternProperties`)! So at some point indeed I think one will give such an implementation in one place and the validators will use that for all their regex matching.
More comments of course welcome!
Just a quick little update on this:
I've been using regress in the CLI for a while now for format validation and it's working great. No complaints from users and no issues across my own usages.
I have one outstanding issue which would potentially be handled by being able to use regress for `pattern`. I haven't taken a look at plugging it in yet, but if it's possible today, I'd be interested in giving it a try. I'm not sure if there's a way to do it by overriding the validator for `pattern`, but it seems possible in theory?
Yeah essentially swapping out the whole definition of `pattern` via `jsonschema.validators.extend` would be one way with no internal changes. Obviously it could be made nicer to do so.
Hi there,
this is a kind of mix between question and feature request.
I'm interested in a regex feature called possessive quantifiers. In short, these allow you to use the quantifiers `*+`, `++` and `?+`. These act like their counterparts (`*`, `+`, `?`) except that if a match was found, they will not backtrack. as an example, the pattern `a?+a` will not match the string `a` because the regex engine doesn't backtrack "out of" the first `a?`. kind of like a super-greedy match
python supports these starting with 3.11, but some of us are stuck on lower versions (3.9 in my case). Is there an easy way to make the `pattern` property (or the keys of `patternProperties`) use the `regex` library instead of the builtin `re`? It's compatible with the builtin, but has some additional backported features like possessive quantifiers.
Ideally, I'd like some kind of optional argument where I can enable the 3rd party module and make python-jsonschema use that instead of the builtin. I don't think that should be the default, as you probably don't want "useless" extra dependencies. Alternatively, patching this myself at runtime is probably possible, but it's not going to be pretty. If that's your recommendation, any hints about where to patch it? As a final option, there are ways to mimic the behavior of possessive quantifiers with existing regex features, but that's not pretty either.
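To illustrate the difference described above, and one common way to mimic it on older interpreters, here is a small sketch using only the builtin `re`. The emulation relies on lookaheads being atomic: whatever the group captures inside `(?=...)` is fixed, and the backreference `\1` then consumes exactly that text. The names `EMULATED`, `matches_a` and `matches_aa` are made up for illustration.

```python
import re

# Greedy `a?` gives its match back when the rest of the pattern needs it,
# so `a?a` does match the single-character string "a".
greedy_matches_a = re.fullmatch(r"a?a", "a") is not None

# Emulation of the possessive `a?+a`: capture `a?` inside a lookahead,
# then consume the captured text with a backreference.
EMULATED = re.compile(r"(?=(a?))\1a")

matches_a = EMULATED.fullmatch("a") is not None    # no backtracking into a?
matches_aa = EMULATED.fullmatch("aa") is not None  # first a taken, second a left
```

One cost of this trick is the extra capturing group, which shifts the numbering of any other groups in the pattern.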