python-jsonschema / hypothesis-jsonschema

Tools to generate test data from JSON schemata with Hypothesis
https://pypi.org/project/hypothesis-jsonschema/
Mozilla Public License 2.0

Use `greenery` and `regex_transformer` to merge `pattern` and `patternProperties` keywords #85

Open Zac-HD opened 3 years ago

Zac-HD commented 3 years ago

Currently, hypothesis-jsonschema basically just gives up if the schema has two overlapping regular expressions - at best, we'll try to randomly pick one to generate examples from, and use the other to filter. The status quo is mostly fine, but we can do better!
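To make the "generate from one, filter with the other" fallback concrete, here is a minimal standard-library sketch of the filtering half only. It is not hypothesis-jsonschema's actual code: the helper names `satisfies_all` and `keep_valid` are hypothetical, and the hand-written candidate list stands in for strings that a Hypothesis strategy would generate from the first pattern. It uses `re.search` because JSON Schema patterns are not implicitly anchored.

```python
import re
from typing import Iterable, List


def satisfies_all(patterns: Iterable[str], value: str) -> bool:
    # JSON Schema `pattern` uses search semantics (no implicit anchoring),
    # so each pattern only has to match somewhere in the string.
    return all(re.search(p, value) is not None for p in patterns)


def keep_valid(candidates: Iterable[str], patterns: List[str]) -> List[str]:
    # Hypothetical stand-in for the filter step: discard every candidate
    # that fails any of the overlapping patterns.
    return [c for c in candidates if satisfies_all(patterns, c)]


# Assumed example patterns: starts with a lowercase letter, ends with a digit.
patterns = [r"^[a-z]", r"[0-9]$"]
candidates = ["ab1", "abc", "123", "a1"]
valid = keep_valid(candidates, patterns)
```

Here only two of the four candidates survive. When the patterns overlap on a small fraction of strings, most generated values are discarded, which is the inefficiency that merging the patterns up front would avoid.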

(Not all "Python regular expressions" are truly regular in the sense of being equivalent to finite automata. Also, JSON Schema regexes actually follow ECMA 262 syntax, which is neither truly regular nor entirely compatible with the Python syntax. Fortunately the recommended subset is both compatible (except for Python allowing a trailing newline with `$`) and, with some special handling for lookahead, regular, so we'll continue our approach of handling what we can and gracefully degrading on the rest.)
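The trailing-newline caveat can be checked with the standard `re` module. A minimal sketch, with hypothetical helper names and an assumed example pattern: Python's `$` also matches just before a single newline at the end of the string, while `\Z` matches only at the very end, which is how `$` behaves in ECMA 262 without the multiline flag.

```python
import re


def matches_python_dollar(value: str) -> bool:
    # Python's `$` matches at the end of the string *or* just before
    # a single trailing newline.
    return re.search(r"^[a-z]+$", value) is not None


def matches_strict_end(value: str) -> bool:
    # `\Z` matches only at the very end of the string.
    return re.search(r"^[a-z]+\Z", value) is not None


with_newline = "abc\n"
lenient = matches_python_dollar(with_newline)  # accepted by Python's `$`
strict = matches_strict_end(with_newline)      # rejected by `\Z`
```

Translating `$` to `\Z` is one possible way to narrow this gap when a schema pattern is run through Python's engine; whether that suits a given pattern is an assumption to verify case by case.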

This is a medium-to-large feature to develop, since `regex_transformer` exists but isn't packaged or particularly mature, and its benefit is unknown (neg-medium to medium). `greenery` might also need some patches to make Unicode handling more efficient. However, I'd also like better regex handling in upstream Hypothesis, and it should only get easier over time!

mristin commented 1 year ago

@Zac-HD here is our use case for which this feature is relevant.

We automatically generate a schema based on the specs for a data exchange format. Multiple patterns thus do occur, as there is quite a bit of multiple inheritance in the specs. This feature is also relevant for Hypothesis, where chained filters are silently rewritten into more efficient strategies.

Zac-HD commented 1 year ago

I'm not planning to work on this or the related Hypothesis issue any time soon, sorry, so if this is a business need you might need to work on it yourselves or pay a contractor (I have a shortlist). The first step would be getting support for Python (and ideally JS) regex syntax shipped in `greenery`; after that I'd expect it to be fairly straightforward.

mristin commented 1 year ago

@Zac-HD thanks for the timeline info! I suppose we'll try to play with `greenery` ourselves.