python-jsonschema / jsonschema

An implementation of the JSON Schema specification for Python
https://python-jsonschema.readthedocs.io
MIT License
4.6k stars 578 forks source link

Improve `best_match` when used with applicators #1250

Closed Julian closed 5 months ago

Julian commented 5 months ago

Applicators have poor best matched errors in some cases even when there is only a single schema within the applicator!

A reproducer is:

from jsonschema import Draft202012Validator as Validator, exceptions

instance = [12, 12]

for applicator in "anyOf", "oneOf":
    # Should match {"items": {"const": 37}}
    single = {applicator: [{"items": {"const": 37}}]}
    error = exceptions.best_match(Validator(single).iter_errors(instance))
    print(single, "\n\n", error)

    # Should match {"items": {"const": 37}} due to type matching
    multiple = {applicator: [{"type": "object"}, {"items": {"const": 37}}]}
    error = exceptions.best_match(Validator(multiple).iter_errors(instance))
    print(multiple, "\n\n", error)

    # Should probably? match {"items": {"const": 37}} due to giving False low priority in any/oneOf
    multiple_false = {applicator: [False, {"items": {"const": 37}}]}
    error = exceptions.best_match(Validator(multiple_false).iter_errors(instance))
    print(multiple_false, "\n\n", error)

where there seem to be 3 related but separate issues.

1002 is likely related if not the same, but these examples are probably more minimized. We should recheck that issue when fixing.

ilia1243 commented 4 months ago

Hi, not sure if you expected one side effect. It is too minor for mentioning it in the separate issue, so leaving it here.

schema = {'oneOf': [
    {'properties': {'run': {'type': 'string'}}, 'required': ['run']},
    {'properties': {'uses': {'type': 'string'}}, 'required': ['uses']},
]}
instance = {'uses': 1, 'run': 1}

error = exceptions.best_match(Validator(schema).iter_errors(instance))
print(schema, "\n\n", error)

After the fix it produces:

 1 is not of type 'string'
On instance['run']:
    1

Conceptually, it is not clear why run has priority (I understand that technically it has priority due to the alphabetical order).

Based on real example https://raw.githubusercontent.com/SchemaStore/schemastore/master/src/schemas/json/github-workflow.json

As an alternative proposal, the relevance function could not be changed, but best_match could distinguish errors having different error.path in the same subscheme, and choose first of them (maybe having even not the minimal relevance, but the maximum one for this particular subscheme provided that it is still minimal among different subschemes).

Julian commented 4 months ago

That does seem undesired at first glance -- possibly/probably fixable as you say -- can you open a new issue so it's not forgotten?