python-jsonschema / jsonschema

An implementation of the JSON Schema specification for Python
https://python-jsonschema.readthedocs.io
MIT License

Mitigate undesired side effect of new `best_match` behaviour with alternative proposal #1257

Open ilia1243 opened 2 months ago

ilia1243 commented 2 months ago

This is a follow-up to #1250. The fix seems to have an undesired side effect.

from jsonschema import Draft202012Validator as Validator, exceptions

schema = {'oneOf': [
    {'properties': {'run': {'type': 'string'}}, 'required': ['run']},
    {'properties': {'uses': {'type': 'string'}}, 'required': ['uses']},
]}
instance = {'uses': 1, 'run': 1}

error = exceptions.best_match(Validator(schema).iter_errors(instance))
print(schema, "\n\n", error)

After the fix it produces:

 1 is not of type 'string'
On instance['run']:
    1

Conceptually, it is not clear why `run` should have priority over `uses` (I understand that, technically, it has priority because of alphabetical order).
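To make the alphabetical tie-break concrete, here is a minimal sketch using only the standard library. It assumes a comparison key of the form (depth, path); `tie_break_key` is a hypothetical stand-in, and the library's actual relevance function may include further components.

```python
from collections import deque

# Stand-ins for the `path` of the two sub-errors in the example above
# (jsonschema exposes error paths as deques of keys/indices).
candidate_paths = [deque(["uses"]), deque(["run"])]


def tie_break_key(path):
    # Hypothetical key: deeper paths first, then the path itself.
    # Both paths have depth 1, so the string comparison decides.
    return (-len(path), list(path))


winner = min(candidate_paths, key=tie_break_key)
print(list(winner))  # ['run']
```

Under this assumption, nothing about the schema or the instance makes `run` more relevant than `uses`; the choice falls out of string ordering alone.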

This is based on a real example: https://raw.githubusercontent.com/SchemaStore/schemastore/master/src/schemas/json/github-workflow.json

As an alternative proposal, the relevance function could be left unchanged, and `best_match` could instead distinguish errors that have different `error.path` values within the same subschema and choose the first of them (possibly one that does not have the minimal relevance, but rather the maximal one for that particular subschema, provided that it is still minimal among the different subschemas).
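To see what such a per-subschema comparison would have to work with, the sub-errors of a `oneOf`/`anyOf` error can be grouped by the branch that produced them. The following is only an inspection sketch, not the implementation from the pull request; `group_context_by_branch` and `demo` are hypothetical names. It relies on `error.context` and on the first element of each sub-error's relative `schema_path` being the branch index.

```python
from collections import defaultdict


def group_context_by_branch(error):
    """Group the sub-errors of a oneOf/anyOf error by branch index.

    Uses only `error.context` and each sub-error's relative
    `schema_path`, whose first element is the index of the branch.
    """
    groups = defaultdict(list)
    for sub in error.context or []:
        branch = sub.schema_path[0]
        groups[branch].append((list(sub.path), sub.message))
    return dict(groups)


def demo():
    # Requires jsonschema to be installed; imported lazily so that the
    # helper above stays usable on its own.
    from jsonschema import Draft202012Validator

    schema = {'oneOf': [
        {'properties': {'run': {'type': 'string'}}, 'required': ['run']},
        {'properties': {'uses': {'type': 'string'}}, 'required': ['uses']},
    ]}
    instance = {'uses': 1, 'run': 1}

    for error in Draft202012Validator(schema).iter_errors(instance):
        grouped = group_context_by_branch(error)
        for branch, items in sorted(grouped.items()):
            print(branch, items)
```

Calling `demo()` prints one line per branch with the relative paths and messages of its sub-errors, which makes it easier to judge which candidates are being compared for a given instance.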

ilia1243 commented 2 months ago

I opened a pull request with a possible improvement to `best_match`.

nfriedl1 commented 1 month ago

Hi, I think I am running into the same thing, but I wanted to share my example just in case.

from jsonschema import validate
from jsonschema.exceptions import ValidationError

schema = {
    "type": "object",
    "properties": {
        "tuple_or_tuple_array": {
            "anyOf": [
                {
                    "type": "array",
                    "items": {
                        "$ref": "#/$defs/int_tuple",
                    },
                },
                {
                    "$ref": "#/$defs/int_tuple",
                },
            ]
        },
    },
    "$defs": {
        "int_tuple": {
            "type": "array",
            "items": {
                "type": "integer",
            },
            "minItems": 2,
            "maxItems": 2,
        },
    },
}

def try_validate(instance):
    try:
        validate(instance, schema)
    except ValidationError as e:
        print(str(e))
    else:
        print("No error.\n")
    finally:
        print("-" * 80)

# Case 1
control = {"tuple_or_tuple_array": [0, 0]}
try_validate(control)

# Case 2
control2 = {"tuple_or_tuple_array": [[0, 0], [0, 0]]}
try_validate(control2)

# Case 3
misleading_error = {"tuple_or_tuple_array": [0, "not an int"]}
try_validate(misleading_error)

# Case 4
expected_error = {"tuple_or_tuple_array": ["not an int", 0]}
try_validate(expected_error)

Below is the output with jsonschema == 4.22.0.

No error.

--------------------------------------------------------------------------------
No error.

--------------------------------------------------------------------------------
0 is not of type 'array'

Failed validating 'type' in schema[0]['items']:
    {'items': {'type': 'integer'},
     'maxItems': 2,
     'minItems': 2,
     'type': 'array'}

On instance[0]:
    0
--------------------------------------------------------------------------------
['not an int', 0] is not valid under any of the given schemas

Failed validating 'anyOf' in schema['properties']['tuple_or_tuple_array']:
    {'anyOf': [{'items': {'$ref': '#/$defs/int_tuple'}, 'type': 'array'},
               {'$ref': '#/$defs/int_tuple'}]}

On instance['tuple_or_tuple_array']:
    ['not an int', 0]
--------------------------------------------------------------------------------

The 3rd case here has a misleading error message. As shown in case 4, if the values in the tuple are swapped, the error message is more accurate.

Julian commented 1 month ago

A helpful thing to check would be whether your case is solved by @ilia1243's fix in #1258 (which I am unfortunately still behind on looking into). It would be very helpful to know whether you are more satisfied with the error after that change.

ilia1243 commented 1 month ago

The mentioned fix does solve the problem reported by @nfriedl1. For cases 3 and 4, it shows:

[0, 'not an int'] is not valid under any of the given schemas

and

['not an int', 0] is not valid under any of the given schemas