python-jsonschema / jsonschema

An implementation of the JSON Schema specification for Python
https://python-jsonschema.readthedocs.io
MIT License

unevaluatedProperties error gets raised before more relevant validation errors #1132

Open Vidminas opened 1 year ago

Vidminas commented 1 year ago

Hello!

I noticed that in schemas combining 'allOf' with unevaluatedProperties, an error in a subschema is (sometimes?) reported as an unevaluatedProperties error rather than as the relevant subschema error. This makes it really hard to see what went wrong in anything larger than a trivial example. This seems to be related to https://github.com/python-jsonschema/jsonschema/issues/1074, just in a different scenario.

Here is a small example:

import jsonschema
obj = {
    "fridge_min_temp": 10
}
schema = {
  "type": "object",
  "allOf": [
    {
      "properties": {
        "fridge_min_temp": { "type": "integer", "minimum": -20, "maximum": 5 }
      },
      "required": ["fridge_min_temp"]
    }
  ],
  "unevaluatedProperties": False
}
jsonschema.validate(obj, schema)

This produces the following error trace:

ValidationError: Unevaluated properties are not allowed ('fridge_min_temp' was unexpected)

Failed validating 'unevaluatedProperties' in schema:
    {'allOf': [{'properties': {'fridge_min_temp': {'maximum': 5,
                                                   'minimum': -20,
...
     'type': 'object',
     'unevaluatedProperties': False}

On instance:
    {'fridge_min_temp': 10}

If I validate instead using:

jsonschema.Draft202012Validator(schema).validate(obj)

Then the raised error is much more helpful:

ValidationError: 10 is greater than the maximum of 5

Failed validating 'maximum' in schema['allOf'][0]['properties']['fridge_min_temp']:
    {'maximum': 5, 'minimum': -20, 'type': 'integer'}

On instance['fridge_min_temp']:
    10

Seems like this happens because jsonschema.validate is equivalent to raise best_match(jsonschema.Draft202012Validator(schema).iter_errors(obj)) in this case, and the best_match heuristic downgrades any errors coming from 'allOf' and 'oneOf'. With a more complex schema, I get an unevaluatedProperties error using both approaches.

Three ways of going about it could be:

1) The suggestion from https://github.com/python-jsonschema/jsonschema/issues/1074 to separately keep track of unevaluated properties and invalid properties, and write different error messages for them.

2) Add properties to the evaluated keys even if validator.descend reports errors for the subschema, i.e. drop the errs is None check in https://github.com/python-jsonschema/jsonschema/blob/main/jsonschema/_utils.py:

def find_evaluated_property_keys_by_schema(validator, instance, schema):
  # ...
  for keyword in ["allOf", "oneOf", "anyOf"]:
    if keyword in schema:
      for subschema in schema[keyword]:
        errs = next(validator.descend(instance, subschema), None)
        if errs is None:
          evaluated_keys += find_evaluated_property_keys_by_schema(
              validator, instance, subschema,
          )

This way unevaluatedProperties will not get raised at all for properties with subschema errors (which shouldn't be a problem, because any genuine unevaluatedProperties error will still be raised once those subschema errors are fixed).

3) Make unevaluatedProperties errors even weaker than "anyOf"- and "oneOf"-related errors in https://github.com/python-jsonschema/jsonschema/blob/main/jsonschema/exceptions.py#L19

I'd be happy to attempt a PR; do you have any thoughts about which approach is best?

Julian commented 1 year ago

The first thing to do here is to have a look at #1026 / #1002 / #812 / #698, which are the current open issues relating to improvements to the best_match heuristic, and see whether implementing any of them would improve your error message.

Have a look and lemme know if any seem relevant.