python-jsonschema / jsonschema

An implementation of the JSON Schema specification for Python
https://python-jsonschema.readthedocs.io
MIT License

jsonschema.TypeChecker with additional types fails to work after several reference resolutions by the referencing.Registry #1197

Closed bvantuan closed 6 months ago

bvantuan commented 9 months ago

Here is the code to produce the error:

from referencing import Registry, Resource
from referencing.jsonschema import DRAFT7
from collections.abc import Mapping
import jsonschema
import collections

class Config(collections.ChainMap):
    def __init__(self, *maps):
        super().__init__(*maps)

    def __getitem__(self, key):
        return self._chained_getitem(key)

    def _chained_getitem(self, key, **kwargs):
        """
        Actual implementation of ``__getitem__`` with chained behavior.

        When returning a new instance of this class, the ``kwargs`` are
        passed to the constructor.  Subclasses should override this method
        instead of ``__getitem__`` in order to pass additional ``__init__``
        args.
        """

        chained_values = []

        for m in self.maps:
            try:
                chained_values.append(m[key])
            except KeyError:
                continue

        if not chained_values:
            raise KeyError(key)

        first = chained_values[0]

        # Note: Although instances of dict are also instances of Mapping,
        # isinstance(x, dict) is much faster than isinstance(x, Mapping), and
        # dict being the most common case we explicitly check for it first.
        if not isinstance(first, (dict, Mapping)):
            return first

        nested = []

        for m in chained_values:
            if isinstance(m, (dict, Mapping)):
                nested.append(m)
            else:
                break

        return self.__class__(*nested, **kwargs)

config = Config({'a': {'b': 'c', 'd': {}}})

main_schema = {
    '$id': 'main_schema',
    'type': 'object',
    'properties': {
        'a': {
            'properties': {'b': {}, 'd': {}},
            '$ref': 'schema_1',
        },
    },
}

schema_1 = {
    'type': 'object',
    'properties': {
        'b': {'enum': ['c']},
        'd': {'$ref': 'd_schema'},
    },
    'required': ['b'],
}

schema_2 = {
    '$id': 'schema_2',
    'fixed_inputs': {
        'type': 'object',
        'default': {},
        'properties': {'e': {'type': 'integer', 'minimum': 1}},
    },
}

d_schema = {
    '$schema': 'http://json-schema.org/draft-07/schema#',
    '$id': 'd_schema',
    '$ref': 'schema_2#/fixed_inputs',
}

def retrieve(uri: str):
    if uri == 'schema_1':
        contents = schema_1
    elif uri == 'schema_2':
        contents = schema_2
    elif uri == 'd_schema':
        contents = d_schema
    else:
        # Unknown URI: raise here instead of hitting an UnboundLocalError below.
        raise KeyError(uri)
    return Resource.from_contents(contents, DRAFT7)

registry = Registry(retrieve=retrieve)

def is_my_object(checker, instance):
    # Treat any Mapping (not only dict) as a JSON Schema "object".
    return isinstance(instance, (dict, Mapping))

type_checker = jsonschema.Draft7Validator.TYPE_CHECKER.redefine(
    "object", is_my_object,
)

CustomValidator = jsonschema.validators.extend(
    jsonschema.Draft7Validator,
    type_checker=type_checker,
)

validator = CustomValidator(schema=main_schema, registry=registry)
validator.validate(config)

Validation fails with the error that Config({}) is not of type 'object'. The failure occurs at the third reference resolution (schema_2): the first two resolutions call is_my_object as expected, but the third one never does.
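For context, here is a minimal sketch of why a check that never reaches is_my_object would reject the instance. It assumes the built-in Draft 7 "object" check is a plain isinstance test against dict, and it uses a bare ChainMap as a hypothetical stand-in for Config({}):

```python
from collections import ChainMap
from collections.abc import Mapping

# Hypothetical stand-in for Config({}), which subclasses ChainMap.
instance = ChainMap({})

# A ChainMap is a Mapping but not a dict, so a dict-only check rejects it
# while a Mapping-based check such as is_my_object accepts it.
is_dict = isinstance(instance, dict)
is_mapping = isinstance(instance, Mapping)

print(is_dict, is_mapping)
```

So whenever validation falls back to the stock type checker, a Config instance stops counting as an object.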

Julian commented 9 months ago

I suspect this is a duplicate of #994, though your example is a much more reasonable one, in that it shows behavior that clearly seems like it should work. If it is a dupe, there is no fix for the behavior yet; it may need to wait for a refresh of the dialect behavior overall. There were a workaround or two mentioned in that thread, which boil down to: if you want all Draft 7 validation to behave this way, declare that your extended validator is the one that validates Draft 7 and fully override the built-in one.

Julian commented 6 months ago

I've moved this example there, though I think full support will likely not come until some larger work happens on how dialects are "registered" (or, here, re-registered) in jsonschema. You can watch that issue for updates; feedback and suggestions are of course welcome.