marksparkza / jschon

An object-oriented JSON Schema implementation for Python.
https://jschon.readthedocs.io
MIT License

Making oneOf/anyOf schema evaluation easier by discriminator value #34

Open anaghas123 opened 2 years ago

anaghas123 commented 2 years ago

I have a use case with a large number of schemas in a oneOf/anyOf schema. In such a case, it would be easier if I could use something like the OpenAPI discriminator to hint which schema to choose. Is there any way I can customise the validation so that, whenever there is a discriminator, the normal oneOf/anyOf validation does not happen and validation is instead based on the discriminator mapping?

marksparkza commented 2 years ago

Here's one possible way to do this. The example is adapted from https://swagger.io/docs/specification/data-models/inheritance-and-polymorphism/

First we create a discriminator keyword implementation that, during schema construction, looks at the adjacent anyOf array and creates a mapping (self.any_of_targets) of discriminator values to target subschemas (you could also look at oneOf; I've not done that here). During evaluation, it picks the appropriate target schema based on the discriminator value in the object being evaluated.

We also override the anyOf and oneOf keywords so that they do not do their usual evaluation if there is an adjacent discriminator keyword.

from typing import Mapping

from jschon import JSON, JSONPointer, JSONSchema, URI, create_catalog
from jschon.json import JSONCompatible
from jschon.jsonschema import Result
from jschon.vocabulary import Keyword
from jschon.vocabulary.applicator import AnyOfKeyword, OneOfKeyword

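class DiscriminatorKeyword(Keyword):
    key = 'discriminator'
    depends_on = 'anyOf', 'oneOf',
    instance_types = 'object',

    def __init__(self, parentschema: JSONSchema, value: Mapping[str, JSONCompatible]):
        super().__init__(parentschema, value)
        # Map each discriminator value to (index, subschema) of the adjacent anyOf.
        # The value is taken to be the last segment of the subschema's $ref fragment,
        # e.g. "#/$defs/simpleObject" -> "simpleObject". This assumes every anyOf
        # entry is a $ref subschema with a JSON Pointer fragment.
        self.any_of_targets = {
            JSONPointer.parse_uri_fragment(URI(subschema['$ref'].value).fragment)[-1]:
                (index, subschema)
            for index, subschema in enumerate(parentschema['anyOf'])
        }

    def evaluate(self, instance: JSON, result: Result) -> None:
        # Read the discriminator value from the property named by "propertyName".
        discriminator_property = self.json['propertyName'].value
        discriminator = instance[discriminator_property].value
        # Evaluate only the selected subschema, in a child result keyed by its index.
        target_index, target_schema = self.any_of_targets[discriminator]
        with result(instance, str(target_index)) as subresult:
            if not target_schema.evaluate(instance, subresult).passed:
                result.fail(f'The instance is invalid against the {discriminator} subschema')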

class DiscriminatedAnyOfKeyword(AnyOfKeyword):
    key = 'anyOf'

    def evaluate(self, instance: JSON, result: Result) -> None:
        if not self.parentschema.get('discriminator'):
            super().evaluate(instance, result)

class DiscriminatedOneOfKeyword(OneOfKeyword):
    key = 'oneOf'

    def evaluate(self, instance: JSON, result: Result) -> None:
        if not self.parentschema.get('discriminator'):
            super().evaluate(instance, result)

catalog = create_catalog('2020-12')

metaschema = catalog.get_schema(URI('https://json-schema.org/draft/2020-12/schema'))
metaschema.kwclasses['anyOf'] = DiscriminatedAnyOfKeyword
metaschema.kwclasses['oneOf'] = DiscriminatedOneOfKeyword
metaschema.kwclasses['discriminator'] = DiscriminatorKeyword

schema = JSONSchema({
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "$id": "https://example.com/schema",
    "anyOf": [
        {"$ref": "#/$defs/simpleObject"},
        {"$ref": "#/$defs/complexObject"}
    ],
    "discriminator": {
        "propertyName": "objectType"
    },
    "$defs": {
        "simpleObject": {
            "properties": {
                "objectType": {"const": "simpleObject"},
                "value": {"type": ["number", "string"]}
            }
        },
        "complexObject": {
            "properties": {
                "objectType": {"const": "complexObject"},
                "value": {"type": ["array", "object"]}
            }
        }
    }
})

valid_simple_object = JSON({
    "objectType": "simpleObject",
    "value": 10
})
invalid_simple_object = JSON({
    "objectType": "simpleObject",
    "value": ["nope"]
})
valid_complex_object = JSON({
    "objectType": "complexObject",
    "value": ["super", "complex"]
})
invalid_complex_object = JSON({
    "objectType": "complexObject",
    "value": None
})

print(schema.evaluate(valid_simple_object).output('basic'))
print(schema.evaluate(invalid_simple_object).output('basic'))
print(schema.evaluate(valid_complex_object).output('basic'))
print(schema.evaluate(invalid_complex_object).output('basic'))

I've not tried to handle the discriminator/mapping property but the above should provide a starting point.

Let me know how you get along with this and if you have any questions about the example code.

handrews commented 1 year ago

In theory (at least as of OAS 3.x) discriminator shouldn't ever change the validation output. It MAY short-circuit the need to cover all branches of an anyOf, as long as you are not also collecting annotations. Short-circuiting a oneOf could cause a false-pass of validation, because if one of the other branches also passes then the oneOf MUST fail. The short-circuit aspect for oneOf is more for things like code generation, where you can assume that any validation has already happened.
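
To make the oneOf caveat concrete, here is a minimal sketch in plain Python. It is not jschon code: the branch checks, the petType property and the helper names are all made up for illustration, and each branch is just a function standing in for "validate against this subschema".

```python
from typing import Any, Callable, Mapping, Sequence

# Hypothetical stand-in for "validate the instance against one subschema".
Check = Callable[[Any], bool]


def any_of(branches: Sequence[Check], instance: Any) -> bool:
    # anyOf passes if at least one branch passes.
    return any(check(instance) for check in branches)


def one_of(branches: Sequence[Check], instance: Any) -> bool:
    # oneOf passes only if exactly one branch passes.
    return sum(1 for check in branches if check(instance)) == 1


def discriminated(branches: Mapping[str, Check], prop: str, instance: Mapping[str, Any]) -> bool:
    # Short-circuit: evaluate only the branch named by the discriminator value.
    target = branches.get(instance.get(prop))
    return target is not None and target(instance)


# Two deliberately overlapping branches: anything valid as "dog" is also valid as "cat".
branches = {
    "cat": lambda obj: isinstance(obj.get("name"), str),
    "dog": lambda obj: isinstance(obj.get("name"), str) and len(obj["name"]) > 0,
}
pet = {"petType": "cat", "name": "Tom"}

full_any = any_of(list(branches.values()), pet)
full_one = one_of(list(branches.values()), pet)
short_circuit = discriminated(branches, "petType", pet)
print(full_any, full_one, short_circuit)
```

With overlapping branches like these, the full oneOf fails because both branches match, while the discriminator-selected branch passes on its own. The anyOf result is the same either way.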

There is also the use of discriminator in the parent schema (scroll down to the first long example in the Discriminator Object section). TBH, that has always just made my head hurt and I've never figured out how it ought to work. The adjacent-to-*Of usage is much more straightforward.

mapping shouldn't cause too much trouble as it just changes the implicit link between the value and the schema identification to an explicit one that works like $ref.
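
As a rough sketch of that idea, again in plain Python and not the jschon API: look the value up in mapping first and fall back to the implicit name-based reference. The helper names resolve_ref and find_branch are hypothetical, and the "#/$defs/" prefix is an assumption chosen to match the example schema earlier in the thread (OAS itself uses "#/components/schemas/"). Mapping values are treated here as reference strings only.

```python
from typing import Any, Mapping, Optional, Sequence


def resolve_ref(discriminator: Mapping[str, Any], value: str,
                implicit_prefix: str = "#/$defs/") -> str:
    # An explicit mapping entry wins; otherwise build the implicit reference
    # from the discriminator value itself.
    explicit = discriminator.get("mapping", {})
    return explicit.get(value, implicit_prefix + value)


def find_branch(any_of: Sequence[Mapping[str, Any]], ref: str) -> Optional[int]:
    # Return the index of the anyOf entry whose $ref equals the resolved
    # reference, or None if no branch matches.
    for index, subschema in enumerate(any_of):
        if subschema.get("$ref") == ref:
            return index
    return None


any_of = [
    {"$ref": "#/$defs/simpleObject"},
    {"$ref": "#/$defs/complexObject"},
]
discriminator = {
    "propertyName": "objectType",
    "mapping": {"simple": "#/$defs/simpleObject"},
}

explicit_index = find_branch(any_of, resolve_ref(discriminator, "simple"))
implicit_index = find_branch(any_of, resolve_ref(discriminator, "complexObject"))
unknown_index = find_branch(any_of, resolve_ref(discriminator, "nope"))
print(explicit_index, implicit_index, unknown_index)
```

In the keyword implementation above, the same lookup could replace the "last segment of the $ref fragment" rule when building self.any_of_targets, with an unknown value reported as a validation failure instead of raising.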