python-jsonschema / jsonschema

An implementation of the JSON Schema specification for Python
https://python-jsonschema.readthedocs.io
MIT License

Very slow validation after dynamicRef implementation, even with schemas which do not use dynamicRef other than in their metaschema #941

Closed mriedem closed 1 year ago

mriedem commented 2 years ago

We just started noticing that some tooling which is using this code hangs in 4.5.0:

import jsonschema
import ruamel.yaml

YAML = ruamel.yaml.YAML()
...
        with open(fname, "r") as f:
            self.config = YAML.load(f)
        jsonschema.validate(
            self.config, SCHEMA,
            format_checker=jsonschema.draft7_format_checker)

That seems to hang and there are no warnings or errors. When we drop back to jsonschema<4.5.0 (so 4.4.0) it works. I'm not sure what might be going on here or how to debug.
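
For a call that appears to hang with no output, one option is the standard library's `faulthandler` module, which can dump the tracebacks of all threads once a timeout passes. The sketch below is only an illustration: `run_with_watchdog` and its `log_path` default are made-up names, not part of jsonschema.

```python
import faulthandler


def run_with_watchdog(fn, *args, timeout_s=120.0,
                      log_path="hang-traceback.log", **kwargs):
    """Call fn(*args, **kwargs); if it is still running after timeout_s
    seconds, write the tracebacks of all threads to log_path.

    The call itself is not interrupted; the dump only shows where it is stuck.
    """
    with open(log_path, "w") as log:
        # faulthandler needs a real file object (one with a file descriptor).
        faulthandler.dump_traceback_later(timeout_s, file=log)
        try:
            return fn(*args, **kwargs)
        finally:
            # Disarm the watchdog before the log file is closed.
            faulthandler.cancel_dump_traceback_later()
```

With the names from the snippet above, this could be called as `run_with_watchdog(jsonschema.validate, self.config, SCHEMA, timeout_s=10)`, then the log file inspected while the call is still running.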

mriedem commented 2 years ago

These are the packages we have installed FWIW:

attrs==21.4.0,awesome-progress-bar==1.7.2,certifi==2021.10.8,cffi==1.15.0,charset-normalizer==2.0.12,Deprecated==1.2.13,ibmq-deploy==1.14.2,idna==3.3,importlib-resources==5.7.1,jsonschema==4.5.0,pycparser==2.21,PyGithub==1.55,PyJWT==2.3.0,PyNaCl==1.5.0,pyrsistent==0.18.1,PyYAML==6.0,requests==2.27.1,ruamel.yaml==0.17.21,ruamel.yaml.clib==0.2.6,urllib3==1.26.9,wrapt==1.14.1,zipp==3.8.0

mriedem commented 2 years ago

Running our unit tests also hangs. I killed the test runner and got this output; it looks like there may be a cycle in here?

========================================================================================== ERRORS ===========================================================================================
______________________________________________________________________ ERROR at setup of test_should_build_pr_no_push _______________________________________________________________________

module = <module 'test_build' from '/home/osboxes/ibmq/deploy-tool/tests/test_build.py'>

    def setup_module(module):
>       build.CONF.load()

tests/test_build.py:14: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
ibmq_deploy/config.py:28: in load
    jsonschema.validate(
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:1036: in validate
    cls.check_schema(schema)
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:201: in check_schema
    for error in cls(cls.META_SCHEMA).iter_errors(schema):
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:241: in iter_errors
    for error in errors:
.tox/py38/lib/python3.8/site-packages/jsonschema/_validators.py:370: in allOf
    yield from validator.descend(instance, subschema, schema_path=index)
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:257: in descend
    for error in self.evolve(schema=schema).iter_errors(instance):
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:241: in iter_errors
    for error in errors:
.tox/py38/lib/python3.8/site-packages/jsonschema/_validators.py:299: in ref
    yield from validator.descend(instance, resolved)
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:257: in descend
    for error in self.evolve(schema=schema).iter_errors(instance):
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:241: in iter_errors
    for error in errors:
.tox/py38/lib/python3.8/site-packages/jsonschema/_validators.py:340: in properties
    yield from validator.descend(
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:257: in descend
    for error in self.evolve(schema=schema).iter_errors(instance):
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:241: in iter_errors
    for error in errors:
.tox/py38/lib/python3.8/site-packages/jsonschema/_validators.py:47: in additionalProperties
    yield from validator.descend(instance[extra], aP, path=extra)
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:257: in descend
    for error in self.evolve(schema=schema).iter_errors(instance):
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:241: in iter_errors
    for error in errors:
.tox/py38/lib/python3.8/site-packages/jsonschema/_validators.py:320: in dynamicRef
    yield from validator.descend(instance, subschema)
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:257: in descend
    for error in self.evolve(schema=schema).iter_errors(instance):
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:241: in iter_errors
    for error in errors:
.tox/py38/lib/python3.8/site-packages/jsonschema/_validators.py:370: in allOf
    yield from validator.descend(instance, subschema, schema_path=index)
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:257: in descend
    for error in self.evolve(schema=schema).iter_errors(instance):
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:241: in iter_errors
    for error in errors:
.tox/py38/lib/python3.8/site-packages/jsonschema/_validators.py:299: in ref
    yield from validator.descend(instance, resolved)
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:257: in descend
    for error in self.evolve(schema=schema).iter_errors(instance):
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:241: in iter_errors
    for error in errors:
.tox/py38/lib/python3.8/site-packages/jsonschema/_validators.py:340: in properties
    yield from validator.descend(
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:257: in descend
    for error in self.evolve(schema=schema).iter_errors(instance):
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:241: in iter_errors
    for error in errors:
.tox/py38/lib/python3.8/site-packages/jsonschema/_validators.py:47: in additionalProperties
    yield from validator.descend(instance[extra], aP, path=extra)
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:257: in descend
    for error in self.evolve(schema=schema).iter_errors(instance):
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:241: in iter_errors
    for error in errors:
.tox/py38/lib/python3.8/site-packages/jsonschema/_validators.py:320: in dynamicRef
    yield from validator.descend(instance, subschema)
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:257: in descend
    for error in self.evolve(schema=schema).iter_errors(instance):
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:241: in iter_errors
    for error in errors:
.tox/py38/lib/python3.8/site-packages/jsonschema/_validators.py:370: in allOf
    yield from validator.descend(instance, subschema, schema_path=index)
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:257: in descend
    for error in self.evolve(schema=schema).iter_errors(instance):
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:241: in iter_errors
    for error in errors:
.tox/py38/lib/python3.8/site-packages/jsonschema/_validators.py:299: in ref
    yield from validator.descend(instance, resolved)
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:257: in descend
    for error in self.evolve(schema=schema).iter_errors(instance):
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:241: in iter_errors
    for error in errors:
.tox/py38/lib/python3.8/site-packages/jsonschema/_validators.py:340: in properties
    yield from validator.descend(
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:257: in descend
    for error in self.evolve(schema=schema).iter_errors(instance):
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:241: in iter_errors
    for error in errors:
.tox/py38/lib/python3.8/site-packages/jsonschema/_validators.py:320: in dynamicRef
    yield from validator.descend(instance, subschema)
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:257: in descend
    for error in self.evolve(schema=schema).iter_errors(instance):
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:241: in iter_errors
    for error in errors:
.tox/py38/lib/python3.8/site-packages/jsonschema/_validators.py:370: in allOf
    yield from validator.descend(instance, subschema, schema_path=index)
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:257: in descend
    for error in self.evolve(schema=schema).iter_errors(instance):
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:241: in iter_errors
    for error in errors:
.tox/py38/lib/python3.8/site-packages/jsonschema/_validators.py:299: in ref
    yield from validator.descend(instance, resolved)
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:257: in descend
    for error in self.evolve(schema=schema).iter_errors(instance):
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:241: in iter_errors
    for error in errors:
.tox/py38/lib/python3.8/site-packages/jsonschema/_validators.py:340: in properties
    yield from validator.descend(
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:257: in descend
    for error in self.evolve(schema=schema).iter_errors(instance):
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:241: in iter_errors
    for error in errors:
.tox/py38/lib/python3.8/site-packages/jsonschema/_validators.py:47: in additionalProperties
    yield from validator.descend(instance[extra], aP, path=extra)
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:257: in descend
    for error in self.evolve(schema=schema).iter_errors(instance):
.tox/py38/lib/python3.8/site-packages/jsonschema/validators.py:241: in iter_errors
    for error in errors:
.tox/py38/lib/python3.8/site-packages/jsonschema/_validators.py:313: in dynamicRef
    extended_schema = dynamic_anchor_extender(
.tox/py38/lib/python3.8/site-packages/jsonschema/_utils.py:406: in dynamic_anchor_extender
    extender_schema = _find_dynamic_anchor_intermediate(
.tox/py38/lib/python3.8/site-packages/jsonschema/_utils.py:386: in _find_dynamic_anchor_intermediate
    for subschema in search_schema(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

schema = {'$comment': 'This meta-schema also defines keywords that have appeared in previous drafts in order to prevent incompa... '$id': 'https://json-schema.org/draft/2020-12/schema', '$schema': 'https://json-schema.org/draft/2020-12/schema', ...}
matcher = <function match_keyword.<locals>.matcher at 0x7f9d92b7e4c0>

    def search_schema(schema, matcher):
        """Breadth-first search routine."""
        values = deque([schema])
        while values:
            value = values.pop()
            if isinstance(value, list):
                values.extendleft(value)
                continue
            if not isinstance(value, dict):
                continue
            yield from matcher(value)
>           values.extendleft(value.values())
E           Failed: Timeout >120.0s

.tox/py38/lib/python3.8/site-packages/jsonschema/_utils.py:431: Failed

Julian commented 2 years ago

I'd need some sort of reproducer -- there are plenty of tests for validate, so it's certainly not failing in all cases. What schema and instance are you validating either in your tests or real code?

mriedem commented 2 years ago

This is the file that defines our schema:

"""The json schema for the config file."""

STRING = {"type": "string"}
BOOL = {"type": "boolean"}

DEPLOY_SCHEMA = {
    "type": "object",
    "properties": {
        "resource_group": STRING,
        "region": STRING,
        "cluster": STRING,
        "openshift": BOOL,
        "chart": STRING,
        "cloud_secret": STRING,
        "namespace": STRING,
        "tag_name": STRING,
        "image_tag": STRING,
        "items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "jumpurl": STRING,
                    "name": STRING,
                    "namespace": STRING,
                    "value_file": STRING,
                    "secret_file": STRING,
                    "resource_group": STRING,
                    "region": STRING,
                    "cluster": STRING,
                    "openshift": BOOL,
                    "pr": BOOL
                },
                "required": ["name"]
            }
        }
    },
    "required": ["resource_group", "region", "cluster", "chart", "image_tag", "items"]
}

DEPLOYMENTS_SCHEMA = {
    "type": "object",
    "items": DEPLOY_SCHEMA
}

BRANCH_SCHEMA = {
    "type": "object",
    "properties": {
        "name": STRING,
        "cloud_secret": STRING,
        "ns": STRING
    },
    "required": ["name"]
}

IMAGE_SCHEMA = {
    "type": "object",
    "properties": {
        "registry": {
            "type": ["string", "array"],
            "items": STRING
        },
        "namespace": STRING,
        "name": STRING,
        "branches": {
            "type": "array",
            "items": BRANCH_SCHEMA
        }
    },
    "required": ["registry", "namespace", "name", "branches"]
}

HELM_SCHEMA = {
    "type": "object",
    "properties": {
        "repos": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": STRING,
                    "url": STRING
                },
                "required": ["name", "url"]
            }
        }
    },
    "required": ["repos"]
}

# Which tools are installable
TOOLS_SCHEMA = {
    "type": "object",
    "properties": {
        "helm": {"type": ["number", "string"]},
        "helm_timeout": {"type": ["string"]}
    }
}

SCHEMA = {
    "type": "object",
    "properties": {
        "tools": TOOLS_SCHEMA,
        "deployments": DEPLOYMENTS_SCHEMA,
        "image": IMAGE_SCHEMA,
        "helm": HELM_SCHEMA
    },
}

This is the yaml file being validated:

tools:
  helm: 3

deployments:
  master:
    resource_group: Support Services
    region: us-south
    cluster: support-services
    chart: q-site
    image_field: qsite_image_tag
    namespace: q-site-staging
    items:
      - name: q-site-dev
        value_file: values/values-master.yaml
  production:
    resource_group: Support Services
    region: us-south
    cluster: support-services
    chart: q-site
    image_field: qsite_image_tag
    namespace: q-site-prod
    items:
      - name: q-site-prod
        value_file: values/values-prod.yaml

Julian commented 2 years ago

Thanks, it'd also be helpful if you minimized that to the smallest hanging example.

mriedem commented 2 years ago

it'd also be helpful if you minimized that to the smallest hanging example

I'm not sure what you mean. We had some code that hung validating that exact yaml file (which is parsed and read in using ruamel.yaml.YAML().load() as above) using the SCHEMA object above. I'm not sure what in there is causing the hang exactly outside of that traceback from the hung unit test.

Julian commented 2 years ago

What I mean is it's helpful to me or anyone who can spare time to debug if you provide a minimal working example of the issue rather than one with a lot of extra unnecessary complexity. E.g. the issue almost certainly persists if you remove, say, the chart property in DEPLOY_SCHEMA. I'm asking for the smallest possible example demonstrating the problem. If you don't have time to do so you can certainly leave it as is though and someone else may come along to help.
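
The kind of minimization described here can be partly automated. The following is a rough sketch, not a tool from this project: `minimize_properties` and the `still_reproduces` callback are hypothetical names, and the callback is assumed to return True while the problem (for example, validation exceeding some time budget) still occurs.

```python
import copy


def minimize_properties(schema, still_reproduces):
    """Greedily drop entries of schema["properties"] one at a time,
    keeping each removal only if the problem still reproduces.

    The input schema is left untouched; a reduced copy is returned.
    """
    current = copy.deepcopy(schema)
    for name in list(current.get("properties", {})):
        candidate = copy.deepcopy(current)
        del candidate["properties"][name]
        if still_reproduces(candidate):
            # The property was not needed to trigger the problem.
            current = candidate
    return current
```

This only trims one level of `properties`; nested schemas would need the same treatment applied recursively, and the instance being validated may need trimming as well.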

progval commented 2 years ago

I have this smaller example from https://github.com/matrix-org/synapse/issues/12649 :

import jsonschema

_OEMBED_PROVIDER_SCHEMA = {
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "endpoints": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "url": {"type": "string"},
                    },
                },
            },
        },
    },
}

config = [
    {
        "endpoints": [
            {
                "url": "https://publish.twitter.com/oembed",
            }
        ],
    }
]

jsonschema.validate(config, _OEMBED_PROVIDER_SCHEMA)

It seems to be minimal
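
To tell a genuine hang apart from a call that is merely very slow, a small timing helper can be wrapped around an example like the one above. This is a minimal sketch using only the standard library; `timed` is an illustrative name.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start
```

For the example above, `timed(jsonschema.validate, config, _OEMBED_PROVIDER_SCHEMA)` would return `None` (the return value of a successful `validate`) together with the elapsed time, which can then be compared across jsonschema versions.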

DMRobertson commented 2 years ago

Bisecting against that test example, it seems to be introduced in #886.

Julian commented 2 years ago

That's really the only change in the release, so it's definitely there, but will still require some minimizing to figure out what the issue is.

I'll see if I have a bit of time in a few hours if someone doesn't see the issue by then. (And thanks all for the info so far)

Julian commented 2 years ago

The example there seems to complete, it's just (very) slow. Specifically here it completes in ~11s. My guess is the original example does too, just even slower, and that the issue is again some missing caching unfortunately.
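
To illustrate the kind of caching meant here: the traceback earlier in the thread ends in `search_schema`, which walks the whole metaschema. If such a walk is repeated for every `$dynamicRef` that gets resolved, memoizing its result per schema object would avoid the repeated work. The sketch below is an assumption-laden illustration, not necessarily what jsonschema does or did in any release; `find_dynamic_anchors`, `find_dynamic_anchors_cached`, and `EXAMPLE` are hypothetical names, and only `search_schema` is copied from the traceback.

```python
from collections import deque


def search_schema(schema, matcher):
    """Walk every nested dict and list in schema (as in the traceback above)."""
    values = deque([schema])
    while values:
        value = values.pop()
        if isinstance(value, list):
            values.extendleft(value)
            continue
        if not isinstance(value, dict):
            continue
        yield from matcher(value)
        values.extendleft(value.values())


def find_dynamic_anchors(schema):
    """Hypothetical helper: collect subschemas that declare $dynamicAnchor."""
    def matcher(value):
        if "$dynamicAnchor" in value:
            yield value
    return list(search_schema(schema, matcher))


# Cache keyed by object identity. The schema itself is stored alongside the
# result so its id() cannot be reused by another object while it is cached.
# Assumes schemas are not mutated after the first lookup.
_CACHE = {}


def find_dynamic_anchors_cached(schema):
    key = id(schema)
    if key not in _CACHE:
        _CACHE[key] = (schema, find_dynamic_anchors(schema))
    return _CACHE[key][1]


EXAMPLE = {
    "$dynamicAnchor": "meta",
    "properties": {"a": {"$dynamicAnchor": "inner"}, "b": {"type": "string"}},
}
```

The first call on `EXAMPLE` performs the full walk; later calls on the same object return the stored list without traversing the schema again.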

mmb-davidsmith commented 2 years ago

We're seeing this too with our schema validations having slowed down. Moving to 4.5.1 fixes it.

mriedem commented 2 years ago

Moving to 4.5.1 fixes it.

Same here, thanks @Julian.
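
Based only on the reports in this thread (4.4.0 works, 4.5.0 is very slow, 4.5.1 is fine again), a project could guard against the one affected release. This is a minimal sketch; `is_affected` and `installed_jsonschema_is_affected` are illustrative names.

```python
from importlib import metadata


def is_affected(version: str) -> bool:
    """True only for the release reported as slow in this thread."""
    return version.strip() == "4.5.0"


def installed_jsonschema_is_affected() -> bool:
    """Check the installed jsonschema distribution, if there is one."""
    try:
        return is_affected(metadata.version("jsonschema"))
    except metadata.PackageNotFoundError:
        # jsonschema is not installed, so there is nothing to warn about.
        return False
```

Excluding that release in the dependency specification (for instance with a `!=4.5.0` constraint) would have the same effect at install time.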

Julian commented 1 year ago

Hello all!

This, along with many many other $ref-related issues, is now finally being handled in #1049 with the introduction of a new referencing library which is fully compliant and has APIs which I hope are a lot easier to understand and customize.

The next release of jsonschema (v4.18.0) will contain a merged version of that PR, and should be released shortly in beta, and followed quickly by a regular release, assuming no critical issues are reported.

It looks from my testing like indeed the examples from this thread work reasonably there -- i.e. aren't unusably slow! If you still care to, I'd love it if you tried out the beta once it is released, or certainly it'd be hugely helpful to immediately install the branch containing this work (https://github.com/python-jsonschema/jsonschema/tree/referencing) and confirm. You can in the interim find documentation for the change in a preview page here.

I'm going to close this given it indeed seems like it is addressed by #1049, but feel free to follow up with any comments. Sorry for the delay in getting to these, but hopefully this new release will bring lots of benefit!