python-jsonschema / check-jsonschema

A CLI and set of pre-commit hooks for jsonschema validation with built-in support for GitHub Workflows, Renovate, Azure Pipelines, and more!
https://check-jsonschema.readthedocs.io/en/stable
Other
191 stars 38 forks source link

Failure resolving ref to anchor within same schema after v0.24.0 #344

Closed alex-tdrn closed 4 months ago

alex-tdrn commented 8 months ago

Hello, I've run into a problem with references to anchors defined in the same schema file. I'm pretty sure that this usage of anchors is valid under the spec, but I'd appreciate anyone pointing out a spec violation in my usage. I've reduced the issue to the following case:

Given repro_schema.json:

{
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "my_subschema": {
        "$anchor": "my_subschema_anchor"
    },
    "properties": {
        "field": {
            "$ref": "#my_subschema_anchor"
        }
    },
    "type": "object"
}

And given repro_example.json file to be validated:

{
    "field": ""
}

When invoking check-jsonschema in the following way:

check-jsonschema --schemafile=repro_schema.json repro_example.json

I get a success ouptut from v0.23.3, but the following error message from v0.24.0 - v0.27.0:

Failure resolving $ref within schema

_WrappedReferencingError: NoSuchAnchor: 'my_subschema_anchor' does not exist within {'$schema': 'https://json-schema.org/draft/2020-12/schema', 'my_subschema': {'$anchor': 'my_subschema_anchor'}, 'properties': {'field': {'$ref': '#my_subschema_anchor'}}, 'type': 'object'}
  in "XXX\python\current\Lib\site-packages\check_jsonschema\checker.py", line 82
  >>> result = self._build_result()

  caused by

  NoSuchAnchor: 'my_subschema_anchor' does not exist within {'$schema': 'https://json-schema.org/draft/2020-12/schema', 'my_subschema': {'$anchor': 'my_subschema_anchor'}, 'properties': {'field': {'$ref': '#my_subschema_anchor'}}, 'type': 'object'}
    in "XXX\python\current\Lib\site-packages\jsonschema\validators.py", line 446
    >>> resolved = self._resolver.lookup(ref)
sirosen commented 8 months ago

I'm marking this as an accepted bug. I haven't used anchors myself, but your usage looks correct to me.

(I might have to take this upstream to the jsonschema project on which check-jsonschema is built.)

sirosen commented 4 months ago

I'm circling back on this item! (Sorry about the long delay; it fell off the end of my TODO list.)

I'm not confident that this is an upstream issue or bug -- it may actually be an incorrect behavior in the older versions which has since been fixed. I'm asking the JSON Schema folks about this so that I can better understand, and hopefully relay what I learn back to you.

Here's what I know so far, which led me to the idea that this might not be a bug:

 {
     "$schema": "https://json-schema.org/draft/2020-12/schema",
-     "my_subschema": {
-         "$anchor": "my_subschema_anchor"
-     },
+    "$defs": {
+        "my_subschema": {
+            "$anchor": "my_subschema_anchor"
+        }
+    },
     "properties": {
         "field": {
             "$ref": "#my_subschema_anchor"
         }
    } ,
     "type": "object"
 }

This may be self-consistent behavior. If $anchor is only defined within locations which MUST be valid schemas under the spec, then we would expect this.

The JSON Schema specification is, to my reading, at least slightly unclear about this. It talks about $anchor within schemas -- implicitly this means that the object containing an $anchor is a schema -- but it doesn't talk about how the containing document should be discovered to be a schema as far as I can tell.

So I'm a little fuzzy about this. As I mentioned, I've put it to the JSON Schema maintainers to answer which behavior is correct, and hopefully I'll have an answer soon.

sirosen commented 4 months ago

I'm removing the bug tag. This is a change in behavior across versions of the underlying jsonschema library, but it turns out that this behavior is... wait for it... undefined.

According to the spec, $anchor is valid in any schema. The trouble is, how can an implementation know which parts of a document are schemas and which parts are non-schema data? It's ambiguous. By way of example, {"example": {"$ref": "foo"}} should probably not resolve the ref inside of the example (maybe it's an example of how to write a $ref!).

Certain locations like $defs are known to contain schemas, so $anchor MUST work in those locations under all implementations. Not respecting $anchor in such a case is a spec violation. But other locations in a schema document may or may not be schemas. An implementation MAY treat them as such, or it may choose not to.

In my conversation with the JSON Schema folks, it seemed that there's some agreement with my take on this (informed by some other experiences with ambiguities in the spec), which is that ideally we would actually treat the old behavior you saw as a spec violation. That is, respecting an $anchor outside of a location which is defined to contain a schema would be incorrect. The trouble isn't so much agreeing that such a definition would be less confusing, but "how do we get there?" given that past spec versions did not stipulate this (if I followed the conversation there correctly).


So! It's confusing that this worked and then stopped working. But the new behavior is either more correct, if you view the spec the way I do, or it's at least no less correct than the old behavior. It's not an ideal way to close out an issue, but I think there's not much more to be done on this one. (And we can always reopen.)

The very last note I'll give is that if the old behavior is very important, it may be possible to design some feature which allows users to designate parts of their schema documents as subschemas, and then forcibly load those parts of the schema. It would need some design, but something like --force-subschema-loading '$.my_subschema' could do it. I'm not eager to build that out unless it's very important to some use-cases, however.

alex-tdrn commented 4 months ago

Thank you so much for this thorough investigation! It makes sense how this is undefined behaviour. I would not suggest adding switches to your tool for accepting off-spec schemas. We'll just have to fix our schema definitions at some point. For slightly more context, the resulting schema is parsed using a jinja parser that cannot handle the $ sign, that's why I avoided placing subschemas in $definitions.

Thank you again for going to all this trouble!

sirosen commented 4 months ago

Thanks for your kind words! It was no trouble at all -- and it helped me deepen my own understanding. :smile:

Please let me know if you run into other unexpected issues or bugs in your adventures with check-jsonschema. I'm always happy to help.