python-jsonschema / jsonschema

An implementation of the JSON Schema specification for Python
https://python-jsonschema.readthedocs.io
MIT License
4.59k stars 578 forks source link

jsonschema resolving $ref values as JSON Pointers #758

Closed wheelerlaw closed 1 year ago

wheelerlaw commented 3 years ago

Given the following malformed reference:

"$ref": "common.json#properties/version"

jsonschema is resolving it as if it were a JSON Pointer. However, it is not, since according to section 3 of RFC 6901, JSON Pointers always begin with a leading /:

A JSON Pointer is a Unicode string (see [RFC4627], Section 3) containing a sequence of zero or more reference tokens, each prefixed by a '/' (%x2F) character.

However, it can't be resolved as a subschema identifier either because subschema identifiers use plain-named fragments and don't allow / in the fragment.

See section 8.2.3 of the draft 7 spec of JSON Schema:

Using JSON Pointer fragments requires knowledge of the structure of the schema. When writing schema documents with the intention to provide re-usable schemas, it may be preferable to use a plain name fragment that is not tied to any particular structural location. This allows a subschema to be relocated without requiring JSON Pointer references to be updated.

To specify such a subschema identifier, the "$id" keyword is set to a URI reference with a plain name fragment (not a JSON Pointer fragment). This value MUST begin with the number sign that specifies a fragment ("#"), then a letter ([A-Za-z]), followed by any number of letters, digits ([0-9]), hyphens ("-"), underscores ("_"), colons (":"), or periods (".").

As well as section 5:

Per the W3C's best practices for fragment identifiers [W3C.WD-fragid-best-practices-20121025], plain name fragment identifiers in "application/schema+json" are reserved for referencing locally named schemas. All fragment identifiers that do not match the JSON Pointer syntax MUST be interpreted as plain name fragment identifiers.

The syntactically correct form of the first example would be:

"$ref": "common.json#/properties/version"

(notice the / after the #)

or

"$ref": "common.json#properties"

where some subschema has "$id": "#properties".

If I understand the code correctly, some changes are needed in the ref function in _validators.py, so to parse the value and determine if it s a JSON Pointer, a subschema identifier, or a URI reference.

Section 8.2.4 of the JSON Schema spec has some good examples.

Julian commented 3 years ago

Hi, thanks.

I believe this likely will be a duplicate of #371 (i.e. for location-independent identifier support, which is a known open issue) -- though GitHub closed #371 when I moved the master branch, even though it wasn't finished, so happy to keep this open.

But yeah I suspect it's the same issue (and is covered perhaps by the same tests which we currently skip until that's fixed).

wheelerlaw commented 3 years ago

Ah, yes. It does look like it is a duplicate of that, sorry for not checking to see if it had already been brought up. I mainly just opened this since it's in relation to https://github.com/json-schema-org/JSON-Schema-Test-Suite/issues/449.

Julian commented 3 years ago

No worries! All good -- I do assume though the upstream test suite issue may be closeable, because yeah I think this behavior is likely already covered by the tests that are being skipped. We can have a look though.

wheelerlaw commented 3 years ago

I'm not sure that the tests exists, I did look through the existing tests quickly and I didn't find that negative test case of a $ref in the format of <uri>#incorrect/json/pointer. But if you know that one exists, I would be happy to close that issue.

YKdvd commented 3 years ago

I'm not quite sure, but I believe this (and the closed but not fixed) #371 covers the simple case of {"$ref": "#some_local_id"} using and $id in a $ref? That seems to be part of the draft-07 spec, and doesn't seem to work with a test I did with jsonschema 3.2.0 from PyPI or the master branch here? Is this an exception to the "Full support for Draft 7"?

Julian commented 3 years ago

371 was closed accidentally when renaming the master branch and GH doesn't allow reopening it. But yes location independent identifiers are a known bug (one of a few -- see the test suite for the small number of tests that are skipped).

Julian commented 1 year ago

Hello there!

This, along with many many other $ref-related issues, is now finally being handled in #1049 with the introduction of a new referencing library which is fully compliant and has APIs which I hope are a lot easier to understand and customize.

The next release of jsonschema (v4.18.0) will contain a merged version of that PR, and should be released shortly in beta, and followed quickly by a regular release, assuming no critical issues are reported.

The new APIs seem to behave better here, for something like your example:

from referencing import Registry, Resource
from referencing.jsonschema import DRAFT7
import jsonschema
common = DRAFT7.create_resource({"properties": {"version": {}}})
registry = Registry().with_resource("common.json", common)
jsonschema.validate(
    schema={"$ref": "common.json#properties/version"},
    instance=12,
    registry=registry,
)

it indeed will try to resolve that as a plain-name fragment, not a pointer. (As you say, technically this isn't valid there either as a schema, but it'd seem the metaschema does not enforce that invalidity).

If you still care to, I'd love it if you tried out the beta once it is released, or certainly it'd be hugely helpful to immediately install the branch containing this work (https://github.com/python-jsonschema/jsonschema/tree/referencing) and confirm. You can in the interim find documentation for the change in a preview page here.

I'm going to close this given it indeed seems like it is addressed by #1049, but feel free to follow up with any comments. Sorry for the delay in getting to these, but hopefully this new release will bring lots of benefit!

wheelerlaw commented 1 year ago

Unfortunately I have lost most of the context that I had when I originally opened this ticket and I no longer am in the role I was in, but good to know that this is fixed!