json-schema-org / json-schema-spec

The JSON Schema specification
http://json-schema.org/

$schema introduces specification-level circular dependency #1521

Open mathematikoi opened 2 weeks ago

mathematikoi commented 2 weeks ago

The value of this keyword MUST be a URI [RFC3986] (containing a scheme) and this URI MUST be normalized. The current schema MUST be valid against the meta-schema identified by this URI.

case: a schema where $id equals $schema

the fact that a schema MUST be valid according to its meta-schema, goes on forever! especially if a "validation session" considers a schema and its meta-schema with no further context.

should this be a legitimate concern for implementations looking to be FULLY compliant with the spec?

gregsdennis commented 2 weeks ago

This isn't a concern.

Meta-schema validation isn't required when processing a schema, so most of the time it's skipped. This also means that when validating a schema against its meta-schema, you're not also validating the meta-schema against itself.

mathematikoi commented 2 weeks ago

i see, so an implementation should just assume that the meta-schema is a valid json schema?

gregsdennis commented 2 weeks ago

Generally, yes.

When validating a schema, the schema takes the role of the instance, and the meta-schema takes the role of the schema. But that doesn't mean that the implementation has to then validate the meta-schema's meta-schema. Typically it won't.

mathematikoi commented 2 weeks ago

I totally get that part, but that priced in, how can you validate a schema that is its own meta-schema ($schema === $id), e.g. https://json-schema.org/draft/2020-12/schema

it does not make sense!

gregsdennis commented 2 weeks ago

Okay, so let's take an example. Suppose you (the user) have this instance:

{
  "foo": "bar"
}

And you want to validate it against the schema:

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$id": "https://example.json-schema.org/foobar-object",
  "type": "object",
  "properties": {
    "foo": { "const": "bar" }
  }
}

To validate that instance using that schema, I (the implementation) check the $schema keyword in the schema to determine how to interpret the schema. It says "draft 7", and I already know Draft 7, so I evaluate the schema as Draft 7, processing only the keywords that Draft 7 of the spec defines.

IMPORTANTLY, I don't need to actually validate that the schema is a Draft 7 schema. I just interpret it that way. An error may occur if a keyword has an incompatible value.

If you tell me that I definitely should validate the schema, then I temporarily change modes and validate your schema as an instance against the meta-schema as a schema.

When I do this, I see that the draft 7 meta-schema (http://json-schema.org/draft-07/schema#) also has a $schema with http://json-schema.org/draft-07/schema#. IMPORTANTLY, I don't care. Just as before, I already know what Draft 7 is, so I evaluate the meta-schema (the schema here) as Draft 7, processing only the keywords that Draft 7 of the spec defines.

If that passes, then I continue as before, evaluating your original instance against your original schema.

The meta-schema evaluation never goes into a loop because being configured to validate a schema against its meta-schema doesn't mean that I also need to validate the meta-schema against its meta-schema.


Now, let's assume that I don't know what Draft 7 is, which means I don't recognize http://json-schema.org/draft-07/schema# as a valid meta-schema. At this point I just error and refuse to process the schema.
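
As a rough sketch of the flow described above (not taken from any real implementation), the following Python shows why the optional meta-schema check adds exactly one extra step and never recurses. The names `KNOWN_DIALECTS`, `DEFAULT_META_SCHEMA`, `dialect_of`, and `plan_evaluation` are hypothetical, and the fallback to a default dialect when `$schema` is absent is an assumption of this sketch.

```python
# Hypothetical table of dialects this toy implementation "just knows".
KNOWN_DIALECTS = {
    "http://json-schema.org/draft-07/schema": "draft-07",
    "https://json-schema.org/draft/2020-12/schema": "2020-12",
}
# Assumption: the dialect used when a schema has no $schema keyword.
DEFAULT_META_SCHEMA = "https://json-schema.org/draft/2020-12/schema"


class UnknownDialectError(Exception):
    """Raised when $schema names a meta-schema the implementation does not know."""


def meta_schema_uri(schema: dict) -> str:
    """Read $schema, treating an empty trailing fragment ('#') as insignificant."""
    return schema.get("$schema", DEFAULT_META_SCHEMA).rstrip("#")


def dialect_of(schema: dict) -> str:
    """Choose the rule set from $schema. Nothing is validated here."""
    uri = meta_schema_uri(schema)
    if uri not in KNOWN_DIALECTS:
        raise UnknownDialectError(f"unrecognized meta-schema: {uri}")
    return KNOWN_DIALECTS[uri]


def plan_evaluation(schema: dict, meta_schemas: dict, validate_schema: bool = False) -> list:
    """Return the ordered steps as (instance role, schema role, dialect) tuples."""
    dialect = dialect_of(schema)  # errors out early for an unknown dialect
    steps = []
    if validate_schema:
        meta = meta_schemas[meta_schema_uri(schema)]
        # The meta-schema's own $schema is only read to pick a dialect.
        # The meta-schema is not validated in turn, so there is no loop.
        steps.append(("schema", "meta-schema", dialect_of(meta)))
    steps.append(("instance", "schema", dialect))
    return steps
```

With the Draft 7 meta-schema registered and `validate_schema=True`, the plan has two steps: the schema against the meta-schema, then the instance against the schema. A schema whose `$schema` is not in the table fails in `dialect_of` before any evaluation starts.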

mathematikoi commented 2 weeks ago

Now, let's assume that I don't know what Draft 7 is, which means I don't recognize http://json-schema.org/draft-07/schema# as a valid meta-schema. At this point I just error and refuse to process the schema.

this use-case! the problem i insist on here: the specification doesn't require you "to know" the SCHEMA's META-SCHEMA, i can, on the fly, grab the META-SCHEMA uri, fetch it, and interpret it dynamically (i.e. interpreting $vocabulary etc), so i can proceed with interpreting the keywords in the SCHEMA.

my worry is that there is an implicit requirement for the meta-schema to be a VALID json schema, and this requirement cannot be verified, and can only be based on trusting the meta-schema author!

gregsdennis commented 2 weeks ago

fetch it

No, the spec explicitly warns implementations not to fetch any reference. However, many implementations still do, and ideally such functionality is disabled by default.

Even if you were to use a custom meta-schema, eventually that meta-schema's $schema chain MUST point back to one of the known specification meta-schemas. If it doesn't then the implementation can't know which ruleset to use to process the schema.

For example, if you have

{
  "$schema": "https://some-cust.om/meta-schema",
  "$id": "https://example.json-schema.org/foobar-object",
  "type": "object",
  "properties": {
    "foo": { "const": "bar" }
  }
}

and

{
  "$schema": "https://some-cust.om/meta-schema",
  "$id": "https://some-cust.om/meta-schema",
  // ...
}

The implementation should error saying it can't process the schema. Ideally it would catch the circular meta-schema reference (implementations MUST catch circular references through $ref), but short of that, you'd get the stack overflow you're talking about.

What you need to do in your custom meta-schema is eventually point back to one of the known meta-schemas:

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$id": "https://some-cust.om/meta-schema",
  // ...
}

This way, the implementation knows how to ultimately process your schema.
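
A minimal sketch of walking the `$schema` chain, assuming schemas are looked up in a local registry (a plain dict keyed by URI) rather than fetched. `KNOWN_META_SCHEMAS`, `DialectResolutionError`, and `resolve_base_dialect` are hypothetical names for illustration only.

```python
# Hypothetical set of specification meta-schemas the tool already understands.
KNOWN_META_SCHEMAS = {
    "http://json-schema.org/draft-07/schema",
    "https://json-schema.org/draft/2020-12/schema",
}


class DialectResolutionError(Exception):
    """Raised when a $schema chain never reaches a known meta-schema."""


def resolve_base_dialect(schema: dict, registry: dict) -> str:
    """Follow $schema links until a known meta-schema is reached.

    `registry` maps meta-schema URIs to already-loaded schema documents.
    """
    seen = set()
    uri = schema["$schema"].rstrip("#")
    while uri not in KNOWN_META_SCHEMAS:
        if uri in seen:
            # Catch the loop explicitly instead of overflowing the stack.
            raise DialectResolutionError(f"circular $schema chain at {uri}")
        seen.add(uri)
        meta = registry.get(uri)
        if meta is None or "$schema" not in meta:
            raise DialectResolutionError(f"cannot resolve meta-schema {uri}")
        uri = meta["$schema"].rstrip("#")
    return uri
```

A custom meta-schema that names itself in `$schema` raises `DialectResolutionError`, while one that points at the Draft 7 meta-schema resolves to that URI.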

mathematikoi commented 2 weeks ago

i understand that uri doesn't need to be fetchable, but you gotta get the referenced schemas from somewhere (let's say a cache that maps URIs to schemas :) )

What you need to do in your custom meta-schema is eventually point back to one of the known meta-schemas.

that would be how i'd go about it currently too, i.e. marking [TRUSTINGLY] known schemas as inherently valid, so that we can halt the validation.

but this trusted valid schemas concept is very implicit. adding to that, in the broader applicability of the specification, a custom meta-schema DOESN'T NECESSARILY point back to one of the known meta-schemas, and there is no concept of a "known meta-schema". the closest concept to this is the core vocabulary, anything else can be different.

my worry is that there is an implicit requirement for the meta-schema to be a VALID json schema, and this requirement cannot be verified, and can only be based on trusting the meta-schema author!

gregsdennis commented 2 weeks ago

a custom meta-schema DOESN'T NECESSARILY point back to one of the known meta-schemas

Walk me through how a custom meta-schema pointing to itself would work? How do I know how to process, for example, $ref? In draft 7 and earlier it causes sibling keywords to be ignored; in newer drafts, siblings are processed.

The meta-schemas must eventually point back to the version of JSON Schema you're using so that the tooling knows how to process the schema.
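
To make the `$ref` difference concrete, here is a small sketch of how the chosen dialect changes which keywords of a subschema get processed. The dialect labels and the function `keywords_to_process` are hypothetical, and only a few of the older drafts are listed.

```python
# Hypothetical labels for the drafts in which $ref hides its sibling keywords.
REF_OVERRIDES_SIBLINGS = {"draft-04", "draft-06", "draft-07"}


def keywords_to_process(subschema: dict, dialect: str) -> list:
    """Return the keywords an evaluator would look at under the given dialect."""
    if dialect in REF_OVERRIDES_SIBLINGS and "$ref" in subschema:
        # Draft 7 and earlier: siblings of $ref are ignored.
        return ["$ref"]
    # Newer drafts: $ref is processed alongside its siblings.
    return list(subschema)


subschema = {"$ref": "#/definitions/name", "maxLength": 3}
draft7_keywords = keywords_to_process(subschema, "draft-07")
newer_keywords = keywords_to_process(subschema, "2020-12")
```

The same subschema therefore means different things depending on the dialect, which is why the tool has to settle on a rule set before it evaluates anything.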

there is no concept of a "known meta-schema"

There is. Validation Section 5 identifies the top-level meta-schema. (I'd expect it to be in Core, but it doesn't appear to be.)

This meta-schema identifies that draft 2020-12 rules are to be used, especially for tools that don't support $vocabulary.

If $vocabulary is supported, then the argument could be made that it's the vocabs which determine keyword behavior, and then a meta-schema could theoretically be its own meta-schema. However, even in this case, circular references need to be managed. A tool would need to be able to recognize that a meta-schema is its own meta-schema and handle that scenario. (At that point, it still has to "just know" whether it supports that dialect.)

I don't personally recommend this, though, since vocabs aren't universally supported (their support is quite rare actually). In fact, we're extracting vocabularies as a concept to a proposal because they're not fully defined.
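
For tools that do support `$vocabulary`, the "just know" check could look roughly like the sketch below. In draft 2020-12, a vocabulary marked `true` must be recognized or the implementation must refuse to process the schema, while one marked `false` is optional. `SUPPORTED_VOCABULARIES`, `unsupported_required_vocabularies`, and the `custom.example` URIs are hypothetical.

```python
# Vocabularies this hypothetical tool implements (the draft 2020-12 set).
SUPPORTED_VOCABULARIES = {
    "https://json-schema.org/draft/2020-12/vocab/core",
    "https://json-schema.org/draft/2020-12/vocab/applicator",
    "https://json-schema.org/draft/2020-12/vocab/unevaluated",
    "https://json-schema.org/draft/2020-12/vocab/validation",
    "https://json-schema.org/draft/2020-12/vocab/meta-data",
    "https://json-schema.org/draft/2020-12/vocab/format-annotation",
    "https://json-schema.org/draft/2020-12/vocab/content",
}


def unsupported_required_vocabularies(meta_schema: dict) -> list:
    """List required vocabularies (value true) that the tool does not recognize.

    An empty result means the dialect can be processed, even if the
    meta-schema names itself in $schema: only $vocabulary is inspected,
    and the $schema link is never followed.
    """
    vocabularies = meta_schema.get("$vocabulary", {})
    return [
        uri
        for uri, required in vocabularies.items()
        if required and uri not in SUPPORTED_VOCABULARIES
    ]


self_describing_meta = {
    "$schema": "https://custom.example/meta",  # points at itself
    "$id": "https://custom.example/meta",
    "$vocabulary": {
        "https://json-schema.org/draft/2020-12/vocab/core": True,
        "https://json-schema.org/draft/2020-12/vocab/applicator": True,
        "https://custom.example/vocab/extra": True,   # required, unknown
        "https://custom.example/vocab/hints": False,  # optional, unknown
    },
}
missing = unsupported_required_vocabularies(self_describing_meta)
```

Here `missing` contains only the required custom vocabulary, so this tool would refuse the dialect. The optional one does not block processing.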

mathematikoi commented 2 weeks ago

Walk me through how a custom meta-schema pointing to itself would work? How do I know how to process, for example, $ref? In draft 7 and earlier it causes sibling keywords to be ignored; in newer drafts, siblings are processed.

{
    "$schema": "https://custom.tld/MY_CUSTOM_SCHEMA",
    "$id": "https://custom.tld/MY_CUSTOM_SCHEMA",
    "$vocabulary": {
        "https://json-schema.org/draft/2020-12/vocab/core": true,
        "https://json-schema.org/draft/2020-12/vocab/applicator": true,
        "https://json-schema.org/draft/2020-12/vocab/unevaluated": true,
        "https://custom.tld/MY_CUSTOM_VALIDATION_WITH_EXTRA_STUFF": true,
        "https://json-schema.org/draft/2020-12/vocab/meta-data": true,
        "https://json-schema.org/draft/2020-12/vocab/format-annotation": true,
        "https://json-schema.org/draft/2020-12/vocab/content": true
    },
    "$dynamicAnchor": "meta",

    "title": "Core and Validation specifications meta-schema",
    "allOf": [
        {"$ref": "meta/core"},
        {"$ref": "meta/applicator"},
        {"$ref": "meta/unevaluated"},
        {"$ref": "meta/validation"},
        {"$ref": "meta/meta-data"},
        {"$ref": "meta/format-annotation"},
        {"$ref": "meta/content"}
    ],
    "type": ["object", "boolean"],
    "$comment": "This meta-schema also defines keywords that have appeared in previous drafts in order to prevent incompatible extensions as they remain in common use.",
    "properties": {
        "definitions": {
            "$comment": "\"definitions\" has been replaced by \"$defs\".",
            "type": "object",
            "additionalProperties": { "$dynamicRef": "#meta" },
            "deprecated": true,
            "default": {}
        },
        "dependencies": {
            "$comment": "\"dependencies\" has been split and replaced by \"dependentSchemas\" and \"dependentRequired\" in order to serve their differing semantics.",
            "type": "object",
            "additionalProperties": {
                "anyOf": [
                    { "$dynamicRef": "#meta" },
                    { "$ref": "meta/validation#/$defs/stringArray" }
                ]
            },
            "deprecated": true,
            "default": {}
        },
        "$recursiveAnchor": {
            "$comment": "\"$recursiveAnchor\" has been replaced by \"$dynamicAnchor\".",
            "$ref": "meta/core#/$defs/anchorString",
            "deprecated": true
        },
        "$recursiveRef": {
            "$comment": "\"$recursiveRef\" has been replaced by \"$dynamicRef\".",
            "$ref": "meta/core#/$defs/uriReferenceString",
            "deprecated": true
        }
    }
}

i can just reuse the core vocabulary with the same $ref keyword specification (ignoring/processing sibling keywords).

There is. Validation Section 5 identifies the top-level meta-schema. (I'd expect it to be in Core, but it doesn't appear to be.)

that is just a provided schema for convenience, as the specification states:

The current URI for the default JSON Schema dialect meta-schema is https://json-schema.org/draft/2020-12/schema. For schema author convenience, this meta-schema describes a dialect consisting of all vocabularies defined in this specification and the JSON Schema Core specification, as well as two former keywords which are reserved for a transitional period.

If $vocabulary is supported, then the argument could be made that it's the vocabs which determine keyword behavior, and then a meta-schema could theoretically be its own meta-schema. However, even in this case, circular references need to be managed. A tool would need to be able to recognize that a meta-schema is its own meta-schema and handle that scenario. (At that point, it still has to "just know" whether it supports that dialect.)

this is good, but there is no such thing as "supporting a dialect", please bear with me, a "dialect" is just another json schema, and the specification applies to it just like any other schema.

I don't personally recommend this, though, since vocabs aren't universally supported (their support is quite rare actually). In fact, we're extracting vocabularies as a concept to a proposal because they're not fully defined.

in our implementation, we started with $vocabulary support, and it holds up very well. actually, the reason i thought we should go with our own implementation is that in others, full compliance with the spec does not even seem to be a goal.