json-schema-org / JSON-Schema-Test-Suite

A language agnostic test suite for the JSON Schema specifications
MIT License

tests/draft2020-12/refRemote.json: "remote HTTP ref with different $id" #621

Closed watuwo closed 1 year ago

watuwo commented 2 years ago

"tests/draft2020-12/refRemote.json" contains this test case:

{
    "description": "remote HTTP ref with different $id",
    "schema": {"$ref": "http://localhost:1234/different-id-ref-string.json"},
    "tests": [
        {
            "description": "number is invalid",
            "data": 1,
            "valid": false
        },
        {
            "description": "string is valid",
            "data": "foo",
            "valid": true
        }
    ]
},

"remotes/different-id-ref-string.json" contains:

{
    "$id": "http://localhost:1234/real-id-ref-string.json",
    "$defs": {"bar": {"type": "string"}},
    "$ref": "#/$defs/bar"
}

This is how we would determine the "$id"-derived URI of the target schema (the one with "$id": "http://localhost:1234/real-id-ref-string.json"):

We resolve "http://localhost:1234/real-id-ref-string.json" against "http://localhost:1234/different-id-ref-string.json" (the base URI constructed according to the assumptions of the test suite described in "README.md"). This yields "http://localhost:1234/real-id-ref-string.json".
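The resolution step described above can be reproduced with Python's standard `urllib.parse.urljoin`, which implements RFC 3986 reference resolution. This is only an illustration of that one step; the variable names are made up for the example.

```python
from urllib.parse import urljoin

# Base URI assumed by the test suite for the file in the remotes directory.
retrieval_uri = "http://localhost:1234/different-id-ref-string.json"
# Value of "$id" inside that file.
schema_id = "http://localhost:1234/real-id-ref-string.json"

# An absolute "$id" replaces the base entirely when resolved against it.
resolved = urljoin(retrieval_uri, schema_id)
print(resolved)  # http://localhost:1234/real-id-ref-string.json
```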

We should definitely be able to dereference "http://localhost:1234/real-id-ref-string.json". But why should we be able to dereference "http://localhost:1234/different-id-ref-string.json"? I cannot find anything in the JSON Schema specifications that requires (or encourages) the dereferencing of a reference to the (implementation-specific) "initial base URI".

I do not believe that the test case describes a required JSON schema feature and I believe it should be removed or moved.

Julian commented 2 years ago

"http://localhost:1234/different-id-ref-string.json" is that schema's retrieval URI. The suite is asking implementers to configure their own implementations to make that "be the case", however it is done for each implementation.

So it's the test suite's API for indicating a retrieval URI (not an initial base URI) for a schema that's relevant.

That's discussed (in Draft 2020) here, and in particular:

A schema MAY (and likely will) have multiple URIs, but there is no way for a URI to identify more than one schema.

The relevant test here indeed is exercising such a case, of a schema with both a retrieval URI (indicated by the test suite's API for doing so, i.e. sticking a file somewhere in a "magic directory") and a base URI (indicated via the $id).
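A minimal sketch of what "multiple URIs per schema, but only one schema per URI" could look like in an implementation. The `SchemaRegistry` class and its method names are hypothetical and not taken from any particular validator.

```python
class SchemaRegistry:
    """Hypothetical registry: many URIs may map to one schema, never the reverse."""

    def __init__(self):
        self._by_uri = {}

    def associate(self, uri, schema):
        existing = self._by_uri.get(uri)
        # A URI must not be rebound to a different schema object.
        if existing is not None and existing is not schema:
            raise ValueError(f"{uri} already identifies a different schema")
        self._by_uri[uri] = schema

    def lookup(self, uri):
        return self._by_uri[uri]


remote = {
    "$id": "http://localhost:1234/real-id-ref-string.json",
    "$defs": {"bar": {"type": "string"}},
    "$ref": "#/$defs/bar",
}

registry = SchemaRegistry()
# Retrieval URI, as signalled by the file's location in the remotes directory.
registry.associate("http://localhost:1234/different-id-ref-string.json", remote)
# Base URI, as declared by "$id".
registry.associate(remote["$id"], remote)

same_schema = (
    registry.lookup("http://localhost:1234/different-id-ref-string.json")
    is registry.lookup("http://localhost:1234/real-id-ref-string.json")
)
```

Both lookups return the same object, which is the situation the test exercises.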

I do not believe that the test case describes a required JSON schema feature and I believe it should be removed or moved.

It'd be a bit nice if you refrained from comments like this (which come off a bit aggressive, though it could be just text as a poor medium) until after a discussion -- there certainly is possibility that the suite has issues, but a priori it has been written and reviewed by people who are JSON Schema experts, so to assert this before having a discussion about the test is a bit premature. (Again though, it could be me mistaking your tone, and "challenging" tests in the suite is very much welcome.)

handrews commented 2 years ago

To clarify the use of "initial base URI", which is not a formal term: RFC 3986 §5.1.1-5.1.4 define the sources and precedence of a base URI. $id sets the base URI in accordance with §5.1.1, which is the highest-precedence source. The retrieval URI is a possible base URI in accordance with §5.1.3. It is an "initial" base URI because you know the retrieval URI first, and then you find $id, which takes precedence and becomes the base URI for further processing. If the $id is relative, you would most likely use the retrieval URI as the base URI for that resolution (again, serving as the initial-as-in-before-you-resolve-$id base URI). I say "most likely" because it's possible that RFC 3986 §5.1.2 could be relevant as well.
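The precedence described above can be sketched as a small helper. This assumes the schema is a JSON object (not a boolean schema) and ignores the encapsulating-entity case; `base_uri_after_id` is a hypothetical name.

```python
from urllib.parse import urljoin


def base_uri_after_id(retrieval_uri, schema):
    """Return the base URI used for processing the schema's contents.

    The retrieval URI acts as the initial base; an "$id", if present,
    is resolved against it and takes precedence.
    """
    schema_id = schema.get("$id")
    if schema_id is None:
        return retrieval_uri
    return urljoin(retrieval_uri, schema_id)


retrieval = "http://localhost:1234/different-id-ref-string.json"

absolute = base_uri_after_id(
    retrieval, {"$id": "http://localhost:1234/real-id-ref-string.json"}
)
relative = base_uri_after_id(retrieval, {"$id": "real-id-ref-string.json"})
missing = base_uri_after_id(retrieval, {"type": "string"})
```

Here `absolute` and `relative` both come out as the `real-id-ref-string.json` URI, while `missing` falls back to the retrieval URI.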

watuwo commented 2 years ago

I am sorry for bringing up the term of "initial base URI" (I should not have done that; I see that this can make my post confusing). "http://localhost:1234/different-id-ref-string.json" might have or not have many different roles. My question is: why does "http://localhost:1234/different-id-ref-string.json" have the role of a "$ref"-able identifier of the schema with "$id": "http://localhost:1234/real-id-ref-string.json"? Why is this necessarily the case?

I am going to try another formulation: If an implementation failed the test case, I would not know how to use the JSON Schema specification to justify an (imaginary) claim that the implementation does not conform to the JSON Schema specification.

which come off a bit aggressive

This is definitely not my intention. My thinking was: "I will finish with a concrete initial claim and proposal so that they can be discussed and evolve". Obviously I have failed to communicate this.

Julian commented 2 years ago

This is definitely not my intention.

Cool, all good, ignore me then! Tone is hard to sense; somehow I took it differently than it sounds like it was intended.

What you're asking doesn't have anything directly to do with this test case, does it? It sounds like you're asking which part of the specification requires implementations to support identifying schemas with arbitrary retrieval URIs, no?

Or am I misunderstanding the point? It's true the specific test you're referencing has a different retrieval URI than what's in its $id, but you're comfortable with other tests like this one which also asks users of the test suite to make that $ref-erenced schema available at that URI?

(Obviously all the above needs to be justified by the specification, as you say, just trying to understand which specifically you're asking about)

watuwo commented 2 years ago

It sounds like you're asking which part of the specification requires implementations to support identifying schemas with arbitrary retrieval URIs, no?

Exactly.

but you're comfortable with other tests like this one [...]?

You are right: The "remote ref" test case confuses me too. There is one difference: The "remote ref" test case could easily be "fixed" by adding "$id": "http://localhost:1234/draft2020-12/integer.json" to the intended target schema. I reported the "remote HTTP ref with different $id" test case as an example. I chose an example where the "fix" of adding an "$id" to the target schema is not possible (there is one more such case, see list below).

These might be all similar examples for the 2020-12 version (I have not manually checked all items):

Julian commented 2 years ago

Got it, ok. Then yes see the section I referenced! The paragraph before the one I quoted is also relevant:

Implementations SHOULD be able to associate arbitrary URIs with an arbitrary schema and/or automatically associate a schema's "$id"-given URI, depending on the trust that the validator has in the schema. Such URIs and schemas can be supplied to an implementation prior to processing instances, or may be noted within a schema document as it is processed, producing associations as shown in appendix A.

That's the behavior depended upon by associating these retrieval URIs with the schemas.
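To make the dependency concrete, here is a toy evaluator that understands only "$id", "$ref" and "type": "string", run against the test case from this issue. It is a sketch under those assumptions, not how any real implementation works; `REGISTRY`, `resolve_pointer`, `is_valid` and the `urn:example:test-case` starting base are all invented for the example.

```python
from urllib.parse import unquote, urldefrag, urljoin

REMOTE = {
    "$id": "http://localhost:1234/real-id-ref-string.json",
    "$defs": {"bar": {"type": "string"}},
    "$ref": "#/$defs/bar",
}

# URI -> schema associations supplied before any instance is processed.
REGISTRY = {
    "http://localhost:1234/different-id-ref-string.json": REMOTE,  # retrieval URI
    REMOTE["$id"]: REMOTE,  # "$id"-given URI
}


def resolve_pointer(document, pointer):
    """Follow a JSON Pointer; the empty pointer yields the whole document."""
    node = document
    for token in pointer.split("/")[1:]:
        node = node[token.replace("~1", "/").replace("~0", "~")]
    return node


def is_valid(schema, instance, base_uri):
    """Evaluate only "$id", "$ref" and "type": "string"."""
    if "$id" in schema:
        base_uri = urljoin(base_uri, schema["$id"])
    if "$ref" in schema:
        target_uri, fragment = urldefrag(urljoin(base_uri, schema["$ref"]))
        resource = REGISTRY[target_uri]
        target = resolve_pointer(resource, unquote(fragment))
        if not is_valid(target, instance, target_uri):
            return False
    if schema.get("type") == "string" and not isinstance(instance, str):
        return False
    return True


test_schema = {"$ref": "http://localhost:1234/different-id-ref-string.json"}
number_result = is_valid(test_schema, 1, "urn:example:test-case")
string_result = is_valid(test_schema, "foo", "urn:example:test-case")
```

Without the first `REGISTRY` entry the initial "$ref" lookup would fail with a `KeyError`; the second entry is what lets the inner "#/$defs/bar" reference resolve once "$id" has changed the base URI.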

watuwo commented 1 year ago

If I understand you correctly, then the argumentation is:

It is great that there is a test suite. I would not have noticed this feature (in this generality) otherwise (even though I currently don't want to rely on it). If you like the association of arbitrary URIs with arbitrary schemas, one could commit to the feature even more by adding even "meaner" tests. One could for instance associate a schema with a URI that is not also a retrieval URI (e.g. "mailto:example@example.com") or with a URI with a JSON-pointer-like fragment (I don't know yet how far one can go with that). One could check that a schema can be referenced using the externally associated URI directly when the target's "$id" is a relative URI reference. Of course one can always add more tests (doesn't mean it's necessarily worth the effort).

Julian commented 1 year ago

One could for instance associate a schema with a URI that is not also a retrieval URI (e.g. "mailto:example@example.com")

That's correct, we indeed could and want to -- literally the only reason we haven't already is because most implementations read schemas by locating them on the filesystem within the remotes directory, and we have no way (in the test suite API) of signalling "the retrieval URI of this schema is meant to be urn:foo".

Bowtie however perfectly well supports doing precisely that! So we are very likely to have tests of this form now.

Julian commented 1 year ago

Going to close since yes! What you reiterated is the idea. But if there are still questions/suggestions, follow up!