There's no real reason other than that it'd be noise (a property needed in all schemas) -- it's also valid to leave it out even though it's highly recommended, so even if we did add it everywhere you'd still have a few (one at least) that test the behavior without it.
Your implementation doesn't allow explicitly choosing which version to apply?
E.g. in mine, when I run the suite, for each folder, I explicitly use the appropriate draft, even though mine works as you say defaulting to the latest if you don't specify.
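For illustration, a harness along those lines might look roughly like the sketch below (hypothetical code, not the actual runner; it assumes the suite's tests/&lt;draft&gt;/*.json layout and the jsonschema library's per-draft validator classes):

```python
# Hypothetical sketch of a per-folder harness: each directory is run with the
# validator class for the corresponding draft, rather than a single default.
import json
from pathlib import Path

from jsonschema import Draft4Validator, Draft7Validator, Draft201909Validator

VALIDATORS = {
    "draft4": Draft4Validator,
    "draft7": Draft7Validator,
    "draft2019-09": Draft201909Validator,
}

for folder, validator_cls in VALIDATORS.items():
    for path in sorted(Path("tests", folder).glob("*.json")):
        for case in json.loads(path.read_text()):
            validator = validator_cls(case["schema"])
            for test in case["tests"]:
                result = validator.is_valid(test["data"])
                assert result == test["valid"], f"{path}: {test['description']}"
```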
@Julian given that we're strongly encouraging folks to start using $schema and the $vocabulary keyword in the referenced meta-schema to control how to process schemas, we should really include this.

The idea that $schema is a needless formality has significantly degraded the usefulness of the keyword to the point that many implementations outright ignore it and just default to doing whatever they feel like regardless of what the schema declares. That's a problem.
Your implementation doesn't allow explicitly choosing which version to apply?
Quite the opposite. If you specify with $schema, it applies at that draft. But if you don't, it makes a best guess given the keywords that are used.
The idea that $schema is a needless formality has significantly degraded the usefulness of the keyword to the point that many implementations outright ignore it and just default to doing whatever they feel like regardless of what the schema declares. That's a problem.
Exactly. Moreover, we should be the example of how schemas should be authored.
Quite the opposite. If you specify with $schema, it applies at that draft. But if you don't, it makes a best guess given the keywords that are used.
Right that's what I mean -- in mine, there are 3 choices -- if you specify with $schema you get that one. If you specify nothing you get latest. But if you want to, you can explicitly choose which draft to use. Seems pretty important to me to support the latter if you're going to support multiple drafts, and that was definitely how the suite was written (that you explicitly apply the validator for each corresponding draft).
That being said though I'm definitely not against adding $schema to all the schemas for the reasons @handrews gave -- I do think we should have at least 1 test for each draft that doesn't use it and asserts you still get the right draft-specific behavior so long as the drafts don't say it is literally required, otherwise there's a valid schema we won't cover.
(My usual neuroses about not using large sweeping automated formatting changes there though still may mean we have to do this slowly enough to review the patches individually)
Exactly. Moreover, we should be the example of how schemas should be authored.
I don't think the test suite should be a guide there -- it contains in many cases intentionally pathological examples, but yeah fair enough I guess I shouldn't push back if I'm not disagreeing :D
But if you want to, you can explicitly choose which draft to use. Seems pretty important to me to support the latter if you're going to support multiple drafts
I'd say it's far more important to support reading and respecting $schema and $vocabulary. Sure, it's fine to put in an option to explicitly set it when $schema is absent, but I would find the ability to outright contradict it to be of dubious value.
Exactly. Moreover, we should be the example of how schemas should be authored.
Yup. This is particularly critical with $vocabulary now. In the past, there wasn't really all that much you could do with $schema. Most implementations understood only the http://json-schema.org/{version}/schema URI for each version anyway. Custom meta-schema URIs were essentially useless. With $vocabulary that is no longer the case.
I do think we should have at least 1 test for each draft that doesn't use it and asserts you still get the right draft-specific behavior so long as the drafts don't say it is literally required, otherwise there's a valid schema we won't cover.
Sure, we can have a test for this, but we only need one, and it should only test behavior specified by the specification. So I'm not sure what it should test. The most recent specification does state a default meta-schema, but I believe it is the first to explicitly do so.
My usual neuroses about not using large sweeping automated formatting changes there though still may mean we have to do this slowly enough to review the patches individually
@Julian are you really saying that you're not OK with mechanically inserting the same $schema everywhere in each version tree? It's so easy to just do that and review them. If there's a problem with file formatting not being stable enough, then we should absolutely fix that first.
but I would find the ability to outright contradict it to be of dubious value.
I didn't mention contradicting -- though yes, mine supports that too. Just specifying it external to a schema. Can't tell if you two disagree, but yeah that's been pretty fundamental to me, for literally this reason, that from what I can tell, very few people use $schema, but still write their schema to a specific draft.
Fixing that human problem is one thing. Not allowing actually getting correct behavior though seems like another, hence me saying I think it's quite important for an implementation to support that -- but whatever, if we do disagree, this ticket probably isn't the greatest place to discuss it (and I'm not sure what ramifications that disagreement has; if your library's users aren't asking you for it, who cares what I think :).
@Julian are you really saying that you're not OK with mechanically inserting the same $schema everywhere in each version tree? It's so easy to just do that and review them. If there's a problem with file formatting not being stable enough, then we should absolutely fix that first.
It's not a file formatting issue -- it's an automated change issue -- in my world, there are only 2 kinds of changes: ones small enough to review carefully, and ones that aren't.

A large automated change adding $schema to all schemas is essentially impossible to code review (for me or any human being), and is also likely to introduce unintended bugs due to the specific JSON parser's peculiarities -- so so far yeah I have pushed back on any large automated change.
A large automated change adding $schema to all schemas is essentially impossible to code review
Well I strongly disagree with that but this is your package so I won't push. I'm not submitting manual PRs to individually change it though, and I find the amount that that's going to slow things down frustrating.
Well -- how about this -- if you and @gregsdennis feel confident one can submit it and the other review it, then I don't object. I do think it is impossible even for you :), and like I said I think there's some research, logic (that your eyes glaze over from reading long mechanical changes and you literally do not notice things) and anecdotal evidence (I have literally never seen such a change not introduce a bug, including the last time in this repo, when I happened to trust my gut and force myself to try to review the change and then sure enough find a bug) backing that up.
I do understand the sentiment, this isn't the first time I've had this discussion (though I did not invent the above :) -- I got it from other projects that value high quality software and programmers smarter than me, and so far it's served me well as a programmer) -- but yeah if the two of you are confident in the change, it's not like I've got an iron fist here, this repo is useful to all hopefully, and you both are pretty reasonable :) (though yeah I don't want to see this kind of change on any at all frequent basis)
@Julian I apologize for pushing on the bulk change topic. Please feel free to edit out any comments, in whole or in part, that you feel detract from the point of this issue. 🙏
There should at least be some tests in each version's suite that contain a $schema value for a different draft, and some assertions that would work only when the version is properly detected.

For example, one could copy a test from draft4 which uses a keyword whose semantics later changed (say exclusiveMinimum, that used to specify a boolean value but is now numeric), adding a "$schema": "http://json-schema.org/draft-04/schema#" to it -- indicating that the parser should recognize that this is not a draft7 schema and should not be evaluated as such (it could either error out, or downgrade to draft4 semantics).
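As a sketch of what such a case might look like (hypothetical, not copied from the suite), using the suite's usual description/schema/tests layout:

```json
{
    "description": "draft-04 $schema in the draft7 suite",
    "schema": {
        "$schema": "http://json-schema.org/draft-04/schema#",
        "minimum": 5,
        "exclusiveMinimum": true
    },
    "tests": [
        {
            "description": "the boundary is excluded only under draft-04 semantics",
            "data": 5,
            "valid": false
        }
    ]
}
```

An implementation that silently evaluated this with draft7 rules would likely ignore the boolean exclusiveMinimum and accept 5, so the expected result only holds if the $schema value is honored.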
Another tricky test would be a schema with both $id and id keywords, but with different values -- the test being that the proper one is used and the other is ignored, according to the value of $schema.
That part is/was json-schema-org/json-schema-spec#210 but I don't remember what happened there.
But agreed though that we should have them (though I think it'd be nice if we actually did it in a draft-agnostic way -- i.e. ideally we shouldn't rely on downstream validators supporting multiple drafts -- we should instead do this by e.g. fixing a single draft, then defining a new metaschema, then using the new metaschema in a new schema or something, and then having the test verify that it's validated using the new metaschema. This way someone can have them pass without implementing multiple drafts).
Can we separate out that one from this ticket which I'm inclined to even close since we got all wrapped up in unrelated things?
This one was originally about test case style, but I'm inclined to be bold given all the back and forth we got tangled in and say we should just come back to that later, let's get 2019-09 tests all done first...
From a ticket closed as a dupe, sharing this again below:
According to the specification, JSON Schema definitions are recommended to have a $schema keyword.
To me, I don't feel the recommendation really applies strongly here, for reasons I intended to describe in json-schema-org/json-schema-spec#324 -- namely, what I think we should optimize for in the test suite is readability, not perfection -- and while that recommendation is perfect for schemas in isolation which otherwise wouldn't know what $schema they belong to, for the test suite, it seems more to be boilerplate that makes it harder to focus on the key point of each test case. So I've always been -0 on adding it.
On the other hand I do like:
Instead of relying on implementors to guess which one should be used for every folder, would it make sense to manually set $schema in all the tests
since it seems nice to be able to rely on something someone already needs to support, but I don't think, again in my opinion, that this will work really. Because the test suite will always need to also include a test where $schema is not present (as we always do for SHOULDs in the spec which may be ignored), and once we need to do so, the problem comes back again.
But yeah happy to either elaborate myself, or hear again from others. I know Greg and Henry may still be in favor of this.
I don't think there's been a specific reason we don't include $schema except that it's not required, and adding "$schema" would mean that this is no longer tested.
Instead of relying on implementors to guess which one should be used for every folder
Guessing what, exactly? JSON Schema should be suitably well defined to operate without a meta-schema. What's written in the spec is authoritative, what appears in the meta-schema is merely informative.
@Julian, the role of $schema has changed over time. Originally, it didn't exist at all. In drafts 3 and 4, it was presented as an inessential nice-to-have, which made your approach reasonable. Drafts 6 and 7 didn't really change the level of requirement around $schema, so that continued to be fine.
However, as of 2019-09, we made reliable extensibility a major goal and developed $vocabulary, which works via $schema, to accomplish that goal. As part of this, the absence of $schema in the document root was explicitly stated to result in undefined behavior.
Basic QA principles say that you can't write test cases that combine implementation-defined behavior with positive functionality tests (for those who don't know, my career bounced back and forth between systems QA and Development before settling into Technical Director roles spanning both departments). Doing so means that every test outcome is impacted by implementation-specific choices rather than standards conformance.
So on those grounds, if someone came to me with a test plan where nearly every test case involved an implementation-defined condition, I would reject the plan.
The larger topic here is whether the JSON Schema org supports the goal of reliable extensibility. If we do, then we really need to stop treating $schema as an afterthought. Implementation behavior around $schema is extremely variable, including implementations that ignore it entirely.
The reason for this is that the test suite makes it very clear that $schema doesn't matter. There are no test cases for it at all. Once you get to 2019-09, there's one $vocabulary test (that doesn't really do much AFAICT), and in 2020-12 it's used to enable format assertion behavior, but that's optional anyway.

So, as currently designed, the test suite is undercutting an (apparent?) major project goal. It's very frustrating to see how little attention many implementations pay to $schema, and it's not a mystery as to why that has happened.
If the JSON Schema org does not have a consensus that this is a major project goal, then we should open a discussion on that in the discussions repo, because there's no point in debating this detail without clear context on why we care about $schema. The fact that the media type registration work hinges on $schema for dialect identification suggests that the keyword is important.
If we do have consensus on this, then the topic of how to best test $schema and $vocabulary is another fairly large discussion. We would need cases where $schema is absent, but testing implementation-defined conditions is a bit of a tricky thing (the spec could be better regarding testable requirements here, I'm a little disappointed in myself TBH -- for example, I should have put some testable guardrails around "implementation-defined" like "MUST NOT crash" or "MAY refuse to process the schema" or "MUST try to process the schema" etc).

But the minimum first step would be to get the implementation-defined condition out of all of the other test cases. The test suite should rely on well-specified functionality ($schema + $vocabulary determines the processing rules for 2019-09 and later) rather than on behavior that is outside of the specification entirely (relying on an external selection of processing rules).
I would prefer to extend this to all drafts, as it would significantly help normalize $schema support and encourage implementations to take it seriously. However, I would be reluctantly OK with leaving draft-07 and earlier as-is because the spec was written differently for those drafts. The behavior without $schema is not clearly specified, but it's strongly implied to be a common occurrence and therefore handled reasonably. However, that's intentionally no longer the case in 2019-09 and later.
I'm responding quickly not to say I've already understood what you wrote (which yeah thanks! I'll read it carefully) but just because:
as of 2019-09 [...] the absence of $schema in the document root was explicitly stated to result in undefined behavior.
If this is true, you already convinced me. Can you or someone point me at where this is?
The reason for this is that the test suite makes it very clear that $schema doesn't matter. There are no test cases for it at all.
I'll just state on the record that I do not intend the test suite to say this, and if we don't have it, it's purely an "accident" of lack of effort until now on my part, something I clearly want to fix.
Thanks, @Julian
From §8.1.1 The "$schema" keyword:
The "$schema" keyword SHOULD be used in the document root schema object, and MAY be used in the root schema objects of embedded schema resources. It MUST NOT appear in non-resource root schema objects. If absent from the document root schema, the resulting behavior is implementation-defined.
There's slightly more clear guidance around unknown $vocabulary values, although of course that requires following $schema to look at $vocabulary:
§8.1.2.1 Default vocabularies:
If "$vocabulary" is absent, an implementation MAY determine behavior based on the meta-schema if it is recognized from the URI value of the referring schema's "$schema" keyword. This is how behavior (such as Hyper-Schema usage) has been recognized prior to the existence of vocabularies.
If the meta-schema, as referenced by the schema, is not recognized, or is missing, then the behavior is implementation-defined. If the implementation proceeds with processing the schema, it MUST assume the use of the core vocabulary. If the implementation is built for a specific purpose, then it SHOULD assume the use of all of the most relevant vocabularies for that purpose.
For example, an implementation that is a validator SHOULD assume the use of all vocabularies in this specification and the companion Validation specification
Even if we take that last paragraph as guidance, it doesn't actually say what draft of the vocabularies in question should be used.
The 2nd paragraph even notes that an implementation might refuse to process the schema. So there's an implicit allowance that a conforming implementation could outright refuse to run the test suite. That might not be something that we want to imply, but that's what it says. IIRC, I did not want to require implementations to process an unknown thing as that could be a security risk.
Regarding:
I'll just state on the record that I do not intend the test suite to say this, and if we don't have it, it's purely an "accident" of lack of effort until now on my part, something I clearly want to fix.
Oh, I did not mean to imply that you did! I apologize, I should have worded that better. I'm trying to work on my tendency to sound accusatory on this sort of thing.
I also think it was not unreasonable given the history. When I went to reply to this issue, at first I was confused over why we hadn't been using $schema until I went back and read draft-04 and draft-07 and realized that what was in 2019-09 and later was actually a change. Prior to that, the spec was pretty casual about $schema as well.
My reading skills (of the spec) apparently leave something to be desired, since I figured you were referring to that paragraph, but I have up until this moment probably never internalized the last half of that paragraph, i.e.:
If absent from the document root schema, the resulting behavior is implementation-defined.
and was always focused on "yeah OK SHOULD, but doesn't need to".
That's quite clear, so I'm now very much +1 on the change, and indeed, as you say, even the tests we do now have without $schema need to go in optional.
Oh, I did not mean to imply that you did! I apologize, I should have worded that better. I'm trying to work on my tendency to sound accusatory on this sort of thing.
Don't worry about it :) I'll very much assume good intent (at least until I look silly doing so :D).
For my own complete clarity, and thanks again for bearing with me until now, I want to also cite something from json-schema-org/json-schema-spec#439 (which I intend to get back to and merge in the next few days), which covers why that section means an immediate +1, which is:
additional tests MUST NOT ever fail for an implementation that is correct under the specification
whereas it's now clear that a correct implementation, under
If absent from the document root schema, the resulting behavior is implementation-defined.
may perfectly well have decided that their "implementation-defined behavior" is "blow up and error" (EDIT: see the hidden comments below, there may be disagreement about this precise "implementation-defined behavior", but the point stands for some other behavior).
So yes, anyone is welcome to do this (on all drafts, or ones with that language, I have no strong opinion since there's going to be "boilerplate" anyhow in most places now). If no one gets to it sometime soon, I'll get to it myself, probably after json-schema-org/json-schema-spec#439 and a few other issues are closed first.
may perfectly well have decided that their "implementation-defined behavior" is "blow up and error".
This effect is obviously unreasonable (I don't think this was intended), and the topic of https://github.com/json-schema-org/community/issues/189
I'll follow that ticket (which I of course haven't read yet), but fair enough, then substitute whatever other behavior is incompatible with the current suite, say choosing to default to a fixed version of the specification.
Erroring without a specified metaschema/dialect is entirely reasonable. My implementation switched to it; you have to either have $schema in your schema, or separately specify the metaschema to instantiate the schema as. {"type": "string"} without any other information is an error. @jdesrosiers says the same about their implementation in https://github.com/json-schema-org/community/issues/189 - I am not up to date on that issue (it is on my to-catch-up-on list) but I do not think that calling an error result 'obviously unreasonable' has any consensus there.
My implementation passes the test suite, falling back to specifying the metaschema according to the directory name (as I suppose most implementations do here). Using the mechanism of $schema would be better, in my opinion.
without any other information is an error.
Implementations have to solve a wide variety of problems, and the text is in parts contradictory, so I will stop short of saying this is wrong. But I think it is unreasonable, $schema and the meta-schemas are not mandatory. They're something that the schema author can provide to enhance functionality, so forcing someone who is not the author of the schema to select a meta-schema defeats the purpose of the meta-schema functionality.
And this seems to contribute to a misunderstanding about how JSON Schema is versioned... the specification text has versions; meta-schemas can have versions (as we change them over time to bring them into alignment with what the spec says, or to add functionality); but JSON Schema the media type is unversioned.
Let's move any discussion of what the set of valid implementation-defined behaviors are, and whether they include "error out", elsewhere (like the ticket linked) so that this ticket can just be about what the spec says, not what it should say --
@awwright are you disagreeing there (about what it seems to say)? Or are you making a point about what you'd like it to say?
(To me it seems very clear that what the spec says is that an implementation may choose to do something other than what the test suite assumes it does today). But speak up if you're disagreeing.
The effect on testing is: if you're changing the meta-schema being used for running the tests, based on the draft directory, then you're not testing for compatibility with schemas in the wild (where this information is unavailable).
Instead of relying on implementors to guess which one should be used for every folder
I thought it was implicit that the directory name for the tests would indicate which specification version to use for running those tests. If that's not clear enough, we should say so explicitly in a README. That is -- it's not okay for an implementation to just use whatever version it likes (i.e. its preferred default version) for all tests, as obviously the tests won't all pass if the versions don't match up, so what is the point? The different directories exist for the express purpose of demonstrating the different expected behaviours for each specification version.
The reason for this is that the test suite makes it very clear that $schema doesn't matter. There are no test cases for it at all. ... So, as currently designed, the test suite is undercutting an (apparent?) major project goal. It's very frustrating to see how little attention many implementations pay to $schema, and it's not a mystery as to why that has happened.
You are confusing intent with happenstance. It's not intentional that there are few to no tests involving $schema -- it's just that no one has written them yet.
I'd been intending to write more, using a new metaschema that used the format-validation vocabulary, but there was some confusion and disagreement about the existing format tests and I ran out of spoons. We could certainly add others that omitted certain vocabularies to ensure implementations did the right thing, but someone has to do it, and we're (almost) all volunteers. Maybe that will get better with the increase in paid staff. But please don't infer the lack of certain things is signalling some sort of intent. Output formats matter too, and we have no tests for those either, because it's complicated and someone has to do the work in thinking about how it's all going to work (and then make the case to everyone else to get it merged).
I thought it was implicit that the directory name for the tests would indicate which specification version to use for running those tests.
The directory name indicates the draft that the schema was written to. It doesn't imply there's a way to pick the specification version you use for running a test. This is implied because if I see a JSON Schema in the wild, I would have no way of knowing which specification version it was for, it makes sense tests should work this way too.
Testing multiple versions of a validator would be like saying "Should I test this JSON parser with RFC 4627, RFC 7159, or RFC 8259?" That is: while there's different versions of the JSON specification, there's only one version of JSON, and implementations are expected to be compatible with documents written for any of the RFCs, even when we don't know which one.
If I wrote a validator by comparing the input to the known tests, and returning the expected result, it would pass all the tests, but it would quickly fail against instances in the wild. Likewise, if you're changing the behavior of the validator depending on the directory it was found in, that's a little bit cheating the tests.
Adding $schema to the tests would be a slight variation on this, since $schema is a pretty good heuristic of specification version (this was intentional), but not necessarily:
```json
{
  "$schema": "https://json-schema.org/draft-03/schema",
  "$ref": "https://json-schema.org/draft-07/schema#",
  "type": "object"
}
```
... This is a schema that uses the draft-03 vocabulary to match all valid object-form draft-07 schemas (no boolean schemas). (Yes, according to the JSON Schema text, I would expect it to work this way!)
What we have today is not a heuristic and not cheating :), it's explicitly the test suite telling anyone using it "stuff in this folder is written to be run against the draft whose folder it's in". There's no ambiguity, that's the current public interface of this suite itself. There are indeed suites which test different RFC implementations of e.g. URIs, even though from staring at one you don't know what version it's written for. Such a thing doesn't sound like an alien concept to me for JSON either, and I wouldn't be surprised if they existed, heck they'd be useful to us even if we ever had a "format": "json" and explicitly said which RFC was intended (as we do for regexes or datetimes or whatever).
But it seems we're already all agreed (well, at least unless @karenetheridge says they disagree) to add $schema simply because the spec doesn't require that implementations respect or support being told what version to use when validating a schema with no $schema, nor does it constrain what they do in that case at all. Unless someone disagrees with the next step here (add $schema to tests, and then perhaps add additional tests as per json-schema-org/json-schema-spec#210), I'd recommend we probably leave this here. Pathological examples like the one you just mentioned may indeed be interesting at some point but seem low priority to me. But file them as extra test tickets, why not.
it's explicitly the test suite telling anyone using it "stuff in this folder is written to be run against the draft whose folder it's in".
Even if this was the intention, there's no such thing as "running against a draft" (any more than I can parse JSON "against RFC 4627"). Sure, we can talk about how JSON was defined as of a specific publication. But this does not imply that a JSON parser (or a JSON Schema validator) should toggle between RFCs/publications/versions.
Could you speak more specifically to this argument (there's normally no way to distinguish drafts and the tests shouldn't invent this parameter)?
Unless someone disagrees with the next step here (add $schema to tests
What do you mean exactly? We can add a file that tests the $schema keyword in various combinations. But the bulk of the tests should omit it.
@karenetheridge I was not ascribing intent to the authors of the test suite, and please note my apology to @Julian yesterday for not making that 100% clear, and allow me to extend it to you and everyone else who has worked on the test suite. It was an observation of the effect, regardless of the intent.
The different directories exist for the express purpose of demonstrating the different expected behaviours for each specification version.
Up to draft-07, the requirements around $schema were fairly weak, so it was necessary to have an external signifier. But that's no longer really the case. The separate directories still make sense purely on organizational grounds, but $schema should be what really does the work.
there's no such thing as "running against a draft" (any more than I can parse JSON "against RFC 4627"). Sure, we can talk about how JSON was defined as of a specific publication. - @awwright
This isn't true. Each version/draft of JSON Schema adds new functionality. As you pointed out with your draft-3/draft-7 example, boolean schemas aren't a thing in draft 3, but they are in draft 7. Similarly, drafts 6+ no longer recognize id as a keyword. Therefore it doesn't make sense to have a generic "JSON Schema" validator because the idea of JSON Schema is ever-evolving. A validator must know against what stage of that evolution it's supposed to perform its validation.
Regardless, I think that discussion, while it may feed into this one, should be handled elsewhere.
All of that said, while my validator does allow the user to define what draft a non-draft-specific schema should be processed in, making this test suite possible, I'm on the side of including $schema in the tests because:

- the behavior of $schema-less schemas is undefined, meaning implementations can do whatever
- $schema-less schemas may not be indicative of the real world
- $schema should be respected
@awwright you have a different view on how this does or should work than most of the other people in the project, which is a point of discussion at ietf-wg-httpapi/mediatypes#20 among other places. That discussion should be had and settled elsewhere before we make decisions one way or another based on any of our views.
@Julian , it occurred to me to check the validation spec, and I see that it says:
The current URI for the default JSON Schema dialect meta-schema is https://json-schema.org/draft/2020-12/schema. For schema author convenience, this meta-schema describes a dialect consisting of all vocabularies defined in this specification and the JSON Schema Core specification, as well as two former keywords which are reserved for a transitional period. Individual vocabulary and vocabulary meta-schema URIs are given for each section below. Certain vocabularies are optional to support, which is explained in detail in the relevant sections.
It's possible to read that as "if your implementation is a validator, in the absence of $schema it SHOULD assume this meta-schema and associated vocabularies and behaviors." But it doesn't actually state a clear requirement like that. I double-checked the core spec, and it does not mention default meta-schemas.
I vaguely recall that I wanted to draw a sharper line between a generalized extensible JSON Schema implementation (where it would not make sense to default $schema) vs specialized ones, which would deem most "JSON Schema" implementations that currently exist to be specialized for validation.

It's pretty wobbly reasoning, and definitely not covered by proper normative language, but I could see interpreting this to indicate that the test suite, as a test suite for validators, implies a default $schema.

I would prefer to take a more general view, in hopes that the test suite will broaden its scope beyond validation, e.g. to annotation collection, although of course annotation collection is not mutually exclusive with validation and usually relies on it, so... ¯\_(ツ)_/¯
Anyway, I have $schema added locally for 2020-12, although I need to write a little script to sanity-check it. But I can post a PR if we do decide we want to do that.
I don't see any practical effect from adding $schema everywhere, unless you also intend to collapse all the tests together into a single directory where it is impossible to tell which version of the spec it's written for without examining the $schema keyword. Everyone uses the appropriately-named directory/ies when testing their implementation already. Absolutely we should have more tests of the $schema keyword, but adding the keyword to all test cases isn't going to add any value.
Instead of adding the keyword everywhere, I think the README should be strengthened to clearly indicate that it is intended that each directory's tests are to be run against an implementation that is configured to use that specification version, and skipped if that version is not supported (there is no sense in me running the draft4 tests against my implementation, as I know there will be failures and I don't claim to support that version).
Even if this was the intention, there's no such thing as "running against a draft" (any more than I can parse JSON "against RFC 4627")
I disagree, and it seems there's context here where some of this discussion has occurred, but let's focus on the things which are relevant to this ticket, which concerns simply whether we add $schema to all tests, as it seems there's other discussions that Henry has linked for this part of the discussion.
It does seem that now both you and @karenetheridge are indeed disagreeing with the intended next steps here. Can one or both of you explain how you read the specification then? The section @handrews linked seems clear, so it seems the burden should be on you to explain it some other way.
Specifically, I hypothetically have written an implementation which can accept schemas written either for draft 7 semantics or draft2020 semantics. This hypothetical implementation will return "true", i.e. valid, for all instances, when presented with a schema which does not have $schema declared in it. Can you point to where the specification says I may not do so?
Henry has pointed to:
If absent from the document root schema, the resulting behavior is implementation-defined.
which to me seems to say very clearly that I may indeed decide to do so. And, if I may do so, I will have difficulty running the test suite under my implementation, because around half of the tests in the draft2020 folder will fail when run against my draft2020 validator!
So in short to me, the spec seems clear, and effectively means we MUST add $schema to all schemas, otherwise an implementation such as the above would not be able to use it. Yes, we then may add optional tests which say one can have an implementation which indeed operates identically without $schema, but the question doesn't seem to be about clarity, the spec literally seems to allow compliant implementations which cannot today run the suite.
What have I missed?
(n.b. we definitely should not collapse the suite into one giant file or directory, so that's not one of the options anyone's proposing I believe.)
Each version/draft of JSON Schema adds new functionality
I was using JSON for simplicity; but my point holds for standards that add features over time. Consider ECMAScript: there's no standardized way to specify which ECMA publication is used for running code... You get whatever your particular version of V8 implemented, which is usually sufficient regardless of the version of ECMA you wrote your code to.
As you pointed out with your draft-3/draft-7 example, boolean schemas aren't a thing in draft 3, but they are in draft 7.
I was trying to call attention to how the "type" keyword would have been ignored according to the draft-03 I-D, but this is no longer the case, so even though I'm specifying a draft-3 $schema, the newer behavior applies.
Therefore it doesn't make sense to have a generic "JSON Schema" validator because the idea of JSON Schema is ever-evolving.
Can you detail how this follows? For example, ECMAScript, or CSS have no version identifiers, but they add functionality with each new release.
I'll respond with another question: where in the specification does it say that "$schema": "https://json-schema.org/draft/2020-12/schema" should be used to infer draft2020-12 semantics, "$schema": "https://json-schema.org/draft/2019-09/schema" for draft2019-09 semantics, and so on? [I was surprised too.]
My conclusion is: we have no way of indicating whatsoever what version of the specification we intend the implementation to use when running tests. Adding a $schema keyword won't make a whit of difference there. If we want a particular version of the spec to be used, we have to say so out of band.
We should fix the spec to make this more clear: as I've described a few times before, I've assumed the algorithm that $schema keywords are followed sequentially until one gets to one where the $id matches the $schema, and then it's either a URI the implementation recognizes, in which case the spec version is derived from that, or it's not, and evaluation fails/dies. But the spec doesn't actually say any of this. The spec assumes that its version is the only one in existence at any time, and that its publication has superseded all previous publications. This is in conflict with the test suite, which very clearly acknowledges that multiple versions of the spec exist, and that implementations can support any number of them. Therefore: the test suite needs to be explicit about the intent for the tests in each directory.
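A minimal sketch of that chain-following interpretation (not something the spec actually defines; the loader function and recognized-URI table here are hypothetical):

```python
# Hypothetical sketch: follow $schema references until reaching a meta-schema
# that identifies itself (its $id equals its own $schema), then either map
# that URI to a known spec version or fail.
def detect_spec_version(schema, fetch, known_versions):
    """fetch(uri) returns the meta-schema document at `uri` (hypothetical loader);
    known_versions maps recognized meta-schema URIs to spec versions."""
    seen = set()
    current = schema
    while True:
        uri = current.get("$schema")
        if uri is None:
            raise ValueError("no $schema: nothing to follow")
        if uri in seen:
            raise ValueError("cycle in the $schema chain")
        seen.add(uri)
        meta = fetch(uri)
        if meta.get("$id") == meta.get("$schema"):
            # self-describing meta-schema: either recognized, or evaluation fails
            if uri in known_versions:
                return known_versions[uri]
            raise ValueError(f"unrecognized dialect: {uri}")
        current = meta
```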
but this is no longer the case, so even though I'm specifying a draft-3 $schema, the newer behavior applies.
This is perhaps reason not to add your example, since I think you're asserting something entirely non-obvious, I'd have thought most people would assume precisely the opposite, but given this isn't something covered in a specification, we may just agree to disagree.
I'll ask again though for you to please stay on-topic and leave the other discussions for tickets dedicated to them.
Please don't continue the
Can you detail how this follows? For example, ECMAScript, or CSS have no version identifiers, but they add functionality with each new release.
trail of thought in this ticket. It sounds quite interesting, but isn't for here.
(In fact I'm hiding this comment itself as well).
@karenetheridge
I don't see any practical effect from adding $schema everywhere, unless you also intend to collapse all the tests together into a single directory where it is impossible to tell which version of the spec it's written for without examining the $schema keyword.
I'm not sure what you mean by "practical effect." The important thing here is testing the specification. How the tests are organized and configured is a secondary concern. It doesn't matter whether it's possible to figure the processing semantics out from outside of the specification (e.g. the directory structure) or not. If it is both possible to figure out the right processing semantics from within the spec, and a requirement that implementations do so correctly, then that is what we should test.
Perhaps the most important argument is that it's possible to write a conforming implementation that only uses $schema to determine the processing rules. I don't know of any that effectively require it (refusing to process without it) but it would be valid to do that. Regardless, a validator should only need to be explicitly configured with a draft if it can't figure things out from $schema.
Our test suite should properly test validators that actually rely on $schema without forcing them to pre-configure the draft beforehand. Pre-configuring is a convenience workaround outside of the specification's requirements, and it's the specification we should test.
There's also, in my view, no real reason not to do it. Adding $schema does not interfere. It is at most a mild inconvenience while writing tests, as one usually isn't writing huge numbers of schema objects at once.
I just added it to all of 2020-12's test cases, by hand, in vi, no scripting, no IDE, no nothing. It took a few slightly tedious minutes, but I needed to stop staring at another problem so that was convenient. I'm happy to do the others. I understand @Julian's concerns over mass changes, and am also happy to do whatever verification work is needed to make everyone comfortable with it (I'm writing a little script that goes through and checks for missing or mismatched $schema URIs and will add the output to any PRs).
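Roughly, such a check could look like the sketch below (a hypothetical stand-in, not the actual script mentioned above; the expected URI and the tests/draft2020-12 path are assumptions):

```python
# Hypothetical sanity check: report test schemas in a draft directory whose
# root $schema is missing or doesn't match the expected dialect URI.
import json
import sys
from pathlib import Path

EXPECTED = "https://json-schema.org/draft/2020-12/schema"

def find_problems(directory):
    problems = []
    for path in sorted(Path(directory).rglob("*.json")):
        for case in json.loads(path.read_text()):
            schema = case["schema"]
            if not isinstance(schema, dict):
                continue  # boolean schemas have no place to declare $schema
            if schema.get("$schema") != EXPECTED:
                problems.append((path, case["description"], schema.get("$schema")))
    return problems

if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "tests/draft2020-12"
    for path, description, declared in find_problems(target):
        print(f"{path}: {description!r}: $schema is {declared!r}")
```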
Finally, there is clearly a larger discussion to be had regarding how we expect implementations to process schemas. In my experience, people use $schema when they are sending schemas out to be consumed by others, because you never want to rely on people reading documentation to get the settings right. Keeping the configuration inside the system is far more reliable.
I believe that we should orient the test suite towards real-world usage rather than convenience for test authors. Particularly when the inconvenience is not that substantial.
@karenetheridge
where in the specification does it say that "$schema": "https://json-schema.org/draft/2020-12/schema" should be used to infer draft 2020-12 semantics
From §8.1.1 The "$schema" keyword:
The "$schema" keyword is both used as a JSON Schema dialect identifier
I think this is pretty clear. It identifies the dialect, and the dialect identifies the semantics / processing rules. I will grant you that it is not detailed, and could use some "MUST" etc., but there's a lot of other material in the spec that reinforces this.
From §3 Overview:
A dialect is defined as a set of vocabularies and their required support identified in a meta-schema.
Plus the entirety of $vocabulary and the rules explained there. Its normatively-described functionality depends on and supports $schema being respected to determine the processing rules.
From §9.3.2 Differing and Default Dialects (under §9.3 Compound documents):
When multiple schema resources are present in a single document, schema resources which do not define with which dialect they should be processed MUST be processed with the same dialect as the enclosing resource.
Since any schema that can be referenced can also be embedded, embedded schema resources MAY specify different processing dialects using the "$schema" values from their enclosing resource.
This is pretty clear that $schema signals the processing rules per resource, as that's the only mechanism given to detect the change for an embedded resource. The "MAY" here is about not requiring $schema to redeclare the same processing rules all the time, it's not about it being optional to respect $schema in embedded resources.
All of this is about resources setting their own processing rules, not someone externally choosing them.
Adding $schema to every test schema where appropriate feels totally reasonable to me. Mild inconvenience, huge gain.
We can verify that such a sweeping change doesn't screw anything up.
(It would be interesting to see if this impacts any of the major implementations before merging any such PR, just out of caution.)
I'm happy to download the change and run it on mine. Adding $schema will override the configuration I currently have to set, so it should give the desired effect.
I'll post a PR in the next day or two. Got a bit distracted by some other stuff.
The original question for this issue was, "Do we need $schema in the test suite cases?" not "Should $schema be required?"

After (lengthy) discussions in Slack and some time to think, I've concluded that no, the test cases don't need $schema*.
Consider the schema {"type": "string"} included in the "draft 7" folder. My original point was that there is nothing stopping an implementation from running this as a 2019-09 schema.
But that doesn't matter as long as the result is correct. My question assumes that implementations will have "Draft X Compatibility" modes, but that's not the case, and it's not required.
Basically, if my validator is given that schema and ["array"] as an instance, and it returns invalid, then it's compliant for draft 7 for this scenario. It's also compliant (for this scenario) with drafts 3 through 2020-12.
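In the suite's test-case format, that scenario looks something like this (an illustrative sketch, not a verbatim entry from the suite):

```json
{
    "description": "type: string",
    "schema": { "type": "string" },
    "tests": [
        {
            "description": "an array is not a string",
            "data": ["array"],
            "valid": false
        }
    ]
}
```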
If it's compliant for all test cases within a single draft folder, then it's said to be compliant with that draft.
None of this is dependent upon $schema being present.
If, in some future version, we change the semantics of type so that the above scenario is valid, then my implementation would not be compliant with that version. But again $schema doesn't need to be present in order for this to be determined. The fact that the test case expects a "valid" result and doesn't get one determines that the implementation is non-compliant for this future version.
The test suite isn't asking, "Can the implementation process a schema with meta-schema X?" (Maybe that's a question it should ask, but if so it should do it as distinct test cases.) It's asking, "Can the implementation give me the desired result as indicated by the specification?" That I have to configure my implementation to properly handle some test cases is outside the scope of the test suite. Secondarily, it's my responsibility to document that such a configuration needs to be performed before processing some cases.
Because the 2019-09 and 2020-12 specs allow for implementations to refuse to process schemas without the $schema keyword, it's conceivable that, in the test suite's current state, an implementation could refuse to process the entirety of these draft folders and still claim compliance with these drafts.
This needs to be fixed by adding $schema to the schemas in these folders. The other drafts don't need it for the reasoning above.
We will continue discussion on "expected behavior when $schema is missing for future versions" elsewhere.
@gregsdennis I was only planning to do 2019-09 and 2020-12 in the PR, so I think we're good here. I had noted earlier that there is no language in draft-07 or earlier that says anything at all about what happens without $schema, and therefore I did not think it was strictly necessary to add it. While I would do that if it were just me, I'm fine with leaving it as-is for those drafts.
In the wider discussion, we're conflating a few more issues than we realised (I think).
We have "compliance" and we have "support".
You can be "compliant" with a version of the spec while not "supporting" said version of the spec.
Currently, for 2019-09 and 2020-12, based on the discussion, we need to add $schema to those folders in order to test for support (as far as I can tell, we agree on this). Compliance, strictly, isn't being fully tested, because, as we've discussed, if $schema is absent, all bets are off and anything goes (aka, implementation defined).
I'm going to strongly suggest that we have an ADR included in any PR that looks to resolve this issue, mostly to avoid us hashing this out again. I appreciate that requires more work, and additional sign off, but I feel that's worthwhile if we can avoid revisiting the discussions of this week!
I concur with https://github.com/json-schema-org/JSON-Schema-Test-Suite/issues/311#issuecomment-1163830104
However, if it's true that this effect was not intended, I think we should put this on hold and fix that first, since the resolution will affect this issue. https://github.com/json-schema-org/community/issues/189
@Relequestual Can you offer a definition for these terms? It would seem to me that schemas are "compliant" or not, whereas validators have a range of features that they "support" (or don't support).
@awwright
However, if it's true that this effect was not intended, I think we should put this on hold and fix that first, since the resolution will affect this issue.
Again, it was 100% intended that conforming implementations can refuse to process schemas that do not have $schema. That was the intent, three years or more ago when I made that change, got it reviewed, and incorporated into the spec, where it has been published in three consecutive drafts.
The only part that was unintentional was that I forgot to forbid completely arbitrary behavior. But that doesn't impact the reasoning here. Fixing that doesn't change anything about the need for $schema in 2019-09 and 2020-12. And I continue to believe that allowing refusal to process is correct.
Please stop treating the intended meaning of this clause as a bug. There are bugs related to it, but fundamentally it says what it was meant to say.
Please stop discussing. The change is accepted, it's the correct one, we can summarize the back and forth above in an ADR if helpful for documentation (I can do that). Henry send a PR whenever you can.
For example, the first test in the const file has no $schema declaration. That means that this is a valid draft-6, -7, and -2019-09 schema.

My implementation assumes latest, unless it can be determined that another one should be used (via keywords used or $schema declaration), but that may not be the case for others.

This means that while we intend for this to be evaluated as draft 2019-09, it may not be, depending on how the implementation reads non-specific schemas.
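For comparison, pinning the dialect explicitly removes that ambiguity; a hypothetical version of such a case might look like this (the exact schema and descriptions here are illustrative, not quoted from the suite):

```json
{
    "description": "const validation",
    "schema": {
        "$schema": "https://json-schema.org/draft/2019-09/schema",
        "const": 2
    },
    "tests": [
        { "description": "same value is valid", "data": 2, "valid": true },
        { "description": "another value is invalid", "data": 5, "valid": false }
    ]
}
```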