Tests for format should make better use of vocabularies and optional/ is too overloaded with meaning

wasabii commented 3 years ago

The draft2019-09 and draft2020-12 directories contain various tests for 'format'. The ones located in format.json document the results of everything (even bad formats) as valid: true. This is because of the movement of format to annotations by default. That's fine. So, these tests should all return valid for these schema versions.

However, the tests in the optional/format directory are setup with 'valid: false'. As if the results of the tests should be an assertion. But there isn't any data that determines this distinction other than their location. When evaluating the files in optional/format, the test suite should consider format failures as assertions. When evaluating the tests outside of optional/format, it should consider format failures as annotations.

Makes it a bit hard to auto generate test implementations given the test suite JSON files.

It would be nice if there was some distinction in the test suite. Perhaps actually including $vocabulary: { "...format": true } in the schema node within the tests. Since this should be the thing that governs the behavior.

karenetheridge commented 3 years ago

In draft2019-09 this was a little fuzzy because we didn't have any in-schema way of stating that formats should be treated as assertions, so putting the tests in optional/formats was the best we could do to indicate "if your implementation treats these as assertions, these tests apply".

Now in draft2020-12 we have a mechanism, via $schema and $vocabulary, to assert this, so we need to move (or copy?) these tests into the main area and write a custom metaschema for them. That simply hasn't happened yet due to lack of tuits (indeed you will see that we have no tests at all yet of the $vocabulary keyword, or the $schema keyword using anything but the draft metaschema for each version).

wasabii commented 3 years ago

Hmm. Makes sense. How about a value in the test json files, along side "valid", for enabling annotations as assertions? Since the specification does define two possible ways to configure annotations as assertions: $vocabulary, and "whatever configuration method the implementation wants"; this new property could be referring to the later.

Example, very bad one:

    {
        "description": "validation of date strings",
        "schema": {"format": "date"},
        "format-mode":"assertion",
        "tests": [
...

wasabii commented 3 years ago

To quote from the spec:

   Regardless of the boolean value of the vocabulary declaration, an
   implementation that can evaluate "format" as an *assertion MUST
   provide options to enable and disable such evaluation*.  The assertion
   evaluation behavior when the option is not explicitly specified
   depends on the vocabulary declaration's boolean value.

So, some new in-test value can signal to implementations to enable this required feature, outside of the usage of in-schema vocabulary settings.

jdesrosiers commented 3 years ago

Since the specification does define two possible ways to configure annotations as assertions

I'd have to check the spec to make sure we didn't miss something, but the intention isn't that there are two ways to indicate whether format is an assertion or an annotation. Each draft should only have one method. Those methods are just different between the two drafts.

we need to move (or copy?) these tests into the main area and write a custom metaschema for them

I don't think it's required to support format as assertions, so I think it still belongs in "optional".

karenetheridge commented 3 years ago

Implementing support for the Format-Assertion vocabulary is OPTIONAL.

(https://json-schema.org/draft/2020-12/json-schema-validation.html#rfc.section.7.1)

Implementations MAY still treat "format" as an assertion in addition to an annotation and attempt to validate the value's conformance to the specified semantics. The implementation MUST provide options to enable and disable such evaluation and MUST be disabled by default. Implementations SHOULD document their level of support for such validation.

(https://json-schema.org/draft/2020-12/json-schema-validation.html#rfc.section.7.2.1)

Relequestual commented 3 years ago

CREF2 in the validation spec is the following quote, located at https://json-schema.org/draft/2020-12/json-schema-validation.html#rfc.section.7.2.1

Specifying the Format-Annotation vocabulary and enabling validation in an implementation should not be viewed as being equivalent to specifying the Format-Assertion vocabulary since implementations are not required to provide full validation support when the Format-Assertion vocabulary is not specified.

The intent of saying that an implementation MAY still treat "format" as an assertion is in line with anntoation processing. We're making a suggestion on the likely use of that annotation, additionally specifying that this must not be the default behaviour to keep interoperability for regular users.

The validation that is to be done when "format" is an annotation but treated as an assertion is implementation defined.

As such, we should not provide any tests for such possible configuration modes.

The optional tests we provide for "format" should only be those associated with the "format-assertion vocabulary" and not the "format-annotation vocabulary".

I agree, we could make this a little clearer.

I'd like to keep this open so we can reference it in relation issues we should consider when creating new test structures.

jdesrosiers commented 3 years ago

@karenetheridge, thanks for looking that up. Did we leave the implementation defined option in place on purpose, or did that slip through the cracks? I thought the point of splitting "format" into two vocabs was to get rid of this special-case vocabulary behavior.

gregsdennis commented 3 years ago

It was definitely on purpose. There was a concern about backward compatibility for implementations that supported validating format.

For reference, here are the PRs that updated that section. There was a lot of discussion, especially in the closed ones.

https://github.com/json-schema-org/json-schema-spec/pull/1023 https://github.com/json-schema-org/json-schema-spec/pull/1026 https://github.com/json-schema-org/json-schema-spec/pull/1027 (final version, merged)

jdesrosiers commented 3 years ago

Rereading the relevant issues and PRs, it seems we decided this exactly how I remembered, but it was written up differently and we just went with it instead of going back to discussions. https://github.com/json-schema-org/json-schema-spec/pull/1027#discussion_r528251460

I'm not mad at how it ended up, just surprised. I'll have to update the release notes, because I wrote that the configuration option was removed.

Relequestual commented 3 years ago

Yikes. I'll have to go read.

gregsdennis commented 3 years ago

@jdesrosiers I'm not sure what you mean. How was it written up differently to what we decided? The comment you linked to has follow-up conversation that we should leave it as is for historical reasons.

jdesrosiers commented 3 years ago

@gregsdennis I'm sorry I brought it up. It's really not important. I'm fine with how it is and I'm not even arguing for a change. I could quote a bunch of posts from the issue and PR to show what I see, but I feel like that would just fuel conflict for no benefit.

The way it reads to me is that Henry gave historical context for why there was a config option in the first place, but left the decision up to us. No one favored supporting partial format implementation other than what is reasonable due to limitations of the programming language. Darrel Miller's comments seemed to sway opinion. Ben proposed a path forward that didn't include a config option. There was agreement and a PR was written. Ben comments several more times in the issue and PR that the proposal should not include a config option. I don't see anywhere where an amendment to the proposal was discussed other than the brief exchange I linked to above. Maybe it happened in slack. Maybe I was even part of that discussion and don't remember. I'm not sure.

Ultimately, the only problem here is with me not paying enough attention and not knowing what was actually in the spec. That's been corrected and I don't wish to burn any more of anyone's time on this.

gregsdennis commented 3 years ago

This table (from a comment in the 1027 PR) sums up the behavior that should be described in the text.

Annotation Vocab	Assertion Vocab	Impl Assert Support	App Config	Result
false/true	n/a	no	n/a	annotation
false/true	n/a	yes	annotation	annotation
false/true	n/a	yes	assertion	assertion
n/a	false	no	n/a	annotation
n/a	false	yes	annotation/assertion	assertion
n/a	true	no	n/a	fail
n/a	true	yes	annotation/assertion	assertion

There are four degrees of freedom here:

which vocab is used (annotation or assertion)
whether the vocab is used with a true or false value
whether the implementation even supports assertions
whether the implementation is configured to provide assertions

json-schema-org / JSON-Schema-Test-Suite

Tests for format should make better use of vocabularies and optional/ is too overloaded with meaning #495