Recently, we worked on formalizing the process for embedding schemas of different drafts than the parent schema. If the output format is changed, how would output for embedded schemas work? Currently, it's not a problem to just use 2019-09 for everything because none of the other drafts specify an output format.
I think it should definitely not change output format mid-validation. The output format of the top level schema should be used for the whole validation. I don't think this is controversial, but it is a detail that we'll have to address as we make changes to the output format.
Not sure if I should be piling on here or creating another issue, but this seems like a good time to bring up something that bothers me about the output format.
Currently `keywordLocation` is described as a required property (technically a SHOULD, but described as required) while `absoluteKeywordLocation` is described as optional. I think this is backwards. Since draft 2019-09, the value of a schema with a JSON Pointer that crosses a document boundary is undefined. That means that `keywordLocation` as defined doesn't properly identify a keyword.
I can see the value of this keyword for human debugging, so I'm not proposing throwing it out entirely, but it should at least be optional and not a URI (just a pointer).
In my implementation, I don't include `keywordLocation` in the output. Because it's not a true identifier, it would actually be fairly difficult (awkward and messy at the least) for me to implement.
In trying to pare down the list of nodes for the `detailed` and `basic` formats, I found that a lot of nodes were kept that may not be necessary.
I don't think the output format should trim anything. Nodes that are superfluous in one context might be helpful in another. Producing human readable output should be a process that takes the output as an argument and transforms it into something nice. https://github.com/atlassian/better-ajv-errors is an example of this concept except it uses the ajv proprietary output format rather than the standard one. I'm hoping they will come out with a version that supports the standard output format so I don't have to make my own.
If we trim the output results to improve one way of consuming the output, we might be limiting what can be done for other ways of consuming output. For example, imagine a regex101.com -like online schema validator. It might find some of those "redundant" nodes useful to build visualizations where they are just noise for text-based human reporting.
Therefore, I don't think the spec should be defining an algorithm to reduce returned nodes. We can leave that to third party tools to decide what works in their domain. It's definitely a problem we as implementors need to work on solving, but not at the spec level.
Currently `keywordLocation` is described as a required property (technically a SHOULD, but described as required) while `absoluteKeywordLocation` is described as optional. - @jdesrosiers
This is correct as is. You will always have a location relative to the root schema, which is what `keywordLocation` gives. However, you will only have an absolute location if the root schema defines an absolute URI for `$id`. Since `$id` is optional in the schema, it makes sense for `absoluteKeywordLocation` to be optional in the output.
Additionally, `keywordLocation` gives you how you got to the keyword, including dereferences, which is generally more useful for debugging than just where the keyword is, which is what `absoluteKeywordLocation` gives you.
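To illustrate the difference (a sketch only; the schema URI, pointers, and message are invented), an error unit for a keyword reached through a `$ref` might look like this, where `keywordLocation` records the dynamic path including the `$ref` step and `absoluteKeywordLocation` resolves into the referenced resource:

```json
{
  "valid": false,
  "keywordLocation": "/properties/point/$ref/required",
  "absoluteKeywordLocation": "https://example.com/point#/required",
  "instanceLocation": "/point",
  "error": "Required property 'y' not found."
}
```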
If we trim the output results to improve one way of consuming the output, we might be limiting what can be done for other ways of consuming output. - @jdesrosiers
This is why we're discussing it. A lot of times, you just want the problem. Other times, you'd like the details. We need to isolate the rules around these scenarios. @ssilverman has some logic that he's implemented in his library already that may serve as a starting point.
Additionally, keywordLocation gives you how you got to the keyword, including dereferences, which is generally more useful for debugging than just where the keyword is, which is what absoluteKeywordLocation gives you.
In my implementation, I omit absoluteKeywordLocation from the error/annotation output iff it is identical to keywordLocation (that is, if no $ref was traversed while getting to this location, AND there is no canonical URI provided for the schema) -- I agree that it is helpful to have both pieces of information.
If we trim the output results to improve one way of consuming the output, we might be limiting what can be done for other ways of consuming output.
This is why we're discussing it.
I think you're missing the point. I'm saying that the output format should be like an assembly language. It's a standardized, low level representation of the validation result. Starting from that assembly language, people can build whatever they need that is appropriate for their domain. Since it's impossible for us to predict every way someone might want to use the output format, I think this is the only feasible path.
You will always have a location relative to the root schema, which is what `keywordLocation` gives. However, you will only have an absolute location if the root schema defines an absolute URI for `$id`.
This sounds like a great argument for why an id should not be optional (not necessarily with `$id`). Otherwise, the output format has no way to properly identify the things it's describing. It requires a human to connect the dots and therefore limits the information I can get from the output programmatically. The project I'm working on now would not be possible without an identifier for every schema. The output format should be optimized for machine readability, not human readability. The human readable version can be generated from the machine readable version. (In my implementation I designed the API such that I always have an identifier. The only way to use a schema is to use its identifier. So, I have no issue with leaving out `keywordLocation` because I always have an identifier.)
I'd love to see `keywordLocation` optional or even removed, but I'd be happy with changing it to require at least one of `keywordLocation` or `absoluteKeywordLocation` rather than require just `keywordLocation`.
The project I'm working on now would not be possible without an identifier for every schema
That your scenario requires an ID doesn't imply that every scenario will. Relative location has its importance in how you got to the keyword, not just where the keyword is. This is why `keywordLocation` is required.
I think you're missing the point.
I'm just saying that it needs to be discussed. I'm open to discussion, and some of your points are valid. Let's please keep the tone of this conversation open.
If you have structural suggestions, please share them. Also keep in mind the months of conversational background that led to the current state. There are requirements that we were dealing with that your use case may not have.
It's a standardized, low level representation of the validation result.
This is precisely the approach we took. What is now in the spec evolved from that.
For example, in an error scenario, is it necessary to list passing subschemas of an `allOf`?
Why is `allOf` producing any annotations at all? The spec says: "[These applicator keywords] have no direct impact on annotation collection, although they enable the same annotation keyword to be applied to an instance location with different values. Annotation keywords define their own rules for combining such values." My interpretation of this is that `allOf` creates no annotations - it merely passes along the annotations from the (passing) subschemas that it contains.
PS. @gregsdennis, in your original problem description above, can you clarify which output format you are discussing?
That your scenario requires an ID doesn't imply that every scenario will.
The point is that the output format needs to be a common denominator for any application that someone might want to use it for. If my use case needs it, others will too. If the output format doesn't require it, my use case and many others won't be compatible with standard validators. `keywordLocation`, on the other hand, can be derived from the standard "detailed" output if everything has an identifier. It can be left out and there are no use cases that would be incompatible with a standard validator input. That's why it can be optional.
Let's please keep the tone of this conversation open.
I'm not sure what you mean by that, but if any of what I've written has come across as rude or aggressive or inappropriate in any way, I apologize. It truly was not my intention. I know I can be stubborn, but I only persist when I'm not getting my point across. If we understand each other and still don't agree on a resolution, I'm happy to drop it.
Also keep in mind the months of conversational background that led to the current state. There are requirements that we were dealing with that your use case may not have.
I followed along with the conversation and I have the utmost respect for all the effort everyone put in. At the time, I wasn't in a place to contribute to that particular conversation. I'm possibly the first person to try to do something with the output other than reporting validation results. I bring up my use case not to make this about me, but to show an example of where the current output format is insufficient. I think that's valuable feedback, but if it's not helpful, I'm happy to drop it.
Why is `allOf` producing any annotations at all? - @karenetheridge
This question isn't saying that `allOf` would be generating, or even carrying, annotations. It's merely asking whether the passing subschemas of a failed `allOf` need to be included in the output. In fact, it explicitly would not return any annotations (generated or passed along) because it failed.
I'm asking for an overhaul of the output. I think the structure of the node as well as the tree structure (of each format) of the output should be revisited. Question 1 looks at the node structure, whereas questions 2 & 3 look more at the tree structure.
`keywordLocation`, on the other hand, can be derived from the standard "detailed" output if everything has an identifier.
I don't think this is true in the general case. For instance, in the case of a recursive schema, just knowing the absolute location (definition site) of the keyword which failed doesn't let you know at which depth of recursion it was reached. This could perhaps be inferred from the tree structure, but this seems much more error-prone (and non-serializable, meaning you can't just pass a node somewhere else in the program without losing this information). In this vein you might just as well argue that `absoluteKeywordLocation` can be derived from the other one, if you traverse the schemas and resolve `$ref`s accordingly.
I'd definitely leave `keywordLocation` intact. Perhaps it could be specified that `absoluteKeywordLocation` is also mandatory for nodes stemming from schemas which do contain `$id`?
Perhaps it could be specified that absoluteKeywordLocation is also mandatory for nodes stemming from schemas which do contain $id ?
`absoluteKeywordLocation` is already mandatory if the path contains a `$ref` or `$recursiveRef`.
Throwing thoughts in:
I like the nomenclature of "terminal" and "non-terminal".
Your second point is my first in the original post.
In a Slack discussion with @karenetheridge, we figured out that we also need a way to identify when a `false` schema is the thing causing the validation failure.
Normally, it's keywords that are generating validation errors. So pointers to the error source generally end in keywords, e.g. `#/properties/myProp/maximum`. But suppose we had something like this:
{
"properties": {
"falseProp": false
}
}
There is no keyword here. The source of the error is the `false` schema itself. To say that `falseProp` is the source of the error isn't wrong, but it's not quite right either.
@karenetheridge suggested using a pointer to the property and a message like "subschema is false." It should be fairly clear that it's the `false` subschema that's generating the error, not the property.
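A sketch of what that suggestion might look like as an output unit (field names follow the current spec; the message wording is only an example):

```json
{
  "valid": false,
  "keywordLocation": "/properties/falseProp",
  "instanceLocation": "/falseProp",
  "error": "Subschema is false."
}
```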
In JsonSchema.Net, I've used a special `$false` to indicate this, but this was out of a lack of direction more than anything. I wanted to somehow identify that it's not a property named "false", so I followed the `$xxx` format of the standard keywords. To me this felt more proper since it's technically the `false` subschema that's generating the error, not the applicator that contains it. It's an idea, anyway.
I don't particularly have strong feelings on how, but this is a special case that the spec needs to clarify.
Regarding false schemas ...
In my implementation, validation is implemented as just another keyword. It therefore has a node in the validation output just like any other keyword. The top level node is a "validate" keyword and so is any sub-schema that's involved in the validation.
Therefore, it is a keyword that generates the validation error even when the schema is `false`. It doesn't matter that there are no keywords in the schema because validating the schema itself is a keyword.
Here's the output from the "falseProp" schema from the previous comment. Notice that the thing that fails is validating the schema, but its errors array is empty because there are no keywords in that schema.
{
"keyword": "https://json-schema.org/draft/2019-09/schema#validate",
"absoluteKeywordLocation": "https://json-schema.hyperjump.io/schema1#",
"instanceLocation": "#",
"valid": false,
"errors": [
{
"keyword": "https://json-schema.org/draft/2019-09/schema#properties",
"absoluteKeywordLocation": "https://json-schema.hyperjump.io/schema1#/properties",
"instanceLocation": "#",
"valid": false,
"errors": [
{
"keyword": "https://json-schema.org/draft/2019-09/schema#validate",
"absoluteKeywordLocation": "https://json-schema.hyperjump.io/schema1#/properties/falseProp",
"instanceLocation": "#/falseProp",
"valid": false,
"errors": []
}
]
}
]
}
This approach is also useful with the `not` keyword where the same keyword can be both true in the context of validation, but false in the context of the keyword. Notice in the following example that the same schema location has two nodes. One node is for the `not` keyword and the other is for the `validate` pseudo-keyword.
{ "not": false }
{
"keyword": "https://json-schema.org/draft/2019-09/schema#validate",
"absoluteKeywordLocation": "https://json-schema.hyperjump.io/schema1#",
"instanceLocation": "#",
"valid": true,
"errors": [
{
"keyword": "https://json-schema.org/draft/2019-09/schema#not",
"absoluteKeywordLocation": "https://json-schema.hyperjump.io/schema1#/not",
"instanceLocation": "#",
"valid": true,
"errors": [
{
"keyword": "https://json-schema.org/draft/2019-09/schema#validate",
"absoluteKeywordLocation": "https://json-schema.hyperjump.io/schema1#/not",
"instanceLocation": "#",
"valid": false,
"errors": []
}
]
}
]
}
@jdesrosiers, I think you and @karenetheridge are following the same concept regarding identifying the `false` schema: the path leads up to the JSON property that contains the `false` schema but doesn't include the `false` itself.
I think it's worth defining a practice for the `true` and `false` schemas.
Regarding schemas that are `true` and `false`, why do we need an extra level in the location (re. the "$false" mention above)? The parent is enough to tell the schema user where the problem is. For example, in:
{
"properties": {
"falseProp": false
}
}
The location of that schema is (using JSON pointer notation) "/properties/falseProp". Yes, "falseProp" isn't a keyword, but it's certainly the JSON location of that schema. i.e. it's the "schema container name".
@ssilverman I wasn't saying that we do need it. That's what my implementation does to identify that the error is coming from the `false` schema. More technically, I implement the `false` as an actual `JsonSchema` instance that just rejects everything, so it literally is the source of the error. Otherwise, I'd have to check for `false` in every applicator. It's easier (as the one writing the code, not necessarily the one consuming it) if the error just comes from the `false` schema itself.
It seems that the consensus is that we merely need to report up to but not including the `false`.
Some more food for thought: Instead of calling the error output locations "keywords", why not just use "schemaLocation" and "absoluteSchemaLocation" instead? This way, we won't get stuck on these locations actually pointing to "keywords". For example, we'd have:
"schemaLocation": "/properties/falseProp",
"absoluteSchemaLocation": "scheme://host/example.json#/properties/falseProp"
From:
{
"properties": {
"falseProp": false
}
}
I know I was stuck on restricting my output to only keywords, but in fact, it turns out that's not useful, especially for non-keywords wrapping a schema (in this example; "falseProp" isn't a keyword, yet it's useful to describe the error location).
[Update] Because of the use of the word "keyword", for this case, I would simply say that "properties" failed and my message contained something like, "failed in 'falseProp'", which isn't that helpful if you can't parse error messages and if there's lots of properties.
I suppose my solution is a middle ground. Every node in the output represents a keyword, but in the case of the pseudo-keyword "validate", the name of the keyword doesn't match the location. "falseProp" is treated like an alias for "validate". So, `/properties/falseProp` is pointing to a keyword, you just can't tell from looking at the pointer that the keyword it's pointing to is actually the "validate" keyword.
Some more food for thought: Instead of calling the error output locations "keywords", why not just use "schemaLocation" and "absoluteSchemaLocation" instead?
FWIW, I have a 'normalize_evaluation_result' method in my $work application that does exactly that -- the structure returned to the user in error responses uses the properties "data_location", "schema_location" and "absolute_schema_location" :)
I think it's worth summarizing the tweaks we have so far (or at least what I have in my head):

- `nestedResults` (still open to naming). The purpose of this is to have the same property for both passing and failing result sets.
- `annotations` property (same as current), but it just needs to be an array of values since the locations are provided by the node.
- `instanceLocation` (@karenetheridge suggested `data_location`, but "instance" is canonically used within the spec, and I prefer consistency)
- `schemaLocation`
- `absoluteSchemaLocation` - only if a `$ref`-like keyword is present
- `valid` required at each node
- `error` as required by the keyword (the same way that keywords specify their annotations), but wording is implementation-specific (preferably configurable to allow for custom messages or internationalization; mine supports strings with predefined tokens for value inclusion as well)

Also, the algorithms to determine which nodes need to be kept for the `basic` and `detailed` formats need tweaking.
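For concreteness, a hedged sketch of what a single failing output unit might look like under these tweaks (property names follow the list above; the schema URI, pointers, and message are invented):

```json
{
  "valid": false,
  "instanceLocation": "/1",
  "schemaLocation": "/items/$ref/required",
  "absoluteSchemaLocation": "https://example.com/polygon#/$defs/point/required",
  "error": "Required property 'y' not found."
}
```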
In my implementations, it was easiest to build the full verbose output structure, then pare down using a set of rules. I had tried to summarize those rules in the spec, but I think it's a bit hard to follow outside of the context of reading my code.
- `basic` is essentially a collection of leaf nodes. Intermediate nodes should be removed and the leaves collected into a list nested under the first node. If the nested list consists of a single node, that node is used as the root.
- `detailed` condenses single-node paths to merely the leaf node. Nodes whose `valid` value doesn't match their immediate parent node's are removed*, e.g. false results of passing "any", true results of failing "all".

\* `not` is an interesting case for the second reduction rule since it will always have a single node that doesn't match its `valid` value. If `not` is valid, then the subschema isn't, and it's probably important to know why, meaning we'd want to get the subschema results, which means we don't want to remove its child, even though that violates the second reduction rule. If `not` is invalid, then its subschema is valid and it doesn't make much sense to include that node, but it doesn't hurt, either. In either case, it seems that the best behavior for `not` is to always keep its child.

The fact that we have a special case in the behavior of `not` implies that we need to account for special cases. Perhaps this can be specified by the keyword, like error messages?
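As a rough illustration of the `basic` reduction (a sketch only; schema, pointers, and messages are invented), a failing `allOf` with two failing subschemas would collapse to just the leaf nodes collected under the root:

```json
{
  "valid": false,
  "keywordLocation": "",
  "instanceLocation": "",
  "errors": [
    {
      "valid": false,
      "keywordLocation": "/allOf/0/type",
      "instanceLocation": "",
      "error": "Value is a number but should be a string."
    },
    {
      "valid": false,
      "keywordLocation": "/allOf/1/maximum",
      "instanceLocation": "",
      "error": "6 is greater than the maximum of 5."
    }
  ]
}
```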
It was suggested in Slack by @ssilverman that following the schema structure is not necessarily useful in cases where authoring is complete and the schema is being used to validate instances. He suggests that following the instance structure may be more useful in these cases.
Arguably these cases are more common since authorship ideally occurs once.
The primary benefit of such a structure would be that all errors and annotations that pertain to a specific instance location would be collected together.
I want to review our original decision to follow the schema rather than the instance, but I think it was motivated from a point of view of authorship and debugging of the schema itself rather than repeatedly validating varying instances, as would happen in a live system and as one might expect from an online validator.
In #1065 @jdesrosiers suggests fully identifying keywords in the output format, and I suggest doing so by using the keyword name as a fragment on the URI of the vocabulary. @gregsdennis mentioned that this should get noted here, so... noted!
I'm starting to look into actually updating this section, and one of the first things I'd like to do is work out an instance-based format. This is what I've come up with so far.
For convenience, this is the example in the spec:
// schema
{
"$id": "https://example.com/polygon",
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$defs": {
"point": {
"type": "object",
"properties": {
"x": { "type": "number" },
"y": { "type": "number" }
},
"additionalProperties": false,
"required": [ "x", "y" ]
}
},
"type": "array",
"items": { "$ref": "#/$defs/point" },
"minItems": 3
}
// instance
[
{
"x": 2.5,
"y": 1.3
},
{
"x": 1,
"z": 6.7
}
]
Errors are:

- The second point is missing required property `y`.
- The second point has property `z` but shouldn't.

Instance-based output would look something like this:
[
{ "valid": true },
{
"x": { "valid": true },
"y": {
"keywordLocation": "/items/$ref/required",
"absoluteKeywordLocation": "https://example.com/polygon#/$defs/point/required",
"error": "Required property 'y' not found."
},
"z": {
"keywordLocation": "/items/$ref/additionalProperties",
"absoluteKeywordLocation": "https://example.com/polygon#/$defs/point/additionalProperties",
"error": "Additional property 'z' found but was invalid."
}
},
{
"valid": false,
"keywordLocation": "/minItems",
"instanceLocation": "",
"error": "Expected at least 3 items but found 2"
}
]
So the idea here is that any values will be replaced by the error(s). It's not shown here, but a single location can have multiple errors, and these would be collected into an array, whereas a single error doesn't necessarily need an enclosing array.
Optionally the { "valid": true }
could be replaced by the value given in the instance, but having the result object may be terser in some cases, depending on the value. Personally, I like having it as it seems to collapse an entire valid subtree down to this simple construct.
There's also an added item for minItems
. There's not really much we can do for arrays aside from just adding an item. Again, I think wrapping all of the errors in a single array and adding this to the "instance" array would be best. This way there's only ever a single item added to the array.
I think a format like this could be quite useful to anything that uses schema to generate or validate input from forms.
Thoughts?
@gregsdennis I was working on something similar a while ago. I can't remember why, but I found trying to mimic the structure of the instance to be problematic. What I came up with instead was a simple 2D Map from location pointer to keyword to results. Given an instance location, you can easily get all the annotations that apply to that location or all the annotations for a certain keyword that apply to that location. This approach isn't fully vetted either and might have problems as well, but I'm sharing in case it's useful. Here's what your example would look like.
{
"#": {
"https://json-schema.org/draft/2020-12/schema#validate": [{ "valid": false }],
"https://json-schema.org/draft/2020-12/schema#type": [{ "valid": true }],
"https://json-schema.org/draft/2020-12/schema#items": [{ "valid": false }],
"https://json-schema.org/draft/2020-12/schema#minItems": [{ "valid": false }]
},
"#/0": {
"https://json-schema.org/draft/2020-12/schema#validate": [{ "valid": true }],
"https://json-schema.org/draft/2020-12/schema#type": [{ "valid": true }],
"https://json-schema.org/draft/2020-12/schema#properties": [{ "valid": true }],
"https://json-schema.org/draft/2020-12/schema#additionalProperties": [{ "valid": true }],
"https://json-schema.org/draft/2020-12/schema#required": [{ "valid": true }]
},
"#/0/x": {
"https://json-schema.org/draft/2020-12/schema#validate": [{ "valid": true }],
"https://json-schema.org/draft/2020-12/schema#type": [{ "valid": true }]
},
"#/0/y": {
"https://json-schema.org/draft/2020-12/schema#validate": [{ "valid": true }],
"https://json-schema.org/draft/2020-12/schema#type": [{ "valid": true }]
},
"#/1": {
"https://json-schema.org/draft/2020-12/schema#validate": [{ "valid": false }],
"https://json-schema.org/draft/2020-12/schema#type": [{ "valid": true }],
"https://json-schema.org/draft/2020-12/schema#properties": [{ "valid": true }],
"https://json-schema.org/draft/2020-12/schema#additionalProperties": [{ "valid": false }],
"https://json-schema.org/draft/2020-12/schema#required": [{ "valid": false }]
},
"#/1/x": {
"https://json-schema.org/draft/2020-12/schema#validate": [{ "valid": true }],
"https://json-schema.org/draft/2020-12/schema#type": [{ "valid": true }]
},
"#/1/z": {
"https://json-schema.org/draft/2020-12/schema#validate": [{ "valid": false }]
}
}
I'm using URIs to identify keywords rather than just the name to distinguish between different drafts. I've used a simple { "valid": true|false }
block to make the example easier to read, but these should be full output unit blocks. The leaf is always an array because it is possible for a keyword to apply to one location more than once although that never comes up in this example. I treat validating a schema like an implied keyword called "validate" which is why that appears for every location.
I found trying to mimic the structure of the instance to be problematic
I've found this, too. In mimicking any structure, we have to end up modifying it to inject the error data. It's awkward at best.
I like what you have. I think the keywords as URIs is unnecessary, but I understand where you're coming from and I think we need a separate issue for it. I don't see how a single keyword can generate multiple errors for a single instance location, though. And what is the `#validate` URI?
Also, this seems more of a variation on the current "basic" output, which is a flat list of errors. In conversations with people who wanted an instance-based structure when we were initially considering a formalized output, it was clear they wanted a hierarchical format, just that it followed the instance.
I don't mean to derail the conversation with my extensions. This was not a proposal. I'm just showing what I've done. Feel free to incorporate as little or as much as you like into your proposal.
I think the keywords as URIs is unnecessary, but I understand where you're coming from and I think we need a separate issue for it.
See #1065.
I don't see how a single keyword can generate multiple errors for a single instance location
"allOf": [{ "title": "foo" }, { "title": "bar" }]
. In this case "title": "foo"
and "title": "bar"
both apply to the same location. Different title
keywords, but same location.
And what is the
#validate
URI?
Not sure if you missed the explanation or it wasn't clear ...
I treat validating a schema like an implied keyword called "validate" which is why that appears for every location.
I think you can safely ignore it. It's really useful for me, but not standard.
Oh, no. At this point, I'm open to discussing how else we can represent errors. What we have is the best we could come up with before, and it's received.... criticism... about its usefulness in certain scenarios. I just want to address that.
Keep the ideas coming.
"allOf": [{ "title": "foo" }, { "title": "bar" }]
. In this case"title": "foo"
and"title": "bar"
both apply to the same location. Different title keywords, but same location.
I see your point, but this illustrates to me that maybe it's not necessary to group by keyword within this format. I'd see these two `title` keywords as separate because they're in different subschemas of the `allOf`. Since the keyword (and its specific location within the schema) is part of the error output, I think just a flat list of errors for each instance location is sufficient. Maybe I'm wrong, though, and grouping would be desired.
Not sure if you missed the explanation or it wasn't clear ...
🤦 So this is like a summary for this instance location, then?
So this is like a summary for this instance location, then?
Not entirely. There will be a result for each schema validated against that location. For example,
{
"allOf": [
{ "title": "foo" },
{ "title": "bar" }
]
}
{
"#": {
"https://json-schema.org/draft/2020-12/schema#validate": [{ "valid": true }, { "valid": true }, { "valid": true }],
"https://json-schema.org/draft/2020-12/schema#allOf": [{ "valid": true }],
"https://json-schema.org/draft/2020-12/schema#title": [{ "valid": true }, { "valid": true }]
}
}
This has three `#validate` results: one for the overall schema and one for each of the `allOf` schemas that also applied to this location.
I don't understand the purpose of having that. It seems like a lot of repeated information.
Nothing is repeated, it's just really thorough. Here's a slightly more useful example.
{
"allOf": [
{ "type": "number" },
{ "maximum": 5 }
]
}
6
{
"#": {
"https://json-schema.org/draft/2020-12/schema#validate": [{ "valid": false }, { "valid": true }, { "valid": false }],
"https://json-schema.org/draft/2020-12/schema#allOf": [{ "valid": false }],
"https://json-schema.org/draft/2020-12/schema#type": [{ "valid": true }],
"https://json-schema.org/draft/2020-12/schema#maximum": [{ "valid": false }]
}
}
`#validate` tells us that the schema at `/allOf/1` failed validation. You could extrapolate that from the `#maximum` failure, but this allows tooling to skip the expense and complexity of computing that information. Admittedly, this information isn't always useful. For example, I filter out these nodes when presenting errors to users on https://json-schema.hyperjump.io. Humans can trivially extrapolate that information and including it actually increases the cognitive load. But, if one day I add inline errors to the UI, I expect I would want that data to build the UI.
Here's an example of how I was finding the `#validate` result useful in some unfinished work I was doing a while ago.
{
"type": "object",
"properties": {
"foo": { "type": "string" }
}
}
{ "foo": "bar" }
{
"#": {
"https://json-schema.org/draft/2020-12/schema#validate": [{ "valid": true }],
"https://json-schema.org/draft/2020-12/schema#type": [{ "valid": true }],
"https://json-schema.org/draft/2020-12/schema#properties": [{ "valid": true }]
},
"#/foo": {
"https://json-schema.org/draft/2020-12/schema#validate": [{ "valid": true }],
"https://json-schema.org/draft/2020-12/schema#type": [{ "valid": true }]
}
}
We know the instance is valid. If we then modify the "foo" property to "baz" and want to revalidate, it would be nice if we didn't have to revalidate everything. We just want to revalidate what has changed. I can use the results to identify the sub-schema to run validation against the new value ("baz").
Hey everyone. I've been reading through all of the comments in this and linked issues, compiling a list of topics.
It seems that we have consensus on these things:

- Rename `errors` to `nested`/`subschemas`/`inner`/`details` (other suggestions?)
- `detailed` to be specified as MAY instead of SHOULD (like `verbose` is currently)
- No special handling for `false` schemas (no `$false` in the location)
- `valid` is required for each node
- `error` is not required but is declared as the designated place to put error messaging. Tooling that consumes this output should expect an error message here.
These things still require decisions (but I've proposed some things):

- `absoluteKeywordLocation` instead of `keywordLocation`: `keywordLocation` could be `validationPath` and optional (default off) to the end user; opens the door for `absoluteKeywordLocation` to change to `schemaLocation` (for terseness); implementations SHOULD support an option mechanism which determines the inclusion of `validationPath`.
- `keyword` to support renaming :point_up: (and maybe `dialect` - import from #1065). Only makes sense when a keyword is present. For instance at the root or `#/allOf/3` there is no keyword, so these would be absent.
- `validationPath` and `instanceLocation` can be plain JSON Pointers and do not need to be URI-encoded
- … in the `basic` and `detailed` formats? My users have said no. (`verbose` should show everything.)
- Rename `detailed` and `verbose` to `*-by-schema` and add `*-by-instance` (or some other way to indicate both structure and quantity of info) to support instance-based structures (see below for examples)
  - `basic` is a collection of terminal nodes
  - `detailed-by-schema` follows the current structure and reduction rules (including the special case for `not`)
  - `detailed-by-instance` I'll discuss later.

And this isn't directly about output, but supports it:

- … `absoluteSchemaLocation`/`schemaLocation`. This may be in the spec already, but I figured it's worth mentioning.

Annotations need to be part of the node, not rendered as child validation nodes. I don't think there's any argument here.
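If the renames above were adopted, a single unit might look something like this (a sketch only; the dialect URI, locations, and message are invented, and `validationPath` would be optional):

```json
{
  "valid": false,
  "keyword": "required",
  "dialect": "https://json-schema.org/draft/2020-12/vocab/validation",
  "validationPath": "/items/$ref/required",
  "schemaLocation": "https://example.com/polygon#/$defs/point/required",
  "instanceLocation": "/1",
  "error": "Required property 'y' not found."
}
```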
I've been thinking around this for a couple hours now, and it seems we have three orthogonal attributes that we need to consider.
Not all keywords provide validation. For example, the spec says nothing about validation for `title` and `description`.
My implementation handles this by always passing validation for these keywords, which works to provide the correct pass/fail outcome, but it also means that they produce an output node. So I get something like this as a node:
{
"valid": true,
"keywordLocaation": "#/description",
"schemaLocation": "https://example.com/mySchema#/description",
"instanceLocation": "",
"annotation": "description"
}
Question: should pure-annotation keywords produce an output node or simply be listed in parent nodes?
I can see arguments for both sides, and I think I have a pretty easy tweak to my implementation that would hide these, so I'm not invested either way.
These are actually two different ideas, but I think I often conflate them, so I'm going to consider them distinct here.
A keyword can produce annotations, and it can propagate annotations from its children. Most keywords do one or the other, but some do both.
- `title` and `description` only produce annotations.
- Keywords like `allOf` only propagate child annotations.
- `properties` both produces (which of the listed properties are present) and propagates annotations.

Regardless of how an implementation manages annotations internally, I believe it's worthwhile keeping these distinctions separate when reporting them to the user. It can be useful to know whether a keyword produced an annotation or is just passing a message.
I'd like to propose two more output node properties to cover each of these ideas:

- An `annotation` property. This is just the raw value of the annotation (e.g. a string for `title` or an array of strings for `properties`).
- A `collectedAnnotations` property. This is an object where the key is the schema location (i.e. the value that would be in `schemaLocation`) of the keyword that produced it, and the value is the annotation itself.

The downside to listing collected annotations is duplication of annotation values. But on the upside, you can:
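To make the two proposed properties concrete, a hedged sketch of what a `properties` node might carry (the values are invented, and the `collectedAnnotations` keys are hypothetical schema locations of the keywords that produced each annotation):

```json
{
  "valid": true,
  "instanceLocation": "",
  "annotation": ["x", "y"],
  "collectedAnnotations": {
    "https://example.com/point#/properties/x/description": "x coordinate",
    "https://example.com/point#/properties/y/description": "y coordinate"
  }
}
```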
These examples take the spec scenario and show what the output would look like if we changed all of this, including my suggestions on the open discussion topics.
For reference, here's the spec scenario:
// schema
{
"$id": "https://example.com/polygon",
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$defs": {
"point": {
"type": "object",
"properties": {
"x": { "type": "number", "description": "x coordinate" },
"y": { "type": "number", "description": "y coordinate" }
},
"additionalProperties": false,
"required": [ "x", "y" ]
}
},
"type": "array",
"items": { "$ref": "#/$defs/point" },
"minItems": 3
}
// instance
[
{
"x": 2.5,
"y": 1.3
},
{
"x": 1,
"z": 6.7
}
]
{ "valid": true }
(I never really liked having a top-level node here, but I don't know how else to collect the nodes. I do like consistently having an object at the root, as opposed to an array.)
Note the inclusion of the annotations from the subschemas which passed validation and how they're no longer included once we move up through a subschema that failed validation.
I think I've been able to marry our two approaches. Considerations:
I do need to add a `results` property to contain the validation results that pertain to each location, and `nested` as an object instead of an array made more sense here.
Initially I was only considering the verbose format for instance-based, but in building that example, I realized how to pare down the fluff.
The rules I'm following for this are:

- …
- If a unit's `validationPath` is contained within another unit's `validationPath`, it (the first one) can be removed.

This second rule removes pass-through nodes like the kind you'd get having just resolved a `$ref` and arriving at a new subschema. These nodes are just a summary of what's underneath and don't add much to the result.
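A small sketch of that second rule (paths, fields, and message invented for illustration): the first unit's `validationPath` is a prefix of the second's, so only the second would be kept.

```json
[
  {
    "valid": false,
    "validationPath": "/items/$ref",
    "instanceLocation": "/1"
  },
  {
    "valid": false,
    "validationPath": "/items/$ref/required",
    "instanceLocation": "/1",
    "error": "Required property 'y' not found."
  }
]
```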
@gregsdennis That's a lot! I'm going to avoid pulling on too many threads at once. So, I'll start with the consensus list. Here are a couple of notes. For any of the items I don't mention, you can assume I'm in agreement.
`detailed` to be specified as MAY instead of SHOULD (like `verbose` is currently)
I don't know what this means. Can you provide a little more explanation?
`error` is required as specified by the keyword (the same way that keywords specify their annotations) - wording is implementation-specific (preferably configurable to allow for custom messages or internationalization; token replacement as a possible suggested mechanism?)
I don't think `error` should be a required field. Ideally, I'd like to just provide the output results to some third party library like https://github.com/atlassian/better-ajv-errors or https://github.com/apideck-libraries/better-ajv-errors and let them deal with messaging and translation. Given the schema and the instance, all these libraries should need are pointers to the keyword and instance where the error occurred.
`detailed` to be specified as MAY instead of SHOULD (like `verbose` is currently)
`verbose` is currently specified as a "MAY" requirement, but `detailed` is a "SHOULD". I thought to help support implementers with the change, a little loosening of the spec wasn't a bad thing.
I don't think error should be a required field...
Fair point. I can go with that. I'll update above.
Working branch is draft-next-output.
Unless I'm missing some contradicting wording somewhere else, `detailed` is not a SHOULD by itself. Implementing `flag` or `basic` covers the requirement as well.
An implementation SHOULD provide at least one of the "flag", "basic", or "detailed" format and MAY provide the "verbose" format. If it provides one or more of the "detailed" or "verbose" formats, it MUST also provide the "flag" format.
🤔 yeah, I see what you're getting at. I'll do some digging to see where we decided that. Do you have an opinion on this requirement?
I don't have a strong opinion. I'd suggest that implementations MUST support `flag` and SHOULD support `basic` or `detailed`. Anything beyond that is just nice-to-have.
Given that I'm expanding `detailed` into `detailed-schema` (what we have currently) and `detailed-instance` (an instance-based format), would you say that implementations SHOULD provide just one of them? It seems odd to prefer one over another, whether by the spec or the implementation.
I do like requiring `flag`, though.
I agree that we probably shouldn't prefer one over the other. So, I'd suggest: MUST support `flag` and SHOULD support `basic`, `detailed-schema`, or `detailed-instance`. I would personally (not in the spec) encourage implementations to support all of the output formats because each is useful in different situations, but as far as minimum requirements, at least one format with error details should be sufficient.
I haven't had a chance to go over all of this in detail and respond to all the points that I want to respond to -- but I wonder if it might be worthwhile moving to a community discussion so we can have threaded conversations about the various points? Many of them are not directly related, and some require a deep dive to understand motivations and to examine tradeoffs. (We might need an ADR for these, too!)
Sounds good. I'll copy this to a discussion, making edits for the conversation @jdesrosiers and I have had so far. We can create threads for each item.
We don't have GH discussions on in this repo so I opened one in the Community repo with the General topic: https://github.com/json-schema-org/community/discussions/63
I think this is basically resolved at this point. Closing for now.
While writing Section 10 on output, I was implementing the formatting in my library, Manatee.Json. In that library, I don't collect annotations in the same way as described in the spec. As a result the annotation output of a successful validation was not fully explored.
I have recently been working on a new validator that does follow the recommendations of the spec in regard to annotation collection and have reached the same questions that many other implementors have asked me regarding annotations in the output format, to which I naively responded that annotation output is analogous to error output.
I aim to fix this. This issue is a compilation of questions and issues I've experienced trying to reimplement output formatting as well as potential solutions to some of them.
1. Annotations vs Nested Results
With the current output, annotations are to be represented as their own output unit within the `annotations` collection, including the `valid` property and all JSON pointers indicating where the annotation came from and to what part of the instance it applies. Additionally, any nested results were also to be in this array. This conflates annotations with nested results, and the only way to identify which were annotations and which were nested results was via the location properties.

To resolve this, we should split these collections. Nested results should remain as they are, but the property name needs to change.
The proposed fix adds (or hijacks, rather) a single property, `annotations`, to the output unit. It's only present for `valid = true` and when there are annotations (no empty arrays). The value is an array of all annotations for this node.

For instance, `properties` requires that it adds the names of any validated properties as annotations. To this end, the `annotations` on the `properties` node will be an array of these property names.

I'm not sure how this fits in with collecting unknown keywords as annotations since that would effectively require a node for each unknown keyword, but maybe that's fine (the location pointer would indicate the keyword, and the annotation value would be the unknown keyword's value).
This idea also may invert the intent behind dropping combining annotations, but I haven't thought through that completely, yet.
This is also in line with @ssilverman's comment regarding how his implementation handles annotations internally. (My new validator uses a similar mechanism.)
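For illustration, a sketch of what the proposed `annotations` array might look like on a `properties` node (locations and values invented; other field names follow the current spec):

```json
{
  "valid": true,
  "keywordLocation": "/properties",
  "instanceLocation": "",
  "annotations": ["x", "y"]
}
```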
2. Nested Nodes
Having the nested result nodes under `annotations` for validated results and `errors` for invalidated results was, admittedly, me trying to be clever about things. Practically, it's difficult to deal with.

The proposal here is to have these always under a consistent name, e.g. `nested`, `results`, etc. I'm not too fussed about what the name is, but it should always be that regardless of the value of `valid`.

3. Too Many ~Secrets~ Nodes
In trying to pare down the list of nodes for the `detailed` and `basic` formats, I found that a lot of nodes were kept that may not be necessary.

For example, in an error scenario, is it necessary to list passing subschemas of an `allOf`? To a person just trying to figure out what's wrong, these are just noise to be filtered out in order to pinpoint the one subschema that failed.

The algorithm for reducing returned values could be adjusted, and maybe needs to be dynamic based on the scenario. Maybe if a schema is valid, I don't care about any of the details and I just want a ✔️. But for failures I want to know only those nodes that failed, but I want detailed info about them, and maybe in a format that mimics the schema (as `detailed` is intended to do). The current formats (and the algorithms to generate them) do not support this.

(Fake internet points for anyone who gets my reference in the title of this section.)