pboettch / json-schema-validator

JSON schema validator for JSON for Modern C++

[DRAFT] Provide richer error info for logical combination schemas #67

Closed garethsb closed 1 year ago

garethsb commented 4 years ago

Follow-up to #52.

If a schema includes an "allOf", "anyOf" or "oneOf" near the root, error messages are almost useless unless they go beyond "one of the subschemas is required to validate" and identify what the failures in each of the subschemas were. Gathering this error info would allow heuristics to be applied to identify the 'best match' (see https://github.com/json-schema-org/json-schema-spec/issues/632), i.e. the most nearly matching subschema.

This is a DRAFT pull request to gain feedback as to whether such an approach would fit into this library.
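
To make the idea concrete, here is a minimal sketch of gathering per-subschema errors and picking a 'best match'. It is not code from this pull request: the names sub_error, subschema_result and best_match are hypothetical, and "fewest errors wins" is only one assumed heuristic among those discussed in the linked issue.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical types for illustration only; not the library's API.
struct sub_error {
    std::string instance_pointer;  // where in the document the failure occurred
    std::string message;           // what went wrong
};

// Errors gathered while validating one subschema of an anyOf/oneOf.
struct subschema_result {
    std::size_t index;              // position of the subschema in the combination
    std::vector<sub_error> errors;  // empty means the subschema validated
};

// Assumed heuristic: the subschema with the fewest errors is the best match.
// Ties go to the earliest subschema. Returns nullptr when there are no results.
const subschema_result* best_match(const std::vector<subschema_result>& results) {
    const subschema_result* best = nullptr;
    for (const auto& r : results) {
        if (best == nullptr || r.errors.size() < best->errors.size()) {
            best = &r;
        }
    }
    return best;
}
```

A caller could then report only the errors of the best-matching subschema instead of a bare "no subschema matched" message.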

garethsb commented 4 years ago

I suspect it would be a good idea to align error_info with the Output Format included in JSON-Schema draft08.

garethsb commented 4 years ago

I think the draft08 "absoluteKeywordLocation" can be generated from the schema locations and JSON pointers held in the root_schema. However, in the current implementation, I think a schema cannot get at these during validation (they are stored in root_schema as std::map keys). I propose to hold them inside the schema objects, either in addition to the root_schema or instead of it (in which case root_schema could use, for example, a std::set with a custom Comparator).

On the other hand, the "keywordLocation" needs to include $ref explicitly based on how the validation is driven, which I think is best accomplished by adding a schema json_pointer to the validate function arguments.
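
A rough sketch of the difference between the two locations, assuming a heavily simplified schema tree. The schema_node and located_error types, the validate signature and the example.com URI in the usage are all hypothetical, not the library's classes, and JSON pointer escaping is ignored.

```cpp
#include <string>
#include <vector>

// Hypothetical, heavily simplified schema node; not the library's classes.
struct schema_node {
    std::string step;                   // path step from the parent, e.g. "properties/name"
    std::string absolute_location;      // static location, stored in the node itself
    const schema_node* ref = nullptr;   // target when this node is a $ref
    bool fails = false;                 // stand-in for a real keyword check
    std::vector<schema_node> children;  // subschemas and keywords below this node
};

struct located_error {
    std::string keyword_location;           // dynamic path, includes "$ref" steps
    std::string absolute_keyword_location;  // static location of the failing keyword
};

// The schema pointer is passed down as an argument, so that following a
// reference can append an explicit "$ref" step to the dynamic path.
void validate(const schema_node& schema, const std::string& schema_pointer,
              std::vector<located_error>& errors) {
    if (schema.ref != nullptr) {
        validate(*schema.ref, schema_pointer + "/$ref", errors);
        return;
    }
    if (schema.fails) {
        errors.push_back({schema_pointer, schema.absolute_location});
    }
    for (const auto& child : schema.children) {
        validate(child, schema_pointer + "/" + child.step, errors);
    }
}
```

With a property that references a definition containing a failing "minLength", this would report a keywordLocation ending in "/$ref/minLength", while the absoluteKeywordLocation comes straight from the node that failed.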

garethsb commented 4 years ago

Another question to answer before I really get stuck into this is whether all the different 'draft08' validation output levels (flag, basic, detailed, verbose) make sense for this implementation, and whether gathering annotations when the document is valid is also useful.

pboettch commented 4 years ago

You are asking good questions here. I haven't looked at draft08 at all. I can't be of much help in this discussion for the moment.

garethsb commented 4 years ago

Thanks for the positive response, @pboettch. I've been diving into the definition of Output Formatting in the latest JSON Schema draft, and it makes a lot of sense.

I think we can end up with an implementation that is able to produce the final validation output in any of the four forms. Some structural changes to the code will be needed so that, on the one hand, the output includes every failure when there are several and, on the other hand, "short circuit" validation remains available for performance wherever that's possible. Short circuiting has two forms:

> On the other hand, the "keywordLocation" needs to include $ref explicitly based on how the validation is driven, which I think is best accomplished by adding a schema json_pointer to the validate function arguments.

I note that the statement in https://github.com/json-schema-org/json-schema-spec/issues/779#issuecomment-520191039 reflects how I was thinking of tweaking the implementation of schema_ref to achieve the above!

(FWIW, I think most of the above will bring much better error reporting without the need to adopt the other changes proposed in draft08 yet.)
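
As a rough illustration of the trade-off between reporting every failure and short-circuiting, here is a minimal sketch for an "allOf"-style combination. Everything in it is hypothetical (the check alias, output_level, result and validate_all_of), it covers only two of the four output levels, and it shows just one possible kind of short circuit: stopping at the first failure when only a pass/fail flag is wanted.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical check: returns an empty string on success, else an error message.
using check = std::function<std::string(int)>;

// Only two of the four output levels are modelled here.
enum class output_level { flag, basic };

struct result {
    bool valid = true;
    std::vector<std::string> errors;  // left empty at the flag level
    std::size_t checks_run = 0;       // shows how much work was done
};

// Runs every check of an "allOf"-style combination against the instance.
result validate_all_of(const std::vector<check>& checks, int instance,
                       output_level level) {
    result r;
    for (const auto& c : checks) {
        ++r.checks_run;
        std::string message = c(instance);
        if (message.empty()) {
            continue;  // this subschema validated
        }
        r.valid = false;
        if (level == output_level::flag) {
            break;  // short circuit: one failure already decides the outcome
        }
        r.errors.push_back(message);  // basic level: keep going, collect everything
    }
    return r;
}
```

At the flag level the loop stops after the first failing check; at the basic level every check runs so that all failures can be reported.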

lkersting commented 2 years ago

Is there still work to include something like this?

pboettch commented 2 years ago

I haven't checked the newer drafts/standards, but back then I saw that error reporting was defined as well, to make it clearer and independent of the chosen programming language.

garethsb commented 2 years ago

I haven't had time to investigate the progress of the spec or bring this to a conclusion, sorry. It's still a good idea! 💡

sam20908 commented 2 years ago

Are there any updates on this? This would be appreciated.

pboettch commented 2 years ago

> Are there any updates on this? This would be appreciated.

Currently not, but if you have time to help, this would equally be appreciated.