Open Julian opened 6 years ago
Do you mean something like a severity index for reported errors? Is that something that can relate to #270?
It's only tangentially related. That ticket it looks like is about being able to explicitly define an error level. I.e., it's for a new spec feature. This one isn't, it's about whether there is an existing well-defined heuristic we can agree on to help answer "imagine you are only allowed to show one and only one validation error. Which one should you show?"
@Julian I really like this idea. In one possible example I gave in #396 I tried a little bit to organize the errors, but "error with oneOf
" is the simplest error to report but not really the most useful. That's a hard one, but I like your simpler example of type vs restrictions within type.
It would need to account for custom keywords somehow (which is more a question for #602 and the whole custom keyword registration requirements, so we needn't get into it here).
It would need to account for custom keywords somehow (which is more a question for #602 and the whole custom keyword registration requirements, so we needn't get into it here).
Yeah -- so, my implementation only defines a partial ordering, where anything that hasn't been explicitly deemed "low importance" or "high importance" essentially gets arbitrarily ordered, but certainly looks like it'd be useful to allow custom keywords to hint at where they belong (which maybe goes back to @Anthropic's point).
@Julian yeah I read your implementation and felt like a severity index covered that, but understood from your response that you wanted to go beyond that with this issue. I think having an error index which can then correspond to a severity index could be of real benefit for consistency of error prioritisation across implementation. It would need to factor in if the basic error is within an *Of as @handrews mentioned which goes back to the defined algorithm I thought you were suggesting, is that about right-ish?
I would venture to guess this is covered by several other issues. Maybe #31 ?
First, keywords are designed to be non-overlapping. If you have a schema that describes how a string is supposed to be formatted, but you provide a number, that number doesn't trip any of the assertions except "type".
Really the only case where there's a problem is with schemas that define multiple disjoint & nonadjacent sets of valid values, like oneOf can do.
I visualize the problem like this: Imagine we can plot valid values on a 2D axis (imagine, say, JSON Schema only describes numbers). The vast majority of schemas describe a valid region that's just a circle. If you're outside the circle region, you know which direction to travel to make the instance valid again.
But sometimes, you have a complex schema. Maybe it's a checkerboard pattern, or two circles, each on opposite sides, and far away from, the Y-axis. Which one did the user intend to target?
Does this sound on track?
@awwright this is a more general issue than #31 (I'd actually much rather close out #31 as it kind of wandered off into a proposal for select
which I don't think is necessary or even viable).
I'm not sure I follow your analogy. Yes, a lot of schemas have straightforward error reporting, but the difficult ones are of interest here. And also optimizing the error experience, which people have complained about a lot across multiple implementations, including in the IETF JSON working group.
While I can sort of see why you're talking about overlapping keywords and type
, I think @Julian's examples can be tweaked to avoid that problem.
So, following up on #396, one thing I'd also love to see discussed or addressed is what my implementation calls
best_match
.The purpose of this function is to heuristically answer the question "given a schema and instance with multiple issues, what is the most fundamentally wrong with the instance?".
An example to illustrate:
So if you imagine an instance with both issues, we'd select the first error over the second one.
So, question: how well defined is this notion, and can we standardize an algorithm here?
Here's my implementation (which by the way is not promised to return the same values over time, in case I come up with a better implementation, but that hasn't happened yet):
https://github.com/Julian/jsonschema/blob/master/jsonschema/exceptions.py#L274-L278
You'll find that this touches quickly on a notion of "descent" into
anyOf
/oneOf
/allOf
sub-errors, which I find pretty interesting to think about too, but I'll refrain from elaborating on that until someone else confirms this is interesting to discuss.