json-schema-org / json-schema-vocabularies

Experimental vocabularies under consideration for standardization

"ordered" annotation keyword #21

Open handrews opened 7 years ago

handrews commented 7 years ago

The idea of a way to indicate whether array order is significant or not (basically, is this a list or a set) has been suggested numerous times. We would have use for it in the meta-schema for both the array form of type and presumably for enum, which would reduce the confusion related to json-schema-org/json-schema-spec#474.

- "ordered": true would indicate an ordered list, but say nothing about how or why it is ordered.
- "ordered": false would indicate an unordered set.
- If "ordered" is not present, it is not known whether the array is ordered or not, and neither should be assumed. This preserves backwards compatibility, and prevents implementations from improperly handling schemas that are written with less precision.
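Here is a minimal sketch of how an implementation might read the proposed keyword while keeping all three states distinct, so that absence is never silently treated as false. Python is used only for illustration. `Ordering` and `read_ordering` are hypothetical names and are not part of any specification or library.

```python
from enum import Enum
from typing import Any, Mapping


class Ordering(Enum):
    """Hypothetical three-valued reading of the proposed "ordered" keyword."""
    ORDERED = "ordered"      # "ordered": true
    UNORDERED = "unordered"  # "ordered": false
    UNKNOWN = "unknown"      # keyword absent: assume neither


def read_ordering(schema: Mapping[str, Any]) -> Ordering:
    """Map a schema object to an Ordering without collapsing absence into false."""
    if "ordered" not in schema:
        return Ordering.UNKNOWN
    value = schema["ordered"]
    if not isinstance(value, bool):
        raise TypeError('"ordered" must be a boolean')
    return Ordering.ORDERED if value else Ordering.UNORDERED
```

For example, `read_ordering({"type": "array"})` yields `Ordering.UNKNOWN` rather than `Ordering.UNORDERED`.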

This would be an annotation, not an assertion. Implementations MAY offer validation algorithms for common ordered cases (ascending/descending numeric order for an array of numbers, for instance), but these MUST NOT be run automatically. The mechanism for turning such algorithms on or off is implementation-dependent (same as for format and the content* keywords).
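The following is only a sketch of one way an implementation could expose such an opt-in check. The function names `check_numeric_order` and `apply_order_check` are assumptions, and so is modeling the on/off switch as a keyword argument. The comment deliberately leaves that mechanism implementation-dependent.

```python
from typing import Any, List, Optional


def check_numeric_order(items: List[Any], direction: str = "ascending") -> bool:
    """Return True if a list of numbers is non-decreasing or non-increasing."""
    if direction not in ("ascending", "descending"):
        raise ValueError(f"unknown direction: {direction!r}")
    # bool is a subclass of int in Python, but JSON true/false are not numbers
    if not all(isinstance(x, (int, float)) and not isinstance(x, bool) for x in items):
        raise TypeError("numeric order checks only apply to arrays of numbers")
    pairs = zip(items, items[1:])
    if direction == "ascending":
        return all(a <= b for a, b in pairs)
    return all(a >= b for a, b in pairs)


def apply_order_check(instance: Any, schema: dict, order_check: Optional[str] = None) -> bool:
    """Treat "ordered" purely as an annotation unless the caller opts in to a check."""
    if order_check is None or schema.get("ordered") is not True:
        return True  # default: no assertion is made, so the instance passes
    if not isinstance(instance, list):
        return True  # the keyword only describes arrays
    return check_numeric_order(instance, order_check)
```

Because the default is `order_check=None`, `[3, 1]` passes a schema with `"ordered": true` unless the caller explicitly asks for an ascending or descending check.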

handrews commented 7 years ago

A bit more detail, and some alternate ideas:

"sorted": true means that the server controls the sorting, and attempts to change the sorting by the client will be ignored. This means that when adding to the array (within JSON data, not making separate POST requests to add collection elements), the client can just append and rely on the server to ensure the correct ordering.

"sorted": false means that the server has no awareness of ordering and will not take any action related to ordering. Such an array may not even have a stable ordering from request to request.

"sorted": "/json/pointer/to/field/in/each/array/item" means that the client can control the sorting by changing the value of that field in each item. The pointer root is the root of the item, not the complete instance. An empty pointer "" means that the entire item is the "sort key", so simply re-ordering the array is supported and sufficient. A pointer to a numeric field means that the server will re-order the array based on the value of that field in each item. While there are subtleties with this, it is a fairly common pattern.
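For illustration, this is roughly what a server honoring the pointer form might do. It assumes that sort keys are compared with the language's native ordering. Pointer resolution follows RFC 6901 (JSON Pointer). `resolve_pointer` and `resort` are hypothetical helpers, and the pointer form itself is argued against later in the thread.

```python
from typing import Any, List


def resolve_pointer(document: Any, pointer: str) -> Any:
    """Resolve an RFC 6901 JSON Pointer against `document` (here: one array item)."""
    if pointer == "":
        return document  # empty pointer: the whole item is the sort key
    if not pointer.startswith("/"):
        raise ValueError(f"invalid JSON Pointer: {pointer!r}")
    current = document
    for raw in pointer[1:].split("/"):
        # Unescape in the order RFC 6901 requires: ~1 -> "/", then ~0 -> "~"
        token = raw.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            if not (token.isascii() and token.isdigit()) or (len(token) > 1 and token[0] == "0"):
                raise KeyError(token)
            current = current[int(token)]
        elif isinstance(current, dict):
            current = current[token]
        else:
            raise KeyError(token)
    return current


def resort(items: List[Any], sort_pointer: str) -> List[Any]:
    """Re-order items by the value the pointer selects in each item (stable sort)."""
    return sorted(items, key=lambda item: resolve_pointer(item, sort_pointer))
```

Even this small version has to pick a comparison rule for the key values, which is one of the subtleties the comment alludes to.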

I'm not entirely sold on my own JSON Pointer idea. The true value could mean that the server controls the sorting, but that the application can document some way for the client to influence it (hand-wavy, because the point is to give the application flexibility). Anyway, just some thoughts. I don't feel strongly about any of this.

handrews commented 7 years ago

Having thought about this more, I think I'm pretty solidly against the JSON Pointer idea. If you can define a single key, then sooner or later people want to define compound keys, and things get complicated very quickly. I feel like that is better handled at the application layer. The schema should at most indicate that sorting is significant.

I'm still not sure whether the boolean definitions I gave above are ideal. "sorted": true should perhaps just mean that the array is sorted when read from the document, and not that the data authority (server in hypermedia, database in some local storage arrangements, etc.) will automatically re-sort or enforce the sorting. Hmm... there are basically three options for true:

It's worth considering how this would be used in hypermedia: for a collection with complex sorting options, would this add value? How would any of the above options work in such a case? What does "properly sorted" mean if there are multiple ways to sort it? How does one indicate an array that is often sorted but can possibly be requested unsorted (e.g. for performance reasons)? Is that a case we care about?

As is often true, this simple concept is more complex to specify than it initially might appear.

gregsdennis commented 6 years ago

I think ordered is better understood than sorted. To me, ordered can be any arbitrary but fixed sequence, while sorted implies there is some logic behind it. As an implementer, I would want to know what that logic is if I am to add to the list.

That said, what rules could there be around modification of arrays?

Given that an array is ordered/sorted, there are two modifications (directly to the array, not necessarily to the items within it) that can be made which maintain that state: addition and removal of items. It seems removal is trivial as it doesn't change the sequence of the existing items.

As of draft-07, we have readOnly to specify whether a client can add items. If we use ordered, can I add items at either end, or only at one? If we use sorted, it seems that some kind of algorithm specification would be required, and it would be remarkably complex.
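Here is a small sketch of the asymmetry being described. It assumes plain lists of numbers and an ascending numeric sort rule, because the thread does not define one. The helper names are hypothetical.

```python
import bisect
from typing import List


def add_to_ordered(items: List[int], value: int, at_start: bool = False) -> List[int]:
    """With "ordered", any fixed sequence is fine: the client picks an end."""
    return [value] + items if at_start else items + [value]


def add_to_sorted(items: List[int], value: int) -> List[int]:
    """With "sorted", the client must know the rule (assumed: ascending) to insert."""
    result = list(items)
    bisect.insort(result, value)  # binary search for the insertion point
    return result


def remove_item(items: List[int], value: int) -> List[int]:
    """Removal never reorders the remaining items, so it preserves either state."""
    result = list(items)
    result.remove(value)  # drops the first occurrence only
    return result
```

Only `add_to_sorted` needs knowledge of the ordering rule. That rule is exactly what a bare `sorted` keyword would not convey.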

gregsdennis commented 6 years ago

Given that JSON natively defines an array as ordered, maybe it would be easier to use "unordered" : true to explicitly specify an unordered set. This would also allow the default to be (more understandably) false.

handrews commented 6 years ago

@gregsdennis I'm not a fan of double negatives (e.g. "this array is not unordered").

awwright commented 6 years ago

Perhaps an enum would be in order:

"ordering": "set", "ordering": "vector", and so on.
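One way to read the enum idea is that it chooses the equality rule for arrays. The sketch below uses hypothetical names. It compares two arrays either as a "vector", where position matters, or as a "set", where position and duplicates are ignored. Items are canonicalized with `json.dumps(..., sort_keys=True)` so that objects compare by content.

```python
import json
from typing import Any, List


def canonical(value: Any) -> str:
    """Serialize a JSON value so equal objects compare equal regardless of key order.

    Note: 1 and 1.0 serialize differently here; a full implementation would
    normalize numbers, since JSON Schema treats them as equal.
    """
    return json.dumps(value, sort_keys=True, separators=(",", ":"))


def arrays_equal(a: List[Any], b: List[Any], ordering: str) -> bool:
    """Compare two JSON arrays under a hypothetical "ordering" value."""
    if ordering == "vector":
        return len(a) == len(b) and all(canonical(x) == canonical(y) for x, y in zip(a, b))
    if ordering == "set":
        return set(map(canonical, a)) == set(map(canonical, b))
    raise ValueError(f"unknown ordering: {ordering!r}")
```

Under "set", the array form of "type" (for example ["string", "null"] versus ["null", "string"]) compares equal. That is the list-versus-set distinction raised at the top of the issue.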

gregsdennis commented 6 years ago

@handrews while I agree, I tend to err on the side of default values ("false" being default for boolean in the vast majority of languages). While we can state that the absence of the keyword implies a value of true, having that value contradict the default value of the value's type seems confusing.

handrews commented 6 years ago

While we can state that the absence of the keyword implies a value of true, having that value contradict the default value of the value's type seems confusing.

Take the following with a grain of salt, as I'm being even more opinionated than usual and none of it has anything to do with actual spec requirements.

I've encountered this view before, but only from people who primarily write in strongly typed languages. It doesn't seem like a problem at all to me: the keyword being absent is represented by undefined (JavaScript), None (Python), etc. In C++ (which I used to work in 15+ years ago) I would probably represent schema keywords with pointers so that undefined keywords can be set to null. This preserves the distinction between being absent and happening to have the default value (although there are other solutions).

Of course none of this is a requirement of the spec. Making a distinction between a keyword being missing and having a value corresponding to JSON null can be tricky, but it is solvable. I would probably avoid defining a keyword that can be null but defaults to something else. Then again, I try to avoid null in JSON anyway (and no JSON Schema keyword takes null as a value except const, and I'd probably handle it specially rather than allow it to complicate other keywords).

Anyway, I tend to be suspicious of any argument along the lines of "my language/library works better if you do X" in a supposedly language-neutral environment. This may have something to do with how other people have attempted to use this argument in other projects unrelated to JSON Schema :-P

Reminder: the above is my personal opinion, not entirely rational, and not actually required by the JSON Schema spec :-)

dlax commented 6 years ago

Setting "ordered": false to override the default behavior of JSON arrays seems fine to me.

epoberezkin commented 6 years ago

@handrews why "order": true rather than "order": "asc"/"desc"? Unless you want "sorted": true, "sortOrder": "desc"... I don't see much harm in supporting "orderBy": <json-pointer>/<array of json-pointers for compound keys> as well.
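Here is a hypothetical sketch of how this proposed shape might be checked. "orderBy" is a list of JSON Pointers that together form a compound key, and "order" is "asc" or "desc". None of these keywords are standardized, and `resolve` and `is_ordered_by` are illustrative names. The function returns `None` when two keys cannot be compared (for example, objects, or a number against a string). This hints at why such a keyword might only work as an annotation.

```python
from typing import Any, List, Optional


def resolve(item: Any, pointer: str) -> Any:
    """Resolve an RFC 6901 JSON Pointer relative to one array item."""
    if pointer == "":
        return item
    if not pointer.startswith("/"):
        raise ValueError(f"invalid JSON Pointer: {pointer!r}")
    for raw in pointer[1:].split("/"):
        token = raw.replace("~1", "/").replace("~0", "~")
        item = item[int(token)] if isinstance(item, list) else item[token]
    return item


def is_ordered_by(items: List[Any], order_by: List[str], order: str = "asc") -> Optional[bool]:
    """Check items against a compound key; None if comparing two keys fails."""
    if order not in ("asc", "desc"):
        raise ValueError(f"unknown order: {order!r}")
    # Tuples compare element by element, giving lexicographic compound-key order
    keys = [tuple(resolve(item, p) for p in order_by) for item in items]
    pairs = zip(keys, keys[1:])
    try:
        if order == "asc":
            return all(a <= b for a, b in pairs)
        return all(a >= b for a, b in pairs)
    except TypeError:
        return None  # e.g. dict vs dict, or int vs str: no natural ordering
```

For instance, people sorted by `["/last", "/first"]` must be ordered by last name first, and by first name only to break ties.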

epoberezkin commented 6 years ago

@handrews also, is "order": false equivalent to "not": {"order": true}? That is implied by the absence of a default value. Because if false simply means the absence of "order": true, then false should be the default. Also, would "order": true mean "anyOf": [{"order": "asc"}, {"order": "desc"}]? Is such a construct ever (or at least often) needed, by the way?

handrews commented 6 years ago

is "order": false equivalent to "not": {"order": true}?

Since order as a boolean would be an annotation (it tells you that the data is ordered, but does not give enough information to validate it), it is only collected and propagated upwards if the instance is valid against the schema. So {"not": {..., "order": true}} will never contribute an annotation: either the instance is invalid against the inner schema, in which case order is not collected in the first place, or it is valid against the inner schema, in which case the not ensures that it is invalid against the outer schema, so all annotations will be dropped at that level.
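The following toy evaluator makes the collection rule concrete. It assumes only three keywords: `type` (restricted to "array"), `ordered`, and `not`. It shows that annotations survive only when the schema that produced them, and every schema enclosing it, is valid. It is a sketch and not the API of any real implementation. Real implementations also record where each annotation came from.

```python
from typing import Any, Dict, List, Tuple

Annotations = List[Tuple[str, Any]]  # (keyword, value) pairs


def evaluate(schema: Dict[str, Any], instance: Any) -> Tuple[bool, Annotations]:
    """Toy evaluator: "type" (only "array" is checked), "ordered", and "not"."""
    valid = True
    annotations: Annotations = []
    if schema.get("type") == "array" and not isinstance(instance, list):
        valid = False
    if "not" in schema:
        # Whatever the subschema collected is discarded: if it was valid, this
        # schema fails; if it was invalid, it had no annotations to begin with.
        sub_valid, _ = evaluate(schema["not"], instance)
        if sub_valid:
            valid = False
    if "ordered" in schema:
        annotations.append(("ordered", schema["ordered"]))
    # Annotations are only kept when the schema as a whole is valid
    return (True, annotations) if valid else (False, [])
```

Under this rule, `{"not": {"ordered": true}}` can never produce an `ordered` annotation, whichever way validation goes.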

You can't really negate annotations, which makes sense: they're not a boolean outcome, even if they happen to have boolean values. They add information, and the opposite of adding information is omitting information. (This bit probably deserves clarification in the spec.)


As for orderBy, that would also have to be considered an annotation, as not all data types have a clear ordering.

Honestly, I think this idea is more trouble than it's worth at this point, at least as a general annotation.

I can see it being a part of something specific like a code generation vocabulary, where it means "use an ordered data type". Or UI generation, where it might indicate that a sorting interface is possible. But I think that those use cases end up being a bit different.

gregsdennis commented 6 years ago

@handrews I think that the root of this issue could be resolved by json-schema-org/json-schema-spec#518, since it was this case that originally prompted my question about ordering arrays.

handrews commented 6 years ago

@gregsdennis Good to know. I still have my doubts about json-schema-org/json-schema-spec#518 but definitely not shooting it down at this point.

I think there's a bit of a "slippery slope" argument to be made here. Normally I hate that argument, but looking at the discussion of ordered so far, the simple form (a boolean) has very limited use. We've had two proposals for different directions already: "asc/desc" and using a pointer for ordered-by. But ordering is often more complicated than easily expressed in such a way.

I feel like this would end up being a not-very-useful feature that would leave people frustrated with its limitations and demanding a much more elaborate system, which is unlikely to make everyone happy because of the breadth of possible ordering approaches.

So I think this is best left to applications rather than schema.

My current inclination is to close this and, if we find a really compelling proposal, re-open it with that. But I'll leave this issue open at least until folks have had a chance to get back from vacation after New Year's and catch up.

gregsdennis commented 6 years ago

@handrews

My current inclination is to close this

I agree, but I'm also going to make a note in json-schema-org/json-schema-spec#518 about sequencing the values.

handrews commented 6 years ago

Moving out to draft-future along with the unique key proposal which has some similarities. We won't get to either in draft-08 given the current progress and focus.

handrews commented 4 years ago

Moving this to the vocabularies repo.

gregsdennis commented 5 months ago

I have defined this in the latest release of my Array Extensions Vocabulary. An implementation is available in .NET, and you can play with it on https://json-everything.net/json-schema.