Reorganizing the meta-schemas

gregsdennis commented 2 years ago

Currently, the keywords are organized based on what "kind" of keyword they are: applicator vs annotation vs assertion (vs "special").

core	applicator	validation	unevaluated	meta-data	format	content
$id	prefixItems	type	unevaluatedItems	title	format	contentEncoding
$schema	items	const	unevaluatedProperties	description		contentMediaType
$ref	contains	enum		default		contentSchema
$anchor	additionalProperties	multipleOf		deprecated
$dynamicRef	properties	maximum		readOnly
$dynamicAnchor	patternProperties	exclusiveMaximum		writeOnly
$vocabulary	dependentSchemas	minimum		examples
$comment	propertyNames	exclusiveMinimum
$defs	if	maxLength
	then	minLength
	else	pattern
	allOf	maxItems
	anyOf	minItems
	oneOf	uniqueItems
	not	maxContains
		minContains
		maxProperties
		minProperties
		required
		dependentRequired

I think it might be easier for schema authors if we organized the keywords by function. This table reorganizes the keywords primarily by what kind of data the keyword addresses. It still has some "special" categories as well.

core	meta-data	combinatorial	array	object	number	string	other/multiple	format
$id	title	if	prefixItems	properties	maximum	maxLength	type	format
$schema	description	then	items	patternProperties	exclusiveMaximum	minLength	const
$ref	default	else	unevaluatedItems	additionalProperties	minimum	pattern	enum
$anchor	deprecated	allOf	maxItems	unevaluatedProperties	exclusiveMinimum	contentEncoding	maxContains
$dynamicRef	readOnly	anyOf	minItems	maxProperties	multipleOf	contentMediaType	minContains
$dynamicAnchor	writeOnly	oneOf	uniqueItems	minProperties		contentSchema	contains
$vocabulary	examples	not		required
$comment				dependentRequired
$defs				dependentSchemas
				propertyNames

format is still on its own so that we can list its vocabulary with false in $vocabulary while leaving the door open to others using it with true.
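For context, the true/false here is the boolean attached to each vocabulary URI in a meta-schema's $vocabulary object: true means an implementation must recognize the vocabulary or refuse to process the schema, and false means the vocabulary is optional. A minimal Python sketch of that split, assuming a hypothetical dialect meta-schema (the example.com $id and the choice of flags are made up for illustration; the vocabulary URIs are the published 2020-12 ones):

```python
# Hypothetical dialect meta-schema fragment; only "$vocabulary" matters here.
meta_schema = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "$id": "https://example.com/meta/my-dialect",  # made-up identifier
    "$vocabulary": {
        "https://json-schema.org/draft/2020-12/vocab/core": True,
        "https://json-schema.org/draft/2020-12/vocab/applicator": True,
        "https://json-schema.org/draft/2020-12/vocab/validation": True,
        # False: a schema may still be processed by an implementation
        # that does not recognize this vocabulary.
        "https://json-schema.org/draft/2020-12/vocab/format-annotation": False,
    },
}


def split_vocabularies(meta):
    """Return (required, optional) vocabulary URIs from a meta-schema."""
    vocab = meta.get("$vocabulary", {})
    required = sorted(uri for uri, flag in vocab.items() if flag)
    optional = sorted(uri for uri, flag in vocab.items() if not flag)
    return required, optional


required, optional = split_vocabularies(meta_schema)
print("required:", required)
print("optional:", optional)
```

Keeping format in its own vocabulary is what lets one meta-schema mark it optional while another marks it required, without touching any other keyword group.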

Aside from that, I think this organization makes more sense from an author's point of view.

jdesrosiers commented 2 years ago

I'm not sure how this makes anything easier. Vocabularies should be organized to make it easy to combine them to make new dialects. For example, if I'm defining a dialect for data definition, I might want applicator keywords, but not validation keywords that don't apply to the DDL domain. The current organization isn't perfect, but I don't see how the proposal is better. I can't imagine how splitting by JSON type would be useful for constructing dialects. Why would I ever want my dialect to support object keywords and not array keywords?

gregsdennis commented 2 years ago

> Why would I ever want my dialect to support object keywords and not array keywords?

Maybe you don't need arrays with your data model.

But I'm not asking that question. I'm asking why keywords for arrays exist in multiple vocabs. In this case in particular, what is the use of separating items and prefixItems from minItems and maxItems? If I have an array, I want to be able to define it, and having two vocabs makes that a little harder.

Organization by "keyword type" doesn't really seem helpful to anyone other than spec authors.

jdesrosiers commented 2 years ago

> what is the use of separating items and prefixItems from minItems and maxItems?

This is exactly the example I gave. If I'm creating a DDL dialect, I only want keywords that I can apply to data-types. minItems and maxItems are validation keywords. They apply to the value, not the type definition. Having an applicator vocabulary makes some sense. It's the keywords that define structure. Any dialect can start with that as the skeleton of their dialect and fill in their keywords to flesh it out. I just can't imagine a use-case where it would make sense to compose vocabularies based on type.
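As a rough illustration of that kind of composition, here is a Python sketch that builds a dialect meta-schema from a chosen list of vocabularies. The compose_dialect helper and the example.com $id are made up for this sketch; the vocabulary and meta-schema URIs follow the published 2020-12 layout.

```python
import json

BASE = "https://json-schema.org/draft/2020-12"


def compose_dialect(dialect_id, vocab_names):
    """Build a meta-schema dict that requires the given 2020-12 vocabularies."""
    return {
        "$schema": f"{BASE}/schema",
        "$id": dialect_id,
        "$dynamicAnchor": "meta",
        # Every listed vocabulary is marked as required (True).
        "$vocabulary": {f"{BASE}/vocab/{name}": True for name in vocab_names},
        # Reuse the published per-vocabulary meta-schemas for validation of schemas.
        "allOf": [{"$ref": f"{BASE}/meta/{name}"} for name in vocab_names],
    }


# Made-up DDL dialect: structure-defining keywords only, no validation vocabulary.
ddl_dialect = compose_dialect("https://example.com/meta/ddl", ["core", "applicator"])
print(json.dumps(ddl_dialect, indent=2))
```

With the current split, leaving out the validation vocabulary drops minItems and maxItems as intended, but it also drops type, which sits in the validation column of the first table.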

However, vocabulary organization is arbitrary and no matter what we choose, there will always be use cases where it doesn't make sense. I'd rather define keywords than vocabularies. Then people can combine them however they like without being constrained by the categorization we choose for them.

karenetheridge commented 2 years ago

> Aside from that, I think this organization makes more sense from an author's point of view.

This table would certainly be useful to include on the documentation site; there are lots of keywords and categorizing them in different ways can make it easier for a schema author to find what they need.

Perhaps a list of all the keywords, with a column for what vocabulary they belong to (and a link to the spec entry for each), and a column showing what instance type(s) they are applicable to? So basically a simplified form of https://docs.google.com/spreadsheets/d/18SIXnzyjXTJZgqeo5W-qIEwq-bNKXb5M76Pq_47r2Is/edit#gid=0
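One way to picture such a listing, as a rough Python sketch: each row carries a keyword, its current vocabulary, and the instance types it applies to. Only a handful of keywords are shown. The vocabulary column follows the first table in this thread and the instance-type column follows the second; the spec links are left out, and the render helper is made up for illustration.

```python
# (keyword, current vocabulary, instance types it applies to)
KEYWORD_INDEX = [
    ("prefixItems", "applicator", ["array"]),
    ("items", "applicator", ["array"]),
    ("minItems", "validation", ["array"]),
    ("unevaluatedItems", "unevaluated", ["array"]),
    ("properties", "applicator", ["object"]),
    ("required", "validation", ["object"]),
    ("maxLength", "validation", ["string"]),
    ("multipleOf", "validation", ["number"]),
]


def render(rows):
    """Format the index as fixed-width plain-text columns."""
    lines = [f"{'keyword':<18}{'vocabulary':<14}applies to"]
    for keyword, vocab, types in rows:
        lines.append(f"{keyword:<18}{vocab:<14}{', '.join(types)}")
    return "\n".join(lines)


print(render(KEYWORD_INDEX))
```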

gregsdennis commented 2 years ago

> If I'm creating a DDL dialect, I only want keywords that I can apply to data-types... - @jdesrosiers

I don't know that the average schema author is going to be creating dialects, though. That involves writing meta-schemas. Most schema authors are just going to be taking the base meta-schema.

If I am such a schema author, I don't know what an "applicator" is. I just have an array, and I need to write a schema. To do that, I want to know what keywords I can use that pertain to arrays. As it stands, I have to look in applicator (again, what is that?), validation, and unevaluated to find them all.
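To make that concrete, here is a small Python sketch that reports which of the current vocabularies a schema's top-level keywords come from. The keyword-to-vocabulary mapping is copied from the first table above and covers only array-related keywords plus type; the example schema and the vocabularies_used helper are made up for illustration.

```python
# Current vocabulary of each array-related keyword, per the first table above.
CURRENT_VOCAB = {
    "prefixItems": "applicator",
    "items": "applicator",
    "contains": "applicator",
    "maxItems": "validation",
    "minItems": "validation",
    "uniqueItems": "validation",
    "maxContains": "validation",
    "minContains": "validation",
    "unevaluatedItems": "unevaluated",
    "type": "validation",
}

# A small array schema: one leading string, at least one item, nothing extra.
array_schema = {
    "type": "array",
    "prefixItems": [{"type": "string"}],
    "minItems": 1,
    "unevaluatedItems": False,
}


def vocabularies_used(schema):
    """Vocabularies touched by the top-level keywords of a schema."""
    return sorted({CURRENT_VOCAB[k] for k in schema if k in CURRENT_VOCAB})


print(vocabularies_used(array_schema))
```

Even this four-keyword schema draws on all three of the vocabularies named above.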

With the proposed organization, all I have to do is look at the array meta-schema/vocabulary (and perhaps glance over the other/multiple one) to find keywords that apply.

> Having an applicator vocabulary makes some sense. It's the keywords that define structure. - @jdesrosiers

The ones I list under combinatorial don't define structure. These stick out as a "logic" group.

But more to the point here is the example I mentioned earlier. How can you properly define the structure of, say, an array without all of the keywords that pertain to arrays, e.g. both items and maxItems or both contains and minContains? Yet these keyword pairs are currently listed in separate vocabularies.

I think we're making things harder for John & Jane Schema-Author.

> This table would certainly be useful to include on the documentation site - @karenetheridge

I see this as a secondary option, but I think there's value in actually reorganizing the vocabularies themselves.

karenetheridge commented 2 years ago

> I think there's value in actually reorganizing the vocabularies themselves

I'm not convinced, given:

> I don't know that the average schema author is going to be creating dialects

jdesrosiers commented 2 years ago

> This table would certainly be useful to include on the documentation site;

When we get around to documenting vocabularies, I agree that something like this would be useful.

> I don't know that the average schema author is going to be creating dialects, though. That involves writing meta-schemas. Most schema authors are just going to be taking the base meta-schema.

I completely agree. Most schema authors won't be creating dialects. But the only reason for anyone to care about vocabularies is if they are creating dialects. They're otherwise a fairly irrelevant concept to the average schema author. What you are describing sounds like documentation concerns. People are definitely not going to be digging through meta-schemas to see what keywords are available to them. The Understanding JSON Schema (UJS) site is already organized very much the way you've broken things down, so I'm not seeing a major problem here.

> But more to the point here is the example I mentioned earlier. How can you properly define the structure of, say, an array without all of the keywords that pertain to arrays, e.g. both items and maxItems or both contains and minContains? Yet these keyword pairs are currently listed in separate vocabularies.

The vocabulary breakdown is definitely not perfect. There are certainly some minor improvements we can make, but no matter what we choose, it will make sense in one circumstance and not in another. This is why I want to move away from keywords being identified by their vocabulary. If keywords are identified independently, everyone can group keywords into vocabularies however works best for them.

I already answered how having items and maxItems in different vocabs makes sense. If you didn't like my answer, that's fine, but I don't know what else I can say. For contains and minContains, I agree that it makes no sense to have these in different vocabularies. They are both validation keywords in my opinion.

What would make sense to me would be a vocab for basic structural JSON definition. It would contain everything you need for basic type definition like properties, items, and type. Dialect authors can start with this as a base and flesh it out with their own custom vocabularies. Not included would be logic keywords like anyOf and validation keywords like minLength. So, while the applicator vocabulary comes close to filling this need, it does miss the mark a bit.
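A rough Python sketch of that idea: treat a small set of keywords as the structural base and report anything else a schema uses. The STRUCTURAL set below holds only the three keywords named above; what else would belong in it is left open, and the function name and example schema are made up for illustration.

```python
# Hypothetical structural base: just the keywords named in the comment above.
STRUCTURAL = {"properties", "items", "type"}


def non_structural_keywords(schema):
    """Collect keywords outside STRUCTURAL, descending into properties and items."""
    found = set()
    if not isinstance(schema, dict):  # boolean schemas carry no keywords
        return found
    for keyword, value in schema.items():
        if keyword not in STRUCTURAL:
            found.add(keyword)
        elif keyword == "properties":
            for subschema in value.values():
                found |= non_structural_keywords(subschema)
        elif keyword == "items":
            found |= non_structural_keywords(value)
    return found


schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "minLength": 1},
        "tags": {"type": "array", "items": {"anyOf": [{"type": "string"}]}},
    },
}
print(sorted(non_structural_keywords(schema)))
```

A dialect author starting from such a base would then add back, through their own vocabularies, whichever of the reported keywords they actually need.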

gregsdennis commented 2 years ago

> People are definitely not going to be digging through meta-schemas to see what keywords are available to them.

For what it's worth, that's what I did when I first picked up JSON Schema.

jdesrosiers commented 2 years ago

> For what it's worth, that's what I did when I first picked up JSON Schema.

Point taken. This was too strong a statement. I'll rephrase to, "In my experience, it's very uncommon for people to be digging through meta-schemas to see what keywords are available to them". I'm willing to accept that it's more common than I think. I just meant to express that I've never heard of anyone doing this until now.

gregsdennis commented 2 months ago

I'm moving this conversation to a discussion. Will report back here once decided.

gregsdennis commented 3 weeks ago

This conversation needs to be reframed in the context of not having vocabularies.

I think there is still a benefit to having some grouping of keywords, but it needs to be stated outside of the vocabulary context.

json-schema-org / json-schema-spec

Reorganizing the meta-schemas #1159