Attempt to do “product design” of JSON Schema addressing

By “addressing” I mean anything that we achieve today via IDs and REFs. I, honestly, don’t have a full list of these things that we achieve.

I mean the standard way product design is usually done: defining objectives/goals/outcomes and writing user stories for various groups, without any references to UX (which in this context would be $id, $ref etc.).

I will elaborate more on why I believe it is needed, but please reflect on how what we do here is similar to creating a user facing product via mocking up UX, skipping the “product” part entirely. It rarely leads to good results and invariably leads to feature bloat.

Not sure what this label means :)

Problem

The conflict among the maintainers, contributors and implementers has been ongoing since before I learned that JSON Schema existed. We all seem to have forgotten that JSON schema was created for some purpose and the attempts to shut down and ridicule any dissent to the current spec direction from many maintainers go beyond what is acceptable in professional conversation, both in tone and in spirit. We definitely do not go by any consensus model, as @ucarion suggested.

This attitude of JSON Schema maintainers has already driven away many valuable contributors (e.g. @fge) and I am very close to not caring about the future of the current specification as well.

The current version of the spec is already too difficult to implement and I can make a safe bet that there is not a single JSON Schema validator that fully supports the spec, even excluding optional features. Draft-08 raises the implementation game to the next level.

Even though I am supporting the validator that is more used than any other JS validator, my concerns about implementation complexity are completely ignored.

Possible way forward

I do not think it is a lost cause though. I believe that by doing a "product design" of JSON schema we can align all the interested parties and agree on a simpler version of the spec that achieves 95% of the current value at 50% of the implementation and cognitive complexity.

By product design I mean that for each group of users (new adopters, users, advanced users, spec maintainers, implementers) we would define the "objectives" that the spec delivers to them (in the language of external outcomes, not in terms of using JSON schema) and the user stories (again as what they want to achieve, without any references to the JSON schema vocabulary, which is equivalent to product UX). This work has never been done and the lack of this clarity leads to the current conflicts and feature bloat.

These scenarios should start from high level and general, such as:

I need a solution that allows me to define my data structures in a declarative way
I need to sanitise and validate the data that my API receives

to very specific

I need a way to structure and re-use my data structure definitions
I need to bundle my schema with all referenced schemas into a single file to ship it with my code / serve over HTTP

Once we agree on the objectives and scenarios we may need, we also need to assign a value to each scenario for the community, based on some agreed criteria, including the number of groups that need this scenario, the benefits etc.

Once we understand the value of all required scenarios, we can consider various options for how these scenarios can be achieved, including the current solution that exists in the spec, the solution that is proposed in draft-08 and any alternatives. For each solution we need some implementation complexity assessment, but it should only be done by the core maintainers of widely used validators - theoretical considerations of implementation complexity are not helpful.

The rest is simple - we define the priority of each scenario as the ratio of benefit to implementation complexity, and if this ratio is lower than an agreed threshold the feature should not be included in the spec, whether the feature is old or new.
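As a rough illustration of that last step, here is a minimal Python sketch. The scenario names, the benefit and complexity numbers and the threshold are all made up for the example; the real values would have to come out of the agreement process described above.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    name: str
    benefit: float      # agreed value of the scenario for the community (hypothetical scale)
    complexity: float   # implementation complexity as assessed by validator maintainers


def priority(scenario: Scenario) -> float:
    """Priority is the ratio of benefit to implementation complexity."""
    return scenario.benefit / scenario.complexity


def select(scenarios, threshold):
    """Keep only the scenarios whose priority is not below the agreed threshold."""
    return [s.name for s in scenarios if priority(s) >= threshold]


# All numbers below are invented purely to show the arithmetic.
scenarios = [
    Scenario("re-use a definition within one schema", benefit=9, complexity=2),
    Scenario("bundle referenced schemas into one file", benefit=6, complexity=4),
    Scenario("change base URI inside a recursive schema", benefit=2, complexity=8),
]

kept = select(scenarios, threshold=1.0)
print(kept)
```

With these invented numbers the first two scenarios stay and the third falls below the threshold, regardless of whether it is an old or a new feature.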

As I wrote, I believe that this product analysis of the JSON schema spec will lead to a radical simplification of the spec, will retain 95% of the value at half of the implementation cost, will make implementations more consistent and will eventually lead to JSON schema becoming an RFC - a win for everybody.

Alternative ways forward

The spec will split in 2 - the current one and a simpler one, similar to what @ucarion proposed and/or to the spec in OpenAPI, but essentially a subset of the current spec plus some additions that would simplify adoption (that we are not even discussing here). It can be called "Simple JSON Schema".
There will be another incompatible spec - there are ongoing attempts to define an alternative to JSON schema using completely different semantics (e.g. https://github.com/standardschema/javascript), and some of them will eventually succeed in replacing JSON Schema if there is no simplification here. These attempts are a symptom of wide dissatisfaction of early adopters with the current spec.

First, this seems somewhat ambitious. Sometimes that's warranted! But right now I know many of us are fatigued.

Consensus is indeed an art. I've tried very hard to make sure we're consensus-driven in the IETF style (as opposed to other styles like at ISO, ECMA, or similar alternatives). It's caused a large amount of strife at times but remarkably we've largely come around to the same opinions by talking through things.

However, we're primarily a forum for implementations, and a key thing to keep in mind is mostly we need consensus to change things: We want some certainty that when we make a change, we know it's an improvement.

In the typical process of standards development, I'd say you're looking for a list of use cases. Sometimes specifications formally publish a document of use cases, sometimes it's just the examples, but they're always important.

The use cases all follow from some sort of charter or purpose. The purpose of JSON Schema is to make assertions about JSON documents. That's it. This makes JSON Schema useful for so many things: validating user input, publishing expectations that servers have of clients, and autogenerating documentation (among other things). These all come from JSON Schema's ability to make assertions about JSON documents. And if it doesn't have to do with that, then we say good luck, but that's out-of-scope.

Now there's an implicit part of this, that we're publishing an Internet standard, so it has to fit into the Internet and Web architecture. This means things like:

If we want to download something, we give it a URI
If we want to express a relationship between two things, we give it a link.
If we want to let people publish documents of these assertions, we have to define a media type and follow the rules for forward & reverse compatibility.

And so on.

These don't advance the purpose per se, but we have different routes to solve the problem, and it's just easier and more accessible for users if we adopt the Internet ecosystem. e.g. we could invent our own form of identifier, but if we (carefully) use the Internet's preferred identifier (the URI), JSON Schema is suddenly more useful as an Internet standard.

So, if we can stay within this process, and if you can trust the IETF process, then I'd say let's try to come up with a list of use cases for JSON Schema. The wiki is a pretty good place to publish research like this, let's make a page there. List a use-case, a few tests (positive and negative), and comparisons to other technologies (my favorite is HTML).

That said, I'm skeptical there's as much of an implementation problem as you seem to describe.

First, it's our job as implementors to take on complexity on behalf of users. This is the Priority of Constituencies. This is not to dismiss all concerns; if there's a good argument that a feature is causing performance issues for users, that's something to consider.

Second, if a feature is making life difficult for schema authors, then sure let's work on that. But I'm not sure it gets much easier than "use {$id:"#name"} to define a named schema and {$ref:"#name"} to include it". (But in this case, we need to first understand where these users are coming from so we can match their expectations.)

Third, I don't even think it's that difficult to implement. I've written two validators that work as described in the recent drafts. This process only requires I index the names (the "$id"), a process that's implemented in about two dozen lines (you have to do the same thing for any other media type); and supporting property paths in the fragment, which involves dereferencing the base part of the URI (the part before the fragment), then descending into the named properties/items (the same way your Web browser submits HTTP requests with the fragment removed). This is all pretty straightforward, I don't see very many moving parts here.
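To make the two steps concrete, here is a minimal Python sketch (not taken from either validator mentioned above). It assumes the document named by the base part of the URI is already loaded, and the function names `index_names` and `resolve_fragment` are made up for the illustration. The walk is deliberately naive: it treats every nested object as a potential schema, which is exactly where the "theoretical purity" edge cases come from.

```python
from urllib.parse import unquote


def index_names(schema, names=None):
    """Index subschemas that declare a plain-name fragment via "$id" (e.g. "#name")."""
    if names is None:
        names = {}
    if isinstance(schema, dict):
        id_ = schema.get("$id")
        if isinstance(id_, str) and id_.startswith("#") and not id_.startswith("#/"):
            names[id_[1:]] = schema
        # Naive: descends into every value, without checking which keywords hold schemas.
        for value in schema.values():
            index_names(value, names)
    elif isinstance(schema, list):
        for item in schema:
            index_names(item, names)
    return names


def resolve_fragment(root, fragment, names):
    """Resolve the fragment of a $ref (the part after "#") inside a loaded document."""
    if fragment == "":
        return root                      # "#" refers to the whole document
    if not fragment.startswith("/"):
        return names[fragment]           # named schema, looked up in the index
    node = root
    for token in fragment[1:].split("/"):
        # Percent-decode, then undo JSON Pointer escaping (~1 before ~0).
        token = unquote(token).replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


schema = {
    "definitions": {
        "positive": {"$id": "#positive", "type": "integer", "minimum": 1},
    },
    "properties": {
        "count": {"$ref": "#positive"},
        "total": {"$ref": "#/definitions/positive"},
    },
}

names = index_names(schema)
by_name = resolve_fragment(schema, "positive", names)
by_path = resolve_fragment(schema, "/definitions/positive", names)
print(by_name is by_path)
```

Both references end up at the same subschema; fetching and caching the document for the base part of the URI is left out of the sketch.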

There's lots of arguments to be had about theoretical purity ("BUT what if you have an $id naming a property path!!!!"), and let's consider those, but at the end of the day that's our least important concern.

Finally, sometimes there's just multiple correct ways of doing things, and not everyone is going to try the same thing first. If we've picked the best alternative and given people ways to recover from making the wrong first guess, that's going to be the best we can do.

So in short, we're kind of burned out, but I was actually looking at starting a Wiki page documenting how references are used "in the wild"; if you want to spearhead that & a list of use-cases (maybe including things people wanted to do but couldn't figure out), let's go for it.

@awwright thanks for the reply

First, this seems somewhat ambitious. Sometimes that's warranted! But right now I know many of us are fatigued.

There are two areas where consensus has not been reached: addressing (that's what this issue is about) and schema re-use - extending properties. So the suggestion was to address this bit by bit, not all at once. Agreed on "assertion" being the high level scope.

First, it's our job as implementors to take on complexity on behalf of users "users over authors over implementors over specifiers over theoretical purity"

I agree with that. I believe that in our case JSON Schema users are authors, in most cases. For users/authors to benefit, the main value of JSON Schema - consistent assertions across platforms - should be seen as a higher priority than flexibility and feature set. As an example, I don't see how users can benefit from the unevaluatedProperties keyword, even if it can solve a real problem, if it is not consistently supported across all/most platforms. For users to benefit from the new features, you need a commitment from the core maintainers of validators used on various platforms (at least one per platform) to support a feature within a certain amount of time. If there is no such commitment, introducing such a feature to the spec would not benefit users, quite the opposite. So the "implementors over specifiers" principle is not always followed here.

Second, if a feature is making life difficult for schema authors, then sure let's work on that.

Without implementations consistently supporting a feature across all platforms, authors' life is definitely going to be difficult. I can see a lot of attention to the ease and flexibility of writing schemas, and not sufficient attention to implementation complexity that it may cause.

Third, I don't even think it's that difficult to implement.

In 2015 I thought Ajv would be a project for several weekends. And then I spent 2 years fixing various scenarios reported by users in the $ref area. I previously suggested running any validator against these tests, they are in the same format as in the test suite: https://github.com/epoberezkin/ajv/tree/master/spec/tests (they are using these remotes: https://github.com/epoberezkin/ajv/tree/master/spec/remotes). When I ran other JS validators against the Ajv test suite, none of them passed the $ref tests (https://github.com/epoberezkin/test-validators). The whole motivation for creating Ajv was that I could not find a single validator that consistently supported $ref (and I tested 11 JS validators against my schemas - none of them complied with the spec).

So while I agree that implementing the spec to support most common scenarios is relatively straightforward, implementing it to support all scenarios for combinations of recursion with base URI change becomes very difficult. I am happy to talk further once you've tested your validators and can confirm they pass all these tests and, if they don't, whether it was easy to fix.
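To show what "base URI change" means in the simplest case, here is a small Python sketch. It is not Ajv's algorithm; the schema, the URIs and the function name `resolve_refs` are made up, and the walk is simplified (it visits every nested object and does not apply the rule that keywords next to "$ref" are ignored). It only shows that the same "$ref" string resolves to different absolute URIs depending on the nearest enclosing "$id".

```python
from urllib.parse import urljoin


def resolve_refs(schema, base, out=None):
    """Collect (ref as written, absolute URI) pairs, tracking the base URI on the way down."""
    if out is None:
        out = []
    if isinstance(schema, dict):
        id_ = schema.get("$id")
        if isinstance(id_, str):
            # A nested "$id" is itself resolved against the enclosing base URI.
            base = urljoin(base, id_)
        ref = schema.get("$ref")
        if isinstance(ref, str):
            out.append((ref, urljoin(base, ref)))
        for value in schema.values():
            resolve_refs(value, base, out)
    elif isinstance(schema, list):
        for item in schema:
            resolve_refs(item, base, out)
    return out


# Hypothetical schema: the same relative reference appears under two different base URIs.
schema = {
    "$id": "http://example.com/root.json",
    "properties": {
        "a": {"$ref": "item.json"},
        "b": {
            "$id": "nested/",
            "properties": {"c": {"$ref": "item.json"}},
        },
    },
}

for written, absolute in resolve_refs(schema, base=""):
    print(written, "->", absolute)
```

The first reference resolves to http://example.com/item.json and the second to http://example.com/nested/item.json. The hard cases start when the referenced schema itself changes the base URI and then refers back, which this sketch does not attempt.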

So, if we can stay within this process, and if you can trust the IETF process, then I'd say let's try to come up with a list of use cases for JSON Schema. The wiki is a pretty good place to publish research like this, let's make a page there. List a use-case, a few tests (positive and negative), and comparisons to other technologies (my favorite is HTML).

That would indeed help.

From @erosb in #727 (consolidating here):

I thought it might make sense to give some feedback from an implementor's point of view about the problems raised and the effort involved.

First, I also have the feeling that the json schema specification doesn't have accurate-enough goals. Many keywords seem to be quite ad-hoc and unjustified (like "dependencies"), while other things (like proper inheritance support) are just missing. As a consequence, currently the specification doesn't have formally well-defined goals, while the informally targeted goals are:

validation: due to the high number of features, schema authors can specify many restrictions that no other type system can express. From a given point it is good.
documentation generation: sort of, not a very widespread usecase, definitely no blocking problem with implementing it
code generation: this is just in very bad shape. While the "feature bloat" is useful for validation, it is harmful for code generation. There is just no type system (in any widespread language) which is able to express things like "not" or "contains".

So having formally defined goals & usecases, and keeping these in mind while accepting or rejecting a specification change would be beneficial.

Complexity-wise: up to draft-8 I could survive, now the library I maintain supports 3 draft versions (4, 6, 7) and I didn't even have to make breaking API changes. That's good. The most painful point was/is understanding, using and implementing "$id" and "$ref"; a large share of bug reports & help requests relate to this area. There are a few other little quirks, but overall I don't think the spec needs enormous efforts to implement.

Simplifying the specification has multiple effects:

it helps new implementations to arise. Fine.
on the other hand, existing implementations' codebase doesn't get simpler, because they still have to support previous draft versions
what does simplify implementations is - IMHO - reducing the changes to the specification. This might sound ridiculous, but in reality it makes sense to reduce the draft release frequency and/or limit the added features to things that are challenged and proved to be necessary. This actually helps keep the specification and implementation support in sync. Considering that most json schema implementations are open-source projects not really backed by any corporates, it makes sense to me to avoid unnecessary specification changes, to avoid overwhelming implementors.

@erosb I'm not sure I agree with 100% of what you said, though I likely agree with 99% of it, but "let's be more careful with backwards incompatible changes", if I oversimplify the parts I really strongly agree with, is way different from anyone else's points I've seen on these long drama posts.

That one I definitely agree with, if it's the thrust of your comment.

@erosb @Julian in the forthcoming draft, we made a point to make the two potentially incompatible changes (definitions -> $defs and dependencies -> dependentSchemas + dependentRequired) technically compatible by reserving the old keywords for their existing behavior.

New implementations do not need to actively support the old keywords, and old implementations can continue to support them even in newer drafts. So you can't rely on interoperability between old and new, but if you were using the old keywords and you know that your implementation continues to support them, you can rely on that. And more importantly, you don't suddenly get different behavior from a keyword that looks the same.

definitions/$defs is almost a no-op anyway, except for recognizing that there are subschemas there, particularly if you have nesting and therefore may need to recognize $id.

dependencies basically results in two new keywords mapping to existing behavior in a simpler way than the existing keyword. The new keywords will be more straightforward for new implementations, and hopefully less burdensome for existing ones than, say, the exclusive* changes a few drafts back.
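As a sketch of how mechanical that mapping is, the following Python function rewrites a "dependencies" keyword into the two new ones: array values go to dependentRequired, schema values go to dependentSchemas. The function name `split_dependencies` and the example schema are made up, it only looks at the top level of the schema it is given, and it assumes the schema does not already use the new keywords.

```python
def split_dependencies(schema):
    """Return a copy of `schema` with "dependencies" split into the two newer keywords."""
    result = {key: value for key, value in schema.items() if key != "dependencies"}
    required, schemas = {}, {}
    for prop, dependency in schema.get("dependencies", {}).items():
        if isinstance(dependency, list):
            required[prop] = dependency      # list of property names
        else:
            schemas[prop] = dependency       # subschema (object or boolean)
    if required:
        result["dependentRequired"] = required
    if schemas:
        result["dependentSchemas"] = schemas
    return result


# Hypothetical schema using both forms of the old keyword.
old = {
    "type": "object",
    "dependencies": {
        "credit_card": ["billing_address"],
        "shipping": {"required": ["address"]},
    },
}

new = split_dependencies(old)
print(new)
```

An implementation that keeps supporting the old keyword can apply the same split internally and run one code path for both spellings.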

...the attempts to shut down and ridicule any dissent to the current spec direction from many maintainers go beyond what is acceptable in professional conversation, both in tone and in spirit. We definitely do not go by any consensus model, as @ucarion suggested.

This attitude of JSON Schema maintainers has already driven away many valuable contributors (e.g. @fge) and I am very close to not caring about the future of the current specification as well.

@epoberezkin

I'm deeply sorry if you feel I've contributed to the above. For the avoidance of doubt, everyone is welcome, and no one should be ridiculed.

In terms of working principles, I want to present a principle I work with: burden of proof.

As maintainers of JSON Schema, we are actively engaged with the community, on Slack and elsewhere: replying to questions, helping people, monitoring StackOverflow. We see the community at work. I try to make sure that what we add, change or remove from the spec reflects a real need. When editing the spec (and when reviewing), I work based on my experience of the community. I do not, and cannot, provide evidence for every decision or the reason for making a change, because this is done in spare time. If I, or others, had to do that, the spec would fall back into stagnation, and would be left for someone else to pick up.

I feel the developments since draft-4 are evidence that this model has worked, as we look for general consensus.

Issues are a platform for open discussion on changes, plus the 1 month mandatory review and feedback period before a new draft is published.

When someone from the community presents a requirement directly, unless we have experienced and seen otherwise, it looks like an n = 1 issue. If I have not seen a requirement from the community for a need, I feel it's reasonable to request evidence from an issue author to support their request.

Burden of proof not for editors, but for new suggestions.

To me, this is how I understand the consensus model in practice, given that all editors working on this are not being paid to do so. It's a community effort, supported by individuals giving up their own time to make it a priority.

I can see a situation where if a group of implementers get together, and all say "this is too complex", and another group of implementers don't jump up to refute such, then we may have to re-evaluate changes made in draft-8.

I'm the first to argue that, although the spec is a draft, people use it in production and we should treat it as such, BUT we know there's a lag between publication and support (as you stated).

If during the 1 month review process, lots of implementers say this is not an OK change, then we have to listen to that. Currently I'm hearing comments from both ends of the spectrum.

One of the reasons for allowing the 1 month review process is to try and make decisions based on working code. (iirc that was an IETF principle.)

@epoberezkin Despite all of the above, I think there's still plenty of legitimate discussion to be had here.

Maybe now is the right time to do a scoping exercise.

I think it's helpful to look at @ucarion's closing comments for #710...

...

The intention of this issue was to discuss whether JSON Schema should make IETF standardization its prime directive, and focus on simplification as the instrumental means of achieving that end.

JSON Schema remains ultimately a project on the basis of rough consensus. And there does not today exist many people on this project with enthusiasm for wrestling with standards bodies.

Nor is it evident that JSON Schema can or ought to dramatically cut scope. Though there are many people who could live with just a small subset of JSON Schema that the project has long supported, there are also many people who want everything that's in the spec present, imminent, and future.

Therefore, JSON Schema shall not change its focus. The current trajectory -- of making a sophisticated, generalizable, extensible system for validating and annotating JSON-like data -- shall remain the course.

I have no doubt this approach will work, but it will take time -- when you do more, there's more to get right. There perhaps exists room for a far more modest variant of JSON Schema, more aligned with the aims I've proposed in this ticket. ...

https://github.com/json-schema-org/json-schema-spec/issues/710#issuecomment-467739283

I don't think we are ready to make the path to IETF the prime directive (props for the reference, if intended). That being said, I (and others) should validate that, and review issues already assigned to milestones. I feel focusing on simplification as the instrumental means isn't the right approach. Simplification is possible, and I could see how it would work, but there are already common use cases that would require elements I would place in the "non simplified version" should such a split exist.
I think this assertion is right. I've spent a lot of time and energy working on this aspect (probably more than others in the admin team). I've had meetings with various people, from the IETF, W3C, JS Foundation... I figured it was good to at least open the communication, but that JSON Schema still had a few glaring issues, so we shouldn't prioritise it. I still feel this is the case.

Maybe we could settle on making path to rfc a stronger consideration when evaluating what goes into draft-9 and 10. I think this is open for discussion, and should reflect feedback on draft-8.

I'm feeling it's possible there is scope for having a split of validation, but it also makes me uneasy. I'm pretty sure there's a much longer discussion to be had here, and I'm committed to making sure we HAVE that discussion before we set to work on draft-9.
I feel this would be my personal preference, but it may not be the consensus we reach, if we truly evaluate the needs of schema authors and implementers. As editors, we have gone out of our way to make sure that changes are not solely based on personal preference, and even making changes that go against our preference, when there's a clearer consensus or middle ground that can be reached.

I hope you feel this is a fair and measured response, and that we can continue to move forward in a collaborative spirit. I certainly don't want anyone to feel alienated or unwelcome.

I think @Relequestual addressed this thoroughly, and it's sat open for 8 months now without further engagement. Closing.
