ietf-wg-httpapi / mediatypes

Fix: #32. Reference OAS and jsonschema as spec. #43

Open ioggstream opened 2 years ago

ioggstream commented 2 years ago

This PR

ioggstream commented 2 years ago

@awwright is this better?

@jdesrosiers currently, IANA seems to be fine even with specs that are not RFCs.

We have all the time to make it perfect; the important thing for now is to improve it.

awwright commented 2 years ago

The effect of not referencing JSON Schema directly is that user agents encountering "application/schema+json" and a validator called from code will handle the same schema differently. I don't think this is what we intend to do.

If referencing JSON Schema would (as of now) under-specify certain behavior, then that's a defect in JSON Schema that we can resolve, but shouldn't be patched around in the media type registration.

jdesrosiers commented 2 years ago

The effect of not referencing JSON Schema directly is that user agents encountering "application/schema+json" and a validator called from code will handle the same schema differently.

Why? There's clearly some tacit assumption you're making that I'm not following. This document defines how to determine the dialect and delegates specific semantics to the specification that describes that dialect. There's always a dialect and thus always a specification that needs to be followed no matter what context you're in. The result is the same except that this supports any number of dialects while directly referencing a single dialect specification pins the media type to one dialect.

awwright commented 2 years ago

Why? There's clearly some tacit assumption you're making that I'm not following. This document defines how to determine the dialect and delegates specific semantics to the specification that describes that dialect.

It's the plain effect of what's written... If you're implementing JSON Schema as a part of the Internet ecosystem, and you download an "application/schema+json" resource, your implementation follows the IANA media types table to determine how to interpret that response, and now there's a set of rules you'll be following that's different from following only JSON Schema Core, as validators do.

Likewise, if I'm writing a validator, what motivation do I have to implement the rules specified here? After all, I'm not implementing the media type rules.

This may be surprising because specifications aren't normally written like this; typically there's no distinction between an "application/json" document and one passed directly to JSON.parse.


There's an easy way to fix this though, let's write it into the specification.

We historically avoided this because we generally treated backwards compatibility as the responsibility of implementations. If we want to come up with a standard behavior, we can easily write it into JSON Schema Core.

handrews commented 2 years ago

Hi folks, I have a lot of thoughts on this, but I've been holding off as I've been away from the project and am coordinating with @Relequestual to make sure I'm not working at cross-purposes. Scheduling has been a challenge, and it will be another week before we sort out how I might best participate. For now I'll just note a few high-level things.

  1. A stable base specification for JSON Schema could consist of:
    1. Media type registration (being addressed here)
    2. The core vocabulary (which is considerably smaller than the core specification, and with @jdesrosiers's $dynamicAnchor/$dynamicRef proposal, is looking pretty solid)
    3. A clear set of boundaries on what keywords can and can't do (the spec somewhat describes this, but not with the degree of clarity, completeness, or conciseness required)
  2. This base specification is all that matters for versions and compatibility, so finalizing it would solve a lot of problems (I'm not saying it's trivial to finalize, but it's not as hard as finalizing everything)
    1. Versions are not dialects, at least for the core vocabulary. It has been somewhat convenient to think of them that way, and $schema somewhat confusingly identifies both, but versions can have fundamentally different processing rules and bootstrapping steps, while dialects cannot.
    2. With a stable base specification, versioning in the sense of "how do I even process JSON Schema" would no longer be a problem.
    3. Versions of the other vocabularies can be treated as dialects, because changing out your applicators or validation assertions or annotations does not fundamentally change how you process JSON Schema.

I don't know exactly what should go where, but if we keep thinking of "JSON Schema" as the complete current two specifications, which specify 7 or 8 vocabularies between them, that's enormous and difficult to wrestle with. But all but one of those vocabularies become a lot easier to manage if the core is stabilized.

Dialects are nothing more than collections of vocabularies, and only the core vocabulary is important for bootstrapping.

jdesrosiers commented 2 years ago

now there's a set of rules you'll be following that's different from following only JSON Schema Core

What do you think is different? I specifically wrote this to be compatible with all existing dialects of JSON Schema. It says the same thing as JSON Schema Core does. The only thing that's left out is the "profile" media type parameter from draft-04.

There's an easy way to fix this though, let's write it into the specification.

Write what into the spec?

awwright commented 2 years ago

Write what into the spec?

Primarily, any normative language (especially MUST):

Clients MUST use the following order of precedence

The $schema keyword ... MUST be a URI

The schema media type parameter MUST be a URI-reference

All of these are already specified, or could be added to Core.

The media type registration need not have any complexity... Our only need for this is to officially designate "when you see 'application/schema+json', refer to JSON Schema Core".
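The precedence rule quoted above could be sketched roughly as follows. The thread does not quote the full ordered list, so the ordering used here (an in-document `$schema` keyword first, then the `schema` media type parameter, then a caller-supplied default) is an assumption for illustration; `resolve_dialect` and its parameters are hypothetical names, not part of any specification.

```python
from typing import Any, Optional


def resolve_dialect(schema: Any, schema_param: Optional[str], default: str) -> str:
    """Pick the dialect URI for a schema (assumed precedence, see lead-in)."""
    # 1. An in-document "$schema" keyword wins (assumed highest precedence).
    if isinstance(schema, dict) and isinstance(schema.get("$schema"), str):
        return schema["$schema"]
    # 2. Otherwise use the "schema" media type parameter, if the response had one.
    if schema_param:
        return schema_param
    # 3. Otherwise fall back to whatever the caller treats as its default.
    return default
```

Whether this logic lives in the media type registration or in JSON Schema Core is exactly the point under debate; the sketch only shows that the rule itself is small.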

jdesrosiers commented 2 years ago

@handrews Thanks for your feedback.

Versions are not dialects

We've been using the term "dialect" to refer to older and custom versions of JSON Schema for a while now. I understand that that's not what you had intended. We can discuss internally if we want to change the vocabulary we use to discuss this topic, but for clarity, when I use the term in this document, I am referring primarily to JSON Schema versions (official and custom).


I think what you're suggesting is that the media type should define a stable vocabulary system and core vocabulary. That makes perfect sense in an ideal world, but I think we have a constraint that makes that goal impractical. The application/schema+json media type is used in the wild in production environments and could be following any one of the many JSON Schema versions that have been released over the last decade. Registering this media type should not result in existing implementations being in violation of the media type definition.

We would have to define the base specification to be backwards compatible with all previous releases of JSON Schema which I don't even think is possible in all cases. We would also have to commit to no more backwards incompatible changes. In order to be inclusive of older OpenAPI and MongoDB JSON Schema flavors, we would have to do things like allow dialects to ignore many of the core keywords. Making everything optional makes the specification pretty weak.

I think the core vocabulary is close to being stable, but the vocabulary system is far from stable. Even when we think we have something that is a candidate for stable release, we'll have to wait a couple years for it to get adopted and used enough to determine if we got it right. In the meantime, the widely used application/schema+json media type remains unregistered.

That's why I wrote this document up the way I did. It allows us to be flexible and strict at the same time. It allows us to register media types that have been in use for years without us having to commit to never/rarely changing certain things until we are ready. Ideally, I'd like to go the base specification route, but I think we have too much baggage for that to be practical at this point.

jdesrosiers commented 2 years ago

@awwright

The media type registration need not have any complexity... Our only need for this is to officially designate "when you see 'application/schema+json', refer to JSON Schema Core".

The problem is that there are many versions of JSON Schema Core and there are many implementations that follow different versions. By moving this language into the media type definition, it has one authoritative and stable place where it's defined from now on. Future releases don't have to continue to duplicate this work. The idea is that this language will be removed from JSON Schema Core in future releases and replaced by a reference to this document.

awwright commented 2 years ago

there are many versions of JSON Schema Core

I know we sometimes say there's "versions" of JSON Schema, but in this context that may be misleading: There's been many publications of JSON Schema over time, but newer publications replace older ones in their entirety (this is specified in the first few paragraphs). Newer releases are supposed to be compatible with schemas for older validators.

We've generally left reverse compatibility up to implementations, and there's a couple options at their disposal (defining the older keywords, and using $schema as a heuristic), but of course this isn't cross-platform.

If despite this, there's a need to change behaviors based on the "version" that a schema was written for, then let's specifically list those behaviors, how to detect which one to apply, and let's make a plan to write them into the specification.
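As a minimal sketch of "using $schema as a heuristic", an implementation might map the published meta-schema URIs to a draft label and switch behavior on the result. The function name `detect_draft` and the short labels are hypothetical; the URIs are the published meta-schema identifiers for each draft.

```python
from typing import Optional

# Published meta-schema URIs, stored without the trailing empty fragment.
KNOWN_DRAFTS = {
    "http://json-schema.org/draft-04/schema": "draft-04",
    "http://json-schema.org/draft-06/schema": "draft-06",
    "http://json-schema.org/draft-07/schema": "draft-07",
    "https://json-schema.org/draft/2019-09/schema": "2019-09",
    "https://json-schema.org/draft/2020-12/schema": "2020-12",
}


def detect_draft(schema: object) -> Optional[str]:
    """Guess which draft a schema targets from "$schema", or return None."""
    if not isinstance(schema, dict):
        return None  # boolean schemas carry no "$schema"
    uri = schema.get("$schema")
    if not isinstance(uri, str):
        return None
    # Older meta-schema URIs are usually written with a trailing "#".
    return KNOWN_DRAFTS.get(uri.rstrip("#"))
```

A `None` result is where implementations diverge today: some assume the latest draft, others assume the draft they were first written for.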

handrews commented 2 years ago

@jdesrosiers

We've been using the term "dialect" to refer to older and custom versions of JSON Schema for a while now. I understand that that's not what you had intended.

I apologize, I did not mean to say that in a way that implied you were incorrect in that usage. In the past I was at best ambiguous about it, and for me to claim that I didn't mean for "dialect" to encompass versions would most likely be self-serving after-the-fact editing 😅

The point I wanted to make was that there are two different things encompassed by the current term "dialect". I think that's a point we can acknowledge without requiring the established usage of "dialect" to suddenly be wrong. So if we need to discuss this distinction further, let's go with "version dialects" and "non version dialects", and let "dialect" continue to encompass both.

I also want to emphasize that I do not think we need to wait to finalize a base specification to register the media type. I apologize for not making my intent clearer. TBH I got this PR a bit mixed up with issue #20, which has more context in it, and only realized my comment was a bit of a non sequitur a few days later. I'm going to continue the broader discussion over there as that feels more appropriate.


@awwright commented with:

I know we sometimes say there's "versions" of JSON Schema, but in this context that may be misleading: There's been many publications of JSON Schema over time, but newer publications replace older ones in their entirety (this is specified in the first few paragraphs).

This is pretty much how I've been thinking about it.

Newer releases are supposed to be compatible with schemas for older validators.

I'm a little unclear on exactly what this requires (and whether we've just been ignoring it or what).

We've generally left reverse compatibility up to implementations, and there's a couple options at their disposal (defining the older keywords, and using $schema as a heuristic), but of course this isn't cross-platform.

If despite this, there's a need to change behaviors based on the "version" that a schema was written for, then let's specifically list those behaviors, how to detect which one to apply, and let's make a plan to write them into the specification.

I think this is a bit closer to what I was trying to get at with the base specification stuff, which is the idea that we might be able to stabilize enough (not my entire list!) to handle this inside of JSON Schema without excessive forward or backward compatibility constraints, and without just making everything optional. @jdesrosiers I agree that those are not viable options.

awwright commented 2 years ago

I'm a little unclear on exactly what this requires (and whether we've just been ignoring it or what).

On our end, it means we can't redefine behavior that was specified in an earlier draft. We only un-define behavior, and let implementations continue to work the way they always have, if so desired.

This was OK to me, because why would we define behavior that you're supposed to avoid? And there may be legitimate reasons for different implementations to do reverse compatibility differently. Sometimes keywords (like "extends") are un-defined because implementations didn't agree. There are users who just need their validator to keep working the way it always has, even if it's not cross-platform.

This isn't to say this is the way we should do it in all cases... if there's evidence that something we've published in JSON Schema Core is actually incompatible with a previous version, then let's figure out how to detect those cases, and address that.

awwright commented 2 years ago

(*) The only example of breaking reverse-compatibility that I've seen (where a single schema would be required to have different behavior depending on the draft you're implementing against) is when we stopped ignoring keywords next to "$ref", but this was done mostly because people were writing schemas like that thinking it worked that way. Meanwhile, there's no known cases of anyone writing schemas like {"$ref":"#foo", "type":"string"} expecting "type" to not do anything. Therefore, you could argue, we were fixing a bug rather than breaking reverse compatibility.
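To make the "$ref" case concrete, here is a small sketch of the two behaviors for the schema quoted above. Drafts up to draft-07 ignored keywords next to "$ref", while 2019-09 and later evaluate them alongside it. The helper `applicable_keywords` and its flag are hypothetical names used only for illustration.

```python
def applicable_keywords(schema: dict, ref_overrides_siblings: bool) -> dict:
    """Return the keywords a validator would actually evaluate."""
    if ref_overrides_siblings and "$ref" in schema:
        # Older behavior: "$ref" replaces the whole object, siblings are ignored.
        return {"$ref": schema["$ref"]}
    # Newer behavior: "$ref" is evaluated alongside its siblings.
    return dict(schema)


schema = {"$ref": "#foo", "type": "string"}
old_view = applicable_keywords(schema, ref_overrides_siblings=True)
new_view = applicable_keywords(schema, ref_overrides_siblings=False)
```

Under the older rule `old_view` drops "type" entirely, which is the behavior nobody appears to have relied on deliberately.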

handrews commented 2 years ago

Therefore, you could argue, we were fixing a bug rather than breaking reverse compatibility.

Yeah that was pretty much my argument when advocating for it! Kind of like the recent RFC 9239's comments on settling on text/javascript:

I think there are two other breaking changes. The more serious was splitting $anchor off from $id, but that was part of getting back in alignment with OpenAPI. The complexity of $id was a major sticking point. On the other hand, if you jump from draft-04 to 2019-09 or later, it just looks like id was split into $id and $anchor 😅 kind of like how we split up dependencies without breaking compatibility.

We also changed exclusiveMinimum and exclusiveMaximum from boolean modifiers of minimum and maximum to independent numeric keywords, as they were also a source of confusion, although that's less clear-cut because plenty of people used them correctly. Anyway, that's long done now. The next time we had a potentially breaking change (splitting dependencies) we did it in a compatible way (two new keywords so that the old one could be kept as an extension).
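The dependencies split is mechanical enough to sketch. In 2019-09 the two new keywords are `dependentRequired` (array form) and `dependentSchemas` (schema form); the converter `split_dependencies` below is a hypothetical helper, shown only to illustrate why the old keyword could remain as an extension without ambiguity.

```python
def split_dependencies(schema: dict) -> dict:
    """Rewrite a legacy "dependencies" keyword into the two newer keywords."""
    result = {k: v for k, v in schema.items() if k != "dependencies"}
    for prop, dep in schema.get("dependencies", {}).items():
        if isinstance(dep, list):
            # Array form: a list of property names that must also be present.
            result.setdefault("dependentRequired", {})[prop] = dep
        else:
            # Schema form: a subschema applied when the property is present.
            result.setdefault("dependentSchemas", {})[prop] = dep
    return result
```

Because each entry's JSON type decides which new keyword it maps to, no schema needs different handling depending on the draft.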

awwright commented 2 years ago

The more serious was splitting $anchor off from $id, but that was part of getting back in alignment with OpenAPI

I don't call this "breakage", you can still write a validator that understands "$id": "#foo" just as well as "$anchor": "foo".

We also changed exclusiveMinimum and exclusiveMaximum from boolean modifiers of minimum and maximum to independent numeric keywords

Same thing here, it's possible for implementations to implement both the boolean and numeric form, without confusion as to what the meaning could be.
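A sketch of what "implementing both forms" could look like for the lower bound: the JSON type of `exclusiveMinimum` selects the interpretation. This is an illustration, not behavior mandated by any single draft, and `passes_lower_bound` is a hypothetical name.

```python
def passes_lower_bound(value: float, schema: dict) -> bool:
    """Check "minimum"/"exclusiveMinimum" accepting both historical forms."""
    minimum = schema.get("minimum")
    exclusive = schema.get("exclusiveMinimum")
    # bool must be tested first: in Python, True and False are also ints.
    if isinstance(exclusive, bool):
        # draft-04 form: a boolean that modifies "minimum".
        if minimum is None:
            return True
        return value > minimum if exclusive else value >= minimum
    # draft-06 and later: two independent numeric keywords.
    if minimum is not None and value < minimum:
        return False
    if exclusive is not None and value <= exclusive:
        return False
    return True
```

The open question in the thread is not whether this is implementable, but whether doing so is compliant with any one published release.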

jdesrosiers commented 2 years ago

newer publications replace older ones in their entirety

I know that's the way I-Ds are supposed to work, but JSON Schema has been a draft in name only for a long time. No matter what the intention or wording in the spec, the reality is that every release of the spec has been implemented and depended on in production systems. Because production systems don't want to depend on a moving target, they pick a version and pin to it. We can say that older releases are replaced, but that doesn't stop companies like Amazon from continuing to depend on draft-04.

I don't call this "breakage", you can still write a validator that understands "$id": "#foo" just as well as "$anchor": "foo".

I don't agree with this characterization of what you consider a breaking change. In 2019-09, if you use "$id": "#foo", it should not define an anchor. If you allow this, the schema wouldn't work consistently across implementations. In fact, it should be an invalid schema because $id isn't allowed to have a non-empty fragment. In order to allow this, you have to ignore constraints in the spec and add behaviors not defined in the spec. IMO, this is not backward compatibility and would not be compliant with any release of the spec.
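The disagreement can be illustrated with a sketch that contrasts the two readings. In strict mode a non-empty fragment in `$id` is rejected, as described above for 2019-09; in lenient mode a fragment-only `$id` is treated as an anchor, as earlier drafts allowed. The function `declared_anchor` and its `lenient` flag are hypothetical and not taken from any implementation.

```python
from typing import Optional
from urllib.parse import urldefrag


def declared_anchor(schema: dict, lenient: bool) -> Optional[str]:
    """Return the plain-name anchor a schema object declares, if any."""
    if isinstance(schema.get("$anchor"), str):
        return schema["$anchor"]
    identifier = schema.get("$id")
    if isinstance(identifier, str):
        base, fragment = urldefrag(identifier)
        if fragment:
            if lenient and not base:
                # Historical reading: "$id": "#foo" names a plain-name fragment.
                return fragment
            # Strict reading: "$id" must not carry a non-empty fragment.
            raise ValueError(f"$id with non-empty fragment: {identifier!r}")
    return None
```

Two implementations that differ only in this flag would disagree on whether `{"$id": "#foo"}` is a usable schema, which is the cross-implementation inconsistency being described.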

I totally agree that we can at least mostly add that backward compatibility into a future release, but that wouldn't fix the backward compatibility issues with the previous releases and those releases can't in practice be considered to be replaced, which is why I think the approach this document describes is appropriate.


This has gotten a bit off topic. There was agreement about the approach taken in this document, but if we need to re-open that discussion, let's get on the same page internally (over at JSON Schema) and come back here with what is decided.

handrews commented 2 years ago

@jdesrosiers

This has gotten a bit off topic. There was agreement about the approach taken in this document, but if we need to re-open that discussion, let's get on the same page internally (over at JSON Schema) and come back here with what is decided.

Which issue or discussion is that? I added my more fundamental points to https://github.com/ietf-wg-httpapi/mediatypes/issues/20#issuecomment-1148013900 in this repository, but I can move or copy them somewhere else if that is better.

jdesrosiers commented 2 years ago

@handrews The discussion was on Slack. It wasn't controversial, so it didn't make it to a GitHub issue or discussion. I wrote it up and asked for reviews and got approvals from all of the active contributors at the time. Admittedly, we didn't have an in-depth conversation about it. It didn't seem necessary. But I'm happy to go back and do that now that we are getting more feedback and differing opinions. #20 is an OK place to have that conversation, but I think it would be better to do it in the JSON Schema org so it ends up on the radar of more of the JSON Schema community. No issue/discussion exists. We'd have to start one.

handrews commented 2 years ago

@jdesrosiers thanks, glad I wasn't missing something obvious! I'll join in whatever discussion you start (I think it would be better for me to respond to how you frame the issue than opening something new myself that might not have quite the right focus).

awwright commented 2 years ago

We can say that older releases are replaced, but that doesn't stop companies like Amazon from continuing to depend on draft-04.

This is not substantially different than, e.g. specifying RFC 2616 for HTTP, even though it's been replaced. Upgrading the reference should be possible. If not, what are the specific reasons why? Let's create issues in json-schema-spec.

I don't agree with this characterization of what you consider a breaking change.

Please suggest a better term, the name is less important than the concept: Cases where a single schema must have different behavior, depending on the draft you're implementing against. These cases are the ones that will be difficult to fix.

In fact, it should be an invalid schema because $id isn't allowed to have a non-empty fragment. In order to allow this, you have to ignore constraints in the spec and add behaviors not defined in the spec.

This isn't my reading (requirements for authors are different than requirements for validators); but assuming it works this way, that's easily fixable. We can re-define it with the historical behavior in a "Deprecated functionality" section.


I filed https://github.com/json-schema-org/json-schema-spec/issues/1242 to address this, and all cases where implementations feel the need to change their behavior based on $schema (or otherwise detected).

ioggstream commented 2 years ago

@darrelmiller split OAS in #49

@awwright @jdesrosiers consider this thread only for jsonschema. Let me know how to move forward with this issue :)