ietf-wg-httpapi / mediatypes

"Published specification" should reference the specification #32

Closed awwright closed 1 year ago

awwright commented 2 years ago

When I'm looking up a media type registration, I would expect the "Published specification" to link directly to the specification that specifies how I can parse and interpret it.

The document currently cites itself as the specification document, and then normatively references several other documents. While this may work technically, this seems like an unnecessary level of indirection. I vaguely recall seeing this pattern before, but it's somewhat confusing, and if we can't link directly to the spec, I think it deserves some explanation.

RFC 2854 is an example that directly references an external specification document.

I would also expect a change controller to be listed. Why are these listed as "n/a"? I'm not sure what that means, in this context.

ioggstream commented 2 years ago

@awwright I think these should be compiled with the support of IANA once the actual content of the document is done. Feel free to PR where you think it's useful.

dret commented 2 years ago

On 2022-03-31 23:34, Austin Wright wrote:

When I'm referring to a media type registration, I would expect the "Published specification" links directly to the specification that specifies how I can parse and interpret it.

that sounds like a reasonable thing to expect.

The document currently cites itself as the specification document, and then normatively references several other documents. While this may work technically, this seems like an unnecessary level of indirection. I may have seen this pattern before, but it's somewhat confusing, and I think it deserves some explanation.

my vote is for changing this to the pattern you're suggesting. which media type are you talking about?

which reminds me: @ioggstream, since we're managing multiple drafts in the same repo now, what about creating github labels so that we can track which issue is about which draft?

I would also expect a change controller to be listed. Why are these listed as "n/a"? I'm not sure what that means, in this context.

i think the usual pattern is to put the working group or IETF here, but i may be mistaken.

ioggstream commented 2 years ago

Labels are already here :) owners should be able to add them.

awwright commented 2 years ago

which media type are you talking about?

All of them are written this way (sorry I should have mentioned this)

jdesrosiers commented 2 years ago

When I'm looking up a media type registration, I would expect the "Published specification" to link directly to the specification that specifies how I can parse and interpret it.

This is a tricky case because there is no one published specification. There isn't even a fixed set of specifications. Each release of JSON Schema or OpenAPI is its own specification and future releases will be added to the list. We don't want to pin the media type to any specific release or have to go through the ceremony of updating the media type registration every time there is a new release. It's even more difficult for JSON Schema because the vocabulary system allows third parties to create their own dialects of JSON Schema, so the source of published specifications isn't even limited to the JSON Schema organization.

I think not linking directly to any one specification is the right thing. The media type defines how to identify what version of OpenAPI or dialect of JSON Schema the document conforms to. That's just about all these media types need to do. That's why I think it's reasonable for this document to cite itself as the specification document. Linking to the current specification, or to all the existing specifications, is an option, but I wouldn't want anyone reading it to be confused and think those are the only options. I think the alternative is to maintain a registry, but that doesn't seem necessary.
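
As a rough illustration of what "identifying the version or dialect" could look like in practice, here is a minimal Python sketch. It assumes the document is JSON, that OpenAPI documents declare their version in a top-level "openapi" field, and that JSON Schema documents declare their dialect with "$schema". The function name identify_dialect and the family labels are hypothetical and do not come from the draft.

```python
import json
from typing import Optional, Tuple


def identify_dialect(text: str) -> Tuple[str, Optional[str]]:
    """Return (family, identifier) for a JSON document.

    family is "openapi", "json-schema", or "unknown" (labels are
    illustrative only).
    """
    doc = json.loads(text)
    if isinstance(doc, dict):
        # OpenAPI 3.x documents carry their version in the "openapi" field.
        if isinstance(doc.get("openapi"), str):
            return ("openapi", doc["openapi"])
        # JSON Schema documents name their dialect with the "$schema" keyword.
        if isinstance(doc.get("$schema"), str):
            return ("json-schema", doc["$schema"])
    # Nothing in the document declares a version or dialect.
    return ("unknown", None)
```

Everything past this step (what the keywords mean, how validation behaves) would be delegated to the identified version or dialect, which is the division of labor described above.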

awwright commented 2 years ago

This is a tricky case because there is no one published specification. There isn't even a fixed set of specifications. Each release of JSON Schema or OpenAPI is its own specification and future releases will be added to the list.

I didn't mean to imply that every specification that describes a new feature for a format/protocol must be listed. Optional extensions to protocols and media types don't need to update the registration. Only the essential semantics need to be referenced.

For example, Cookies weren't mentioned in the first couple of releases of HTTP/1.1 at all (RFC 2068, 2616). (Recent releases of HTTP now point out how the Cookie header is inconsistent with the header syntax.)

However, I don't think the problem is that there's "no one published specification"—all of the references we need are listed in the normative references; their URLs just need to be copied to their respective "Published specification" field.

We don't want to pin the media type to any specific release or have to go through the ceremony of updating the media type registration every time there is a new release.

You can link to a document that changes. This is how HTML is defined (maybe it's an odd exception). Even if that's not possible, the change controller can update the registration without much fuss. (I wouldn't describe it as a "ceremony".)

I think not linking directly to any one specification is the right thing. The media type defines how to identify what version of OpenAPI or dialect of JSON Schema the document conforms to.

A Standards tree registration requires a written spec with expert review. This seems reasonable to me. I think the likely outcome of not linking to a spec, or not fully writing it out, would be that IANA assigns a media type in the Vendor tree (instead of the Standards tree).

jdesrosiers commented 2 years ago

You can link to a document that changes.

Agreed. If such a document existed for any of these media types we wouldn't have a problem. My point is that no such document exists.

the change controller can update the registration without much fuss.

Good to know. Do you have any suggested reading you can share so I can better understand how that process works?

I think the likely outcome of not linking to a spec ...

What spec would you link to? This is the problem. The current release of JSON Schema is just one dialect out of many. There are many more that need to be covered by this media type including third-party dialects such as OpenAPI and MongoDB. These are not limited to extensions or add-ons to official JSON Schema releases (although some might be). The way the vocabulary system works, almost anything is possible. That's why it makes sense to me that the media type only define how to identify the dialect and delegate the rest of the semantics to that dialect. The dialects are the "extensions" like you mentioned that shouldn't require additional review or media type registration updates. The "expert review" only needs to cover the aspects that are currently in the document. The specific dialects are just extensions.

awwright commented 2 years ago

My point is that no such document exists.

What spec would you link to?

I don't think anyone would be confused if application/schema+json links to https://json-schema.org/specification.html (i.e. as a Table of Contents).

Good to know. Do you have any suggested reading you can share so I can better understand how that process works?

The top of https://www.iana.org/assignments/media-types/media-types.xhtml lists a few different RFCs, though not all of them apply (I forget off-hand which would be most important).

The current release of JSON Schema is just one dialect out of many. There are many more that need to be covered by this media type including third-party dialects such as OpenAPI and MongoDB.

From a media type perspective, these may not be application/schema+json documents strictly speaking. For example, keywords might follow a specific release of JSON Schema Validation, but as a whole, not necessarily follow other requirements.

For example, MongoDB will store the JSON Schema as BSON document or some data structure; to produce a valid "application/schema+json" document, it would probably need to add a "$schema" keyword then stringify it as JSON.
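
A minimal Python sketch of that last step, assuming the schema is already available as an in-memory dict. The helper name to_schema_json and the choice of default dialect URI are illustrative assumptions, not anything MongoDB or the draft prescribes.

```python
import json

# Assumed default dialect, chosen only for illustration.
DEFAULT_DIALECT = "https://json-schema.org/draft/2020-12/schema"


def to_schema_json(schema: dict, dialect: str = DEFAULT_DIALECT) -> str:
    """Serialize an in-memory schema as a JSON document that declares its dialect."""
    document = dict(schema)  # shallow copy so the caller's dict is not modified
    # Keep an existing "$schema" if present; otherwise declare the given dialect.
    document.setdefault("$schema", dialect)
    return json.dumps(document, sort_keys=True)
```

Until this serialization happens, the application is only working with the data model, not with a media type.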

Regarding e.g. HTTP responses with Content-Type: application/schema+json, then you just follow the rules laid out in the specification. The specification has to document how forward compatibility is implemented, how old and deprecated behavior is handled, what each party is required to do and what they are required to accept, and so on.

Unfortunately JSON Schema omits much of this, or it internally contradicts itself. This was why I asked https://github.com/json-schema-org/community/discussions/119

For example, you suggest that we should be able to use "$schema" to select an alternate vocabulary or dialect; but nowhere in the specification does it mention how to handle unknown values of "$schema". It should probably error or give an indeterminate validation result, but this isn't explicitly mentioned, and would likely come up in expert review.
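
To make the "error" option concrete, here is a hedged Python sketch of one possible behavior. The set of known dialects, the exception name UnknownDialectError, and the function select_dialect are all hypothetical; the specification does not currently define any of this.

```python
from typing import Any, Optional

# Dialects this hypothetical implementation understands.
KNOWN_DIALECTS = frozenset({
    "https://json-schema.org/draft/2020-12/schema",
    "https://json-schema.org/draft/2019-09/schema",
    "http://json-schema.org/draft-07/schema#",
})


class UnknownDialectError(ValueError):
    """Raised when "$schema" names a dialect this implementation does not know."""


def select_dialect(schema: Any, default: Optional[str] = None) -> Optional[str]:
    # Boolean schemas and schemas without "$schema" declare no dialect.
    if not isinstance(schema, dict) or "$schema" not in schema:
        return default
    declared = schema["$schema"]
    if not isinstance(declared, str) or declared not in KNOWN_DIALECTS:
        # Refuse to guess rather than silently validating under the wrong rules.
        raise UnknownDialectError(f"unknown $schema value: {declared!r}")
    return declared
```

The alternative, returning an indeterminate result instead of raising, would be just as easy to write; the point is that the spec would need to say which one is required.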

Likewise, I may have been too optimistic in my view that we could just remove functionality from JSON Schema and assume that implementations would continue to support it. While it's legal to implement since-removed behavior, this isn't prescribed, and maybe we should have done that (e.g. instead of removing the behavior, move it to a separate section called "Deprecated Keywords").

jdesrosiers commented 2 years ago

From a media type perspective, these may not be application/schema+json documents strictly speaking.

That depends on how you define the media type. You want this to be a media type for standard JSON Schema. I wrote this up to be capable of describing all types of dialects and that's the root of our conflict here.

awwright commented 2 years ago

That depends on how you define the media type. You want this to be a media type for standard JSON Schema. I wrote this up to be capable of describing all types of dialects and that's the root of our conflict here.

Can you speak a little bit more to this, to make sure I'm understanding you correctly?

First, maybe give me an example of how or why it would depend on the definition. Strictly speaking, many applications like MongoDB cannot be using application/schema+json, because at no point do they handle stringified JSON. It doesn't matter how we define the media type; they're using the data model, which is slightly lower-level.

Second—by "wrote this up" do you mean https://ietf-wg-httpapi.github.io/mediatypes/draft-ietf-httpapi-rest-api-mediatypes.html#section-2.2?

My understanding, when I wrote this issue, was that we would copy 14.1. "application/schema+json". One of the effects of the language in this repository is that the meaning of $schema going over HTTP (or email) will be different than the same value found in a JavaScript implementation. That doesn't seem correct to me.

dret commented 2 years ago

On 2022-04-13 05:18, Austin Wright wrote:

The current release of JSON Schema is just one dialect out of many. There are many more that need to be covered by this media type including third-party dialects such as OpenAPI and MongoDB.

From a media type perspective, these may not be application/schema+json documents strictly speaking. For example, keywords might follow a specific release of JSON Schema Validation, but as a whole, not necessarily follow other requirements.

handwaving in specs is a tricky thing. if the intention of the spec is to "sort of identify" JSON schema but not really, then it may be better to make this explicit. JSON schema by no means is the only format that is in this unfortunate situation. the media type should clearly state whether it's an actual well-defined format (and then it should link to the spec), or whether it's a "family of pretty similar dialects", and even then it may make sense to list some popular known ones, but to mention that the list is open.

Regarding e.g. HTTP responses with Content-Type: application/schema+json, then you just follow the rules laid out in the specification. The specification has to document how forward compatibility is implemented, how old and deprecated behavior is handled, what each party is required to do and what they are required to accept, and so on.

Unfortunately JSON Schema omits much of this, or it internally contradicts itself. This was why I asked https://github.com/json-schema-org/community/discussions/119

For example, you suggest that we should be able to use "$schema" to select an alternate vocabulary or dialect; but nowhere in the specification does it mention how to handle unknown values of "$schema". It should probably error or give an indeterminate validation result, but this isn't explicitly mentioned, and would likely come up in expert review.

Likewise, I may have been too optimistic in my view that we could just remove functionality from JSON Schema and assume that implementations would continue to support it. While it's legal to implement since-removed behavior, this isn't prescribed, and maybe we should have done that (e.g. instead of removing the behavior, move it to a separate section called "Deprecated Keywords").

maybe this is some good input for JSON schema to be a bit more specific in terms of what its processing model looks like? at a certain point in time thinking hard about this and defining this usually is a good idea.

even if all you can do is say "we kind of missed making this explicit when this started and now there are a couple of conflicting practices around, and here are some popular ones", this is better than just ignoring that this is an issue that affects interoperability.

markdown faced similar issues of not being a single well-defined spec, and it may be informative to look at its media type definition as an inspiration: https://datatracker.ietf.org/doc/html/rfc7763

awwright commented 2 years ago

I like your comparison to text/markdown @dret, that's a useful comparison. There's one important difference, though: documents are usually delivered in Markdown because that's how they were authored by humans, and rendering to HTML (or RFCXML, etc.) is secondary. If there's an error or incompatibility, that's not the same kind of big problem as a false positive in JSON Schema. I think JSON Schema is more like a scripting language in this regard.

jdesrosiers commented 2 years ago

Sorry I haven't had the time to keep up with this. @awwright I'm not trying to ignore your concerns, I just don't have the capacity right now and won't for a few more weeks. At that time, I suggest we get together and discuss this in detail.

For now, I'll address this briefly.

the media type should clearly state [...] it's a "family of pretty similar dialects", and even then it may make sense to list some popular known ones, but to mention that the list is open.

This is exactly what it does right now. I agree that it's unfortunate that JSON Schema has become fragmented, but that is the situation we find ourselves in. I'd rather find a way to include those dialects than to dismiss them as not-really-JSON-Schema especially because the community knows these things only as "JSON Schema". Excluding them would be confusing at best. I think our hands are tied a bit because we aren't introducing something new, we're trying to standardize something that exists in the wild and should work for existing implementations (at least the major ones).

awwright commented 2 years ago

No worries, I've been trying to make the Friday calls but I keep getting pulled into other gigs.

it's unfortunate that JSON Schema has become fragmented

I went on a little bit of a tangent and so spun off a reply at https://github.com/orgs/json-schema-org/discussions/169, but I hope you can address my first question: I think the problem of fragmentation has been greatly improved, so how do you figure this?

ioggstream commented 2 years ago

@awwright addressed in YAML #42 for now.

ioggstream commented 2 years ago

The YAML part was moved out and fixed there.