w3c / did-core

W3C Decentralized Identifier Specification v1.0
https://www.w3.org/TR/did-core/

DID Document processing when media type is unknown #838

Open msporny opened 1 year ago

msporny commented 1 year ago

It's becoming increasingly apparent that having two different JSON encodings for DID Documents, where the only difference between the two is the existence of a single field (@context), was a mistake, as a number of us in the WG warned when the decision was made to create a new abstract data model for DID Documents.

At this point, implementers are knowingly implementing against the guidance of the specification and placing @context in DID Documents served as application/json or application/did+json. Out of 161 registered methods, there is only one DID Method that I know about, Microsoft's did:ion, that knowingly uses the pure JSON representation. We should catalog how many DID Methods knowingly use the JSON-only (the one w/o @context) representation, and use that data to modify the DID Core specification.

There's a sane way out of this, and that's to define what you should do when the DID Document that you're retrieving doesn't result in an application/did+json or an application/did+ld+json media type:

If the document is valid JSON, and you can find an @context that contains a DID Core context as the first item in the array, then you know you're dealing with a DID Document and can proceed even w/ a media type such as application/json or application/octet-stream. That is, we define how to process a document you're expecting to be a DID Document when the media type is wrong.

We could also say that if you can't find an @context, then it's up to the application to do content sniffing to see if it can proceed (but we leave that content sniffing up to the application). This might be controversial given that we're adding this text for a single DID Method and it leads to non-interoperable content sniffing behavior.
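The two rules proposed above can be sketched roughly as follows. This is a minimal illustration, not spec text: the function name `classify_did_document` and its three return labels are hypothetical, and accepting a bare string `@context` (in addition to an array) is an assumption that goes slightly beyond the wording of the proposal.

```python
import json

# JSON-LD context URL defined by DID Core v1.0.
DID_CORE_CONTEXT = "https://www.w3.org/ns/did/v1"
# Media types defined by DID Core for the two JSON-based representations.
DID_MEDIA_TYPES = {"application/did+json", "application/did+ld+json"}


def classify_did_document(body: bytes, media_type: str) -> str:
    """Hypothetical helper: returns 'did-document', 'needs-sniffing', or 'invalid'."""
    try:
        doc = json.loads(body)
    except ValueError:  # also covers JSONDecodeError and UnicodeDecodeError
        return "invalid"
    if not isinstance(doc, dict):
        return "invalid"

    # Drop parameters such as "; charset=utf-8" before comparing.
    base_type = media_type.split(";", 1)[0].strip().lower()
    if base_type in DID_MEDIA_TYPES:
        return "did-document"

    # Media type is "wrong" (e.g. application/json): look at @context.
    context = doc.get("@context")
    if isinstance(context, str):
        first = context  # assumption: a bare string context is also accepted
    elif isinstance(context, list) and context:
        first = context[0]
    else:
        first = None

    if first == DID_CORE_CONTEXT:
        return "did-document"
    # No usable @context: any further sniffing is left to the application.
    return "needs-sniffing"
```

A caller would treat `'needs-sniffing'` as the point where application-specific (and therefore potentially non-interoperable) heuristics take over, which is exactly the part flagged above as possibly controversial.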

What we can't be certain of yet is if .json can be a valid file extension and application/json can be a valid media type for a DID Document without causing conflicts in the IANA Media Types registry. The spec currently says .didjson and .didjsonld are the valid file extensions (to handle the cases where the files exist on filesystems AND to support automatic media type results by putting these media types in the default media types shipped with operating systems). I'm not sure it's possible to associate .json as a valid file extension, since that is bound to the application/json media type and that is not a media type that's defined by the DID Core specification (and the JSON media type has no profiling mechanism, like JSON-LD does).
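To make the file-extension point concrete, here is a small sketch of how a tool might map the extensions above to their media types until they ship in OS-level defaults. It uses Python's standard `mimetypes` module; the helper name `build_did_mime_db` and the file names are hypothetical.

```python
import mimetypes


def build_did_mime_db() -> mimetypes.MimeTypes:
    """Return a private MIME database that also knows the DID Core extensions."""
    db = mimetypes.MimeTypes()  # starts from the module's default mappings
    db.add_type("application/did+json", ".didjson")
    db.add_type("application/did+ld+json", ".didjsonld")
    return db


db = build_did_mime_db()
media_type, _encoding = db.guess_type("did.didjson")
print(media_type)  # the type registered above for .didjson
# A file named did.json is untouched by this mapping and keeps whatever
# type the platform defaults assign to .json.
print(db.guess_type("did.json")[0])
```

Nothing here rebinds `.json`, which mirrors the concern above: that extension already belongs to a media type DID Core does not define.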

Some variations of the approaches above would cover all of our bases, so we should try to figure out which ones would achieve consensus and apply those to the ED of the specification.

msporny commented 1 year ago

As a relevant aside, one of the arguments for defining processing when the media type is wrong is that "modern tooling gets tripped up on application/did+..." -- and that's always true for any new media type.

Keep in mind that many years passed where web servers were serving .json files as text/plain and application/octet-stream before application/json made its way into the OS-level media type registries. We're in a similar situation here.

Defining specific media types and file extensions has merit, even if there are implementers that will perpetually get it wrong. I think what we're trying to do here is specify what happens when they get it wrong, much like HTML5 standardized all the complex algorithms web browsers used to fix broken HTML and render a page... 'cause doing that ended up being way easier (it only took 15 years) than teaching developers to do the right thing (which is just a losing proposition).

peacekeeper commented 1 year ago

I'm fine with providing some guidelines like this, as long as we make it clear that this is about dealing with situations that are wrong and shouldn't occur at all, and that it's much better to use the correct media types (as they are defined in DID Core) in the first place.

Also, I think that much of this is specific to the did:web method, and to the HTTP binding of DID Resolution. I'm not sure if any of this really applies to DID Core. Especially the part about .json file extensions seems to be exclusively a did:web issue/mistake.

One reason why we have the Abstract Data Model is that there were complicated politics and preferences around the JSON and JSON-LD representations. But another reason why we have it is also that DID Resolution is an abstract function that doesn't necessarily involve any concrete representation at all. E.g. if you resolve a did:key locally into an in-memory data model, then you also just successfully resolved a DID to a DID document, but you never had to deal with any media types at all during this process.

kdenhartog commented 1 year ago

I think @peacekeeper's insight here is a wise one: the ADM was really spurred by the need to get resolution right and set a properly abstracted interface, but splitting this across two specs in this way has led to way more confusion than necessary, especially for people who are writing methods but aren't aware of the DID Resolution spec.

In my opinion, it would make sense for us to pull this work out of the DID Core spec going forward and to isolate this abstraction directly in the DID Resolution spec. This seems to me like a cleaner, clearer way to address this concern.

Looking back now, the arguments I made for getting resolution into DID Core made sense in theory at the time, but didn't accurately account for the political climate that would need to be addressed to do it properly. I now think it was a mistake for us to push this in, and there's a lot of opportunity to clean this up if we can get consensus on moving resolution forward in the next go at this spec.

peacekeeper commented 1 year ago

Another problem with serving DID documents as application/json is that this media type doesn't define how URI fragments are dereferenced.

pchampin commented 3 weeks ago

This was discussed during the did meeting on 2024-09-05: https://www.w3.org/2024/09/05-did-minutes.html#t07