Proposal to resolve representation type, file extensions, and mime type issues

gribneau commented 2 years ago

We have a number of open issues regarding different representation types, file extensions, and mime types that have been floating around for a while now (#8 #15 #20 #41). I think we are also not in compliance with the IANA Considerations section of the core spec at this point.

Resolving these issues gracefully seems likely to compel us to modify the simple resolution schema. A file named did.json should not contain, for example, CBOR or YAML, and setting the appropriate content type without changing the file extension strikes me as a poor workaround.

Beginning with the core spec, we should support at least two representations:

.didjson
application/did+json

.didjsonld
application/did+ld+json

We can accomplish this by changing the URL resolution logic to resolve to a directory containing multiple representations rather than to a single file, and relying on webserver configuration to return the appropriate file and set the proper content type.

did:web:example.com:charlie

https://example.com/charlie/

In this scenario, the webserver would be configured to recognize a number of index files, each with the appropriate file extension. These file extensions would also need to map to the appropriate content types.

charlie/did.didjson
charlie/did.didjsonld
charlie/did.cbor
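As a rough illustration of the extension-to-content-type mapping described above, here is a minimal Python sketch. It uses the standard library's mimetypes table as a stand-in for the webserver's own configuration; the helper name content_type_for is hypothetical, and the .cbor mapping to application/cbor is an assumption taken from the Accept examples in this comment.

```python
import mimetypes

# Register the extensions from the listing above. Python's mimetypes table
# stands in here for the webserver's extension-to-type configuration.
mimetypes.add_type("application/did+json", ".didjson")
mimetypes.add_type("application/did+ld+json", ".didjsonld")
mimetypes.add_type("application/cbor", ".cbor")  # assumed mapping


def content_type_for(path):
    """Return the content type implied by the file extension, or None."""
    media_type, _encoding = mimetypes.guess_type(path)
    return media_type
```

A real deployment would put the same mapping in the webserver configuration rather than in application code.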

We could require that plain JSON be accepted for all resolutions to support dumb webservers, but honor the Accept request header to prioritize other formats on servers that are capable of handling that logic.

Accept: application/did+ld+json
Accept: application/cbor
Accept: application/did+json

The additional complexity should all be entirely optional so that a simple webserver hosting a simple .didjson file will work today, but there is a path available to those who need other representations.

This would require modifying the URL resolution logic (and also the existing resolvers that have been implemented), but I think it would make more sense in the end.

What does everyone think?

dmitrizagidulin commented 2 years ago

@gribneau - I agree, we do need to address this (at least, to bring it into compliance with the DID Core spec, IANA registration wise).

(I agree with you that we should support both "dumb" webservers that are just hosting files, but also smart ones that can perform content negotiation.)

So, we've got three distinct cases for did resolution.

1. Resolving a DID for the top level domain (like did:web:example.com) by appending a well-known did.* filename. (Incidentally, we really should reconcile this with (or at very least mention) the Well Known DID Configuration spec.)
2. Resolving a DID for a given folder/directory -- for example did:web:example.com:charlie, where /charlie/ is a folder (and not just a filename). Current spec does this by always appending /did.json.
3. Resolving a DID by pointing to a specific file. For example, did:web:example.com:accounting:charlie.didjson to https://example.com/accounting/charlie.didjson. The current spec does not allow for this (it would instead resolve it to https://example.com/accounting/charlie.didjson/did.json).

I think we all agree we need 1 (just the top level domain). I do think 3 is useful (the ability for the did:web to point to a particular file). I'm not completely sure 2 is needed? Maybe we can combine 2 and 3 somehow?

So, let's see, in terms of your proposal, it would look something like:

If the `did:web` DID just points to a bare domain (no path component):

Example: did:web:example.com

If the request has an Accept: header that includes application/did+, the server must first try to look at the path /did.<extension appropriate for the accept content type>. (It might make sense for us to also accept application/json and application/ld+json as well.)
If no Accept: header is given, or the Accept: header does not include a /did+ content subtype, the server must look for the default filename -- first, try did.didjson (I like your suggestion for that default).
If no file is found from the steps above, try did.didjsonld filename next.
If still not found, the server should drop back to Well Known DID Configuration discovery? Or better yet, server returns a 404, and our spec should recommend that the requesting client should try the /.well-known/resources/did-configuration/ path next?
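The lookup order in the steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: the names EXTENSION_FOR, candidate_files and lookup are hypothetical, q-values in the Accept header are ignored (header order is treated as preference), and the Well Known DID Configuration fallback is left to the client after a 404.

```python
# Extensions come from the proposal above; the table itself is an assumption.
EXTENSION_FOR = {
    "application/did+json": "didjson",
    "application/did+ld+json": "didjsonld",
}


def candidate_files(accept_header):
    """List the file names a server would try, in order, for a bare-domain DID."""
    candidates = []
    for item in (accept_header or "").split(","):
        media_type = item.split(";")[0].strip().lower()  # drop parameters such as q=0.8
        if media_type in EXTENSION_FOR:
            candidates.append("did." + EXTENSION_FOR[media_type])
    # Defaults: plain JSON first, then JSON-LD.
    for fallback in ("did.didjson", "did.didjsonld"):
        if fallback not in candidates:
            candidates.append(fallback)
    return candidates


def lookup(accept_header, existing_files):
    """Return (status, filename); 404 when none of the candidates exists."""
    for name in candidate_files(accept_header):
        if name in existing_files:
            return 200, name
    return 404, None
```

For example, a request with Accept: application/did+ld+json against a directory that only holds did.didjson would fall through to the plain JSON file.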

If the `did:web` DID ends in a file extension

Example: did:web:example.com:did.didjson

Look for the DID file at that extension, and return it using the appropriate content-type for that extension.
If that exact filename is not found, just return a 404, do not fall back on other extensions.
If the file exists, but its content-type does not match the type specified in the request's Accept: header, and the server can't automatically translate between those types, return a 406 error. (Meaning, if the request is for did:web:example.com:did.pdf with an Accept: application/did+json, and the server doesn't know how to translate the PDF file to JSON, it should return a 406 status.)
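A minimal sketch of the 404/406 behaviour for DIDs that name a file explicitly, assuming a server that never translates between types. The function name respond and the CONTENT_TYPES table are hypothetical, and treating */* as matching anything is an assumption not spelled out above.

```python
import os

# Assumed extension-to-media-type table for this illustration.
CONTENT_TYPES = {
    ".didjson": "application/did+json",
    ".didjsonld": "application/did+ld+json",
}


def respond(path, accept_header, existing_files):
    """Return (status, content_type) for a DID that names a file explicitly."""
    if path not in existing_files:
        return 404, None  # no fallback to other extensions
    extension = os.path.splitext(path)[1]
    content_type = CONTENT_TYPES.get(extension, "application/octet-stream")
    if accept_header:
        accepted = {
            item.split(";")[0].strip().lower()
            for item in accept_header.split(",")
        }
        if content_type not in accepted and "*/*" not in accepted:
            return 406, None  # this sketch never converts between types
    return 200, content_type
```

With these assumptions, the did.pdf example above yields a 406 when the request asks for application/did+json.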

If the `did:web` DID does not end in a file extension

Example: did:web:example.com:charlie

This case I'm less clear about (we support it currently, but I don't know if we should still support it). But if we do, something like:

First, the web server must look for a file/resource with that exact name (so, the url https://example.com/charlie). If the resource exists and is a DID document, it's up to the server to know what to do as far as content-type (perform type conversion depending on the Accept: header, or return a 406, or just return it as whatever default type the server knows about).
If the path maps to a folder or directory, the server should look for did.didjson and did.didjsonld, just like with the use case 1 (full domain)? So, did:web:example.com:charlie to https://example.com/charlie/did.didjson, etc.
If it's not a directory, the server should look for the file extension appropriate to the Accept: header, and then the default extension?
If not found by any steps above, return 404

I'm not thrilled with these steps, seems like it might be too complicated. But then again, maybe it's a good use case.

(Oh, also, I don't think we should specify cbor format in our examples, to start with. I'm not aware of any implementations out there supporting did:web + cbor. Or if we do need it, let's use the application/did+cbor content type and .didcbor extension from https://www.w3.org/TR/did-cbor-representation/ ).

gribneau commented 2 years ago

@dmitrizagidulin - I agree with all of that, and I think we can express this logic elegantly enough to efficiently support the did:web:example.com:charlie case with little overhead.

At present, we have these steps to resolve a URL:

1. Replace ":" with "/" in the method specific identifier to obtain the fully qualified domain name and optional path.
2. Generate an HTTPS URL to the expected location of the DID document by prepending https://.
3. If no path has been specified in the URL, append /.well-known.
4. Append /did.json to complete the URL.
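For reference, the four current steps can be sketched in Python as below. The function name did_web_to_url is hypothetical, and the sketch covers only the steps listed here; it ignores anything outside them, such as percent-encoded characters in the identifier.

```python
def did_web_to_url(did):
    """Map a did:web identifier to a URL using the four current steps."""
    prefix = "did:web:"
    if not did.startswith(prefix):
        raise ValueError("not a did:web identifier")
    location = did[len(prefix):].replace(":", "/")  # step 1
    url = "https://" + location                      # step 2
    if "/" not in location:                          # step 3: bare domain, no path
        url += "/.well-known"
    return url + "/did.json"                         # step 4
```

Note that a DID naming a file, such as did:web:example.com:accounting:charlie.didjson, still gets /did.json appended under these steps, which is the limitation discussed in this thread.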

If I recall correctly, the conditional in item 3 was added to support paths, and the rest of it was already there to support Well Known DID, which points back here for specification.

We'll need to coordinate with DIF on updates.

This logic can be extended to support full filenames ( did:web:example.com:did.didjson) easily enough between 2 and 3.

In the case of a mismatched Accept header and file extension, I think we should return a 406 rather than attempt to translate without changing the filename extension. We could go a step further and return a redirect to the proper URL, but I don't think we should ask servers to return mismatched filename extensions and mime types.

To make multiple representations work for either bare domains or directories, we can append / in 4 and let server-side logic select the best file based on the accept header and the default.

That would leave us with a process something like this:

1. Replace ":" with "/" in the method specific identifier to obtain the fully qualified domain name and optional path.
2. Generate an HTTPS URL to the expected location of the DID document by prepending https://.
3. If a recognized filename extension is present, exit resolution.
4. If no path has been specified in the URL, append /.well-known/did.
5. Append /.
6. Set header as appropriate (Accept: application/did+ld+json,application/did+json).

The rest of the logic to handle these requests will occur server side.

Note that we don't need to set the Accept header in step 3 - the filename extension implies the mime type.
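A rough client-side sketch of the proposed six-step process, assuming that "recognized filename extension" means the two extensions discussed so far. The names resolve and RECOGNIZED_EXTENSIONS are hypothetical, and the function only builds the URL and headers; it does not perform the HTTP request.

```python
# Assumed list of recognized extensions; the proposal does not fix this set.
RECOGNIZED_EXTENSIONS = (".didjson", ".didjsonld")


def resolve(did):
    """Return (url, headers) following the six proposed steps."""
    prefix = "did:web:"
    if not did.startswith(prefix):
        raise ValueError("not a did:web identifier")
    location = did[len(prefix):].replace(":", "/")  # step 1
    url = "https://" + location                      # step 2
    if url.endswith(RECOGNIZED_EXTENSIONS):          # step 3: file fully specified
        return url, {}                               # extension implies the mime type
    if "/" not in location:                          # step 4: bare domain
        url += "/.well-known/did"
    url += "/"                                       # step 5
    headers = {"Accept": "application/did+ld+json,application/did+json"}  # step 6
    return url, headers
```

Everything after this point (choosing an index file, returning 404 or 406) would happen on the server.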

We would need to extend for well known DID configuration, and the change from /.well-known/did.json to /.well-known/did/ would need to be coordinated with DIF or deferred.

Does that work for resolution?

OR13 commented 2 years ago

Thanks for raising this.

DID Web is valuable because of its simplicity; any changes that erode simplicity should be evaluated with the highest degree of skepticism.

DID Methods are NOT required to support all possible did document representations, and they don't need to have multiple 'resolution processes'... we can probably simplify did web further than it already is, and should.

Based on how simple did web is, and how painful adding support for other representations is, you can see why it's good that did methods are not required to support all representations.

That being said, changing well known URIs, adding required processing for accept headers, is in the realm of possibility... we should be careful trying to address all these on the same issue though...

I am in favor of getting rid of the .well-known resolution process, since it produces a single did document, whereas the other process supports an unbounded number of did documents.

I am in favor of evaluating proposals for handling representations of did web did documents, that don't encode the representation into the IRI... that means did:web:example.com:truck:123 -> JSON, CBOR by accept header... I would mandate the accept header always behave like this:

Accept: application/json -> JSON / JSON-LD (as google does)

Accept: application/did+json -> JSON / JSON-LD (as did core allows)

Accept: application/did+ld+json -> JSON-LD (which happens to also be JSON... there should be no implied JSON-LD processing, all terms should be retained from the file, including any errors, etc...)

Accept: */* -> JSON

I would add a MUST support application/did+json and a MAY support application/did+ld+json or other formats.

I would add a MUST ensure that all representations return the same key material, service endpoints, etc.... (yikes).

I would add a MAY ensure metadata is preserved such as @context or foobar which might not be defined in that @context when considering JSON-LD.

You can see that the biggest danger in supporting multiple representations is that they might not "all get updated at once"... the controller might forget to remove that leaked key from CBOR, and so authentications from CBOR will be available to the attacker, while authentications from JSON would not.... then the attacker takes the content negotiation feature and pretends to only understand CBOR... and off to the races...
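To make the consistency requirement concrete, here is a minimal sketch of a check a controller might run across representations that have already been parsed into dictionaries. The function name and the choice of compared properties are assumptions for illustration; the comparison is order-sensitive and does no JSON-LD processing.

```python
# DID document properties compared in this sketch; the selection is an assumption.
COMPARED_PROPERTIES = ("verificationMethod", "authentication", "service")


def mismatched_properties(documents):
    """Given {representation name: parsed DID document}, list properties that differ."""
    mismatches = []
    for prop in COMPARED_PROPERTIES:
        values = [doc.get(prop) for doc in documents.values()]
        # Plain equality: lists in a different order count as a mismatch.
        if any(value != values[0] for value in values[1:]):
            mismatches.append(prop)
    return mismatches
```

In the leaked-key scenario above, a CBOR document that still lists the old key would show up as a verificationMethod mismatch against the JSON document.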

IMO, these security issues with multiple representations are severe enough to postpone support for multiple representations in did web. I would be perfectly fine with did web only supporting JSON as it does today, indefinitely.

I would trust the entire did method less if it supported any form of "did document representation" negotiation, and I would strongly oppose adding support for that at this time, but I am happy to keep refining the potential paths forward in issues and discussions.

OR13 commented 2 years ago

I spent a few minutes exploring what I perceive to be the proposed changes wrt directory name....

https://or13.github.io/did-web-examples/about/

https://github.com/OR13/did-web-examples

I like the idea of using the index of the directory to get rid of the file extension, a lot.

I don't like the idea of supporting anything other than JSON.

gribneau commented 2 years ago

Those are good examples. I generally get pedantic with the trailing / just for clarity.

If we incorporate the full paths from @dmitrizagidulin , we have something like this:

did:web:example.com:alice:did.didjson
https://example.com/alice/did.didjson

did:web:example.com:alice:did.didjsonld
https://example.com/alice/did.didjsonld

did:web:example.com
https://example.com/.well-known/did/

did:web:example.com:alice
https://example.com/alice/

In the first two cases, the file is fully specified in the URL, so the webserver just returns that file.

In the last two cases, the webserver returns the best index file. Those who wish to support JSON-LD and JSON would put files in those directories with the appropriate file extension, configure that directory for both of those files, and configure the server to recognize the associated mime types.

This is the simplest scheme that I see that drives consistency with the core specification while avoiding file extension and mime type mismatches.

It supports future representations (CBOR, etc.) by simply adding the new representation to the webserver configuration, so I don't think we need to get into any of that.

dmitrizagidulin commented 2 years ago

@gribneau: One minor question about the latest iteration, about the top level domain part -- did:web:example.com should be https://example.com/.well-known/did.json as per https://identity.foundation/.well-known/#well-knowndidjson, right? (Instead of /.well-known/did/)

Or are we slightly altering that (to support content negotiation via header)?

gribneau commented 2 years ago

If we want to support multiple representations without mismatches between file extensions and mime types, we'll need to use /.well-known/did/ returning an index file for the bare domains.

We'll need to coordinate with DIF to make the bare domain changes more or less contemporaneously.

Here is a sample nginx configuration that prefers a single mime type, as specified in the Accept: request header, within a directory, with a default back to json.

Add this location stanza in the https server scope that returns the DID files, and extend for additional DID directories:

# specify file extensions for did representations
# with a default to plain json
set $did index.json;
set $didDefault index.json;
if ($http_accept = 'application/did+json') {
  set $did index.didjson;
}
if ($http_accept = 'application/did+ld+json') {
  set $did index.didjsonld;
}

location ~ ^(/\.well-known/did)/?$ {
        try_files $1/$did $1/$didDefault;
}

To return proper mime types, these lines must be added to the types block of the configuration (typically in mime.types) as well:

    application/did+json                  didjson;
    application/did+ld+json               didjsonld;

I believe the setup above will meet the core specification while preserving the very simple resolution we currently have.

dmitrizagidulin commented 2 years ago

@gribneau - nice, ok. 👍 from me.

gribneau commented 2 years ago

I took initial steps in #43.

We should review to determine how many of the outstanding issues noted in the document can be resolved.

w3c-ccg / did-method-web