Update resolution scheme #42

gribneau commented 3 years ago

Based on reviews of #43, I've reduced the scope of the PR to index files rather than hardcoded /did.json and removed references to JSON-LD. Full file paths are still included, but can be removed and left in history for future consideration.

gribneau commented 2 years ago

Resolved

Remove rants from the abstract, focus on the positive aspects of did:web.

Done. 93a94dc41c1f36f7b86800a324dc737660ba3f69

Be specific about the server configuration information that is associated with MUST statements.

The inappropriate normative language has been removed.

Done. e308e221ac122163819a15aad4774a737dd4b64f

Ideally, move different concepts/upgrades into different PRs.

The full filepath behavior described here is an easy win, but we can pick it up later.

Removed. f18ed93bc519e4573e10ed1d71a998f143a7de92

Outstanding

It seems that we have two outstanding issues to resolve.

Clarify whether or not JSON-LD is illegal for did:web, and if so, how machine verifiability is practically achievable?

This PR is currently agnostic as to DID representation - it simply makes it possible to support a variety. JSON is used as an example, but the core specification is referenced for an authoritative list of possible representations, which included JSON and JSON-LD the last time I checked.

My personal preference is to preserve the flexibility, leaving DID representation as an implementation choice. It has been suggested that we limit to JSON and derivative representations and only handle media types that IANA has recognized, which means that we would presently support only the .json filename extension and application/json for all representations in the JSON family.

Within this issue, there is a conversation about whether @context should be required, and we will likely need to discuss that.

Move the 'index' feature into its own PR

Looking forward, hardcoded filenames (did.json) will eventually break the agreement between filename extensions and media types when DID representations outside the JSON family are supported. Index files enable us to return the contents of a DID document with variable media types, so this provides a path forward.

IANA has not yet approved the DID-specific media types mentioned in IANA Considerations, so we can limit the present view to .json and application/json while noting the path forward in the information section.

It is reasonable on the same basis to defer content negotiation (handling the Accept header), and also to move those references into information with an indication that content negotiation should be standardized when the media types are available.

@OR13 @msporny @dmitrizagidulin @mprorock

dmitrizagidulin commented 2 years ago

@gribneau - thank you for your continued work on this PR; this is great stuff.

Ok, so, the remaining issues in the PR (which unfortunately we can't really separate, since they do have to do with the resolution scheme) have to do with:

Index files (both for the top-level domain, and for each individual directory)
Whether or not we should allow multiple representations for DID Documents, for the did:web method.
(related to above) Whether or not we should allow content negotiation.

After having separate conversations with a couple of you, I'd like to see if we can agree on some guiding principles for this DID method, and would like to make a proposal that would hopefully satisfy everyone's concerns.

Guiding Principles

1. We want did:web DIDs to be able to be hosted on "dumb" web servers (which cannot do dynamic processing of DID Documents, to convert between representations, etc.). An example of this would be GitHub Pages, or the innumerable hosted websites where a webmaster just has cPanel (FTP-like) access to hosted files.
2. We do NOT want webmasters (people who can upload files to the top directory, or subdirectories of a site) to have to maintain multiple copies of a DID Document (one for each representation). As @OR13 points out, the security implications of needing to rotate a key, and making sure all copies/representations are in sync... should be avoided.
3. We DO want JSON-LD @contexts to be possible/supported, so that those companies that use contexts can continue using them in their did docs.
4. We recognize that did:web implementers will be adding new properties and types (one very common example -- new serviceEndpoint types), but may not have the skills or the ability to add those new properties to proper @contexts. So, we do NOT want to require that all did:web documents be able to be processed by JSON-LD processors. (However, see the previous items -- we DO want to still allow those who do want to do it.)
5. We DO want to be able to host multiple DIDs per domain (which means, at the very least, different DID Docs in subdirectories).

Implications

Can we agree on these guidelines?

On the remote chance that we can -- the guidelines only leave us a couple of options (and hey, constraints are good). For example, principle 1 (we want people who only have access to "dumb" web servers to be able to host these DIDs) means that we cannot do content negotiation. And yes, we know that the Web Architecture would prefer it if we could. But most web servers will not be able to convert JSON DID Documents to YAML or whatever. Meaning, we not only have to deal with the fact that IANA has not registered application/did+ld+json, but even if it had, it wouldn't matter: many web servers just can't do conneg at all.

Furthermore, if we want to support dumb web servers (and we do), it means no automatic server-side handling of index files, either. Many webmasters just don't have access to the Nginx config for their servers, so it's not guaranteed that they can set up default index file handling. But we still want nice concise DIDs like did:web:example.com, right? So how can we have that without server-side index file handling?

Well, we can have those nice things if we specify the client-side mapping. That is, it's the client resolver (not the web server via redirect) that, when encountering did:web:example.com, knows to automatically convert it to https://example.com/ and append did.json (or whatever index file convention we agree on). And so the resolver will always request https://example.com/did.json when faced with that DID.
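As an illustration of the client-side mapping described above, here is a minimal Python sketch. The function name `did_web_to_url` and the constant `INDEX_FILE` are hypothetical, and the rule of always appending did.json reflects the convention floated in this comment, not necessarily what the published method specification ends up requiring.

```python
from urllib.parse import unquote

# Assumed index file convention from this discussion (not a settled rule).
INDEX_FILE = "did.json"
PREFIX = "did:web:"


def did_web_to_url(did: str) -> str:
    """Map a did:web DID to the single URL a resolver would request."""
    if not did.startswith(PREFIX):
        raise ValueError(f"not a did:web DID: {did!r}")
    segments = did[len(PREFIX):].split(":")
    if any(segment == "" for segment in segments):
        raise ValueError(f"empty segment in DID: {did!r}")
    # Colons separate the host from path segments; each segment is
    # percent-decoded (so "example.com%3A3000" becomes "example.com:3000").
    decoded = [unquote(segment) for segment in segments]
    # The resolver, not the web server, appends the index file name.
    return "https://" + "/".join(decoded + [INDEX_FILE])
```

With this mapping the resolver makes exactly one request per DID, and the web server only has to serve a static file.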

So early on in this PR's comments, I thought that we might be able to still have multiple representations, using this client-side index file mapping. Meaning, if a resolver wants a JSON-LD DID Doc, it'll append index.jsonld, and if it wants a JSON one, it'll append index.json, and so on. But as principle 2 (we do NOT want multiple copies of a DID Doc to keep in sync) points out, that's asking for potential trouble.

So where does that leave us? If we can't have content negotiation, and we can't have automatic server-side redirects to index files based on Accept headers, and we can't emulate content negotiation via just having multiple copies of a DID Doc with different extensions?

It means we need to agree on a superset representation, and agree on a client-side index file mapping.

Could that superset representation be JSON-LD? I'm a huge fan of JSON-LD, but I recognize that many implementers (or just regular application developers doing prototypes using did:web DIDs) will want to add new properties (such as new service endpoint types, etc), but won't know how to add those properties to their own @contexts. So I don't think it's realistic (meaning: people will object) that we require did:web docs to be fully JSON-LD processable, and that all terms in them be also present in valid contexts. Also, you only need to be strict about contexts when you're signing stuff (like in VCs), and we agreed we're not going to be signing the did:web docs themselves.

But! We still want the option to use JSON-LD. For many companies, it's our competitive advantage / secret sauce. So, making JSON-LD or @contexts illegal for did:web is out of the question.

So where does this all leave us?

Proposal

I'd like to propose the following, which I think would follow those guiding principles, address everyone's concerns, and make for a fairly concise spec and an easy implementation.

We require that all did:web DIDs be served as JSON resources -- to be files with .json extensions, so that even if a web server doesn't know how to provide the correct content-type, the browser can at least derive application/json just from the extension. This means - no content negotiation, and no multiple copies for different representations.
We specify a client-side mapping -- when a resolver encounters a did:web DID, it will first convert it to a URL (performing percent-decoding as specified in this PR), and then always know to append did.json to it. And that's the file it will request.
We do not bother with did:web DIDs specifying individual files. So, no https://example.com/alice-did.json. The file part of the path is always derived automatically.
We do not require @contexts to be in those JSON files on the web server. But we also don't do anything ridiculous like make contexts illegal.
We take advantage of the fact that DID Core guarantees us that any conforming DID Document has reserved properties (id, service endpoints, verificationMethod, etc) that also make it compliant with a DID Core @context (https://www.w3.org/ns/did/v1). And so we specify that all did:web docs have an implicit default context of https://www.w3.org/ns/did/v1. Which means that if a resolver does want to process it as JSON-LD, it can always append the did-core context (if it's not in the did doc already).
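The implicit default context in the last item could be handled on the resolver side roughly as follows. This is a sketch under assumptions: the helper name `with_default_context` is hypothetical, and it places the DID Core context first in the list, since DID Core expects that context to be the first `@context` value in a JSON-LD representation.

```python
import copy

DID_CORE_CONTEXT = "https://www.w3.org/ns/did/v1"


def with_default_context(did_doc: dict) -> dict:
    """Return a copy of the DID document that includes the DID Core context."""
    doc = copy.deepcopy(did_doc)  # leave the fetched document untouched
    ctx = doc.get("@context")
    if ctx is None:
        # Plain JSON document: apply the implicit default context.
        doc["@context"] = [DID_CORE_CONTEXT]
    elif isinstance(ctx, list):
        if DID_CORE_CONTEXT not in ctx:
            doc["@context"] = [DID_CORE_CONTEXT, *ctx]
    elif ctx != DID_CORE_CONTEXT:
        # A single string or object context that is not the DID Core one.
        doc["@context"] = [DID_CORE_CONTEXT, ctx]
    return doc
```

A resolver that only wants plain JSON can skip this step entirely, so a document without `@context` stays valid for both kinds of consumers.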

What do people think?

gribneau commented 2 years ago

Thanks @dmitrizagidulin.

I am comfortable with the statement of issues and the Guiding Principles.

The Implications that webservers cannot do content negotiation, and won't be able to specify index files seem overbroad to me. Apache and Nginx can be made to work without redirects, including at least basic content negotiation as described in the core spec, and Apache's DirectoryIndex can appear in a .htaccess file within a directory to set the index, even with FTP management or a dashboard like cPanel. @OR13 has made Github static pages work. Directory index functionality, at a minimum, is ubiquitous.

In short, I think the failure cases will be less common than anticipated. Given that what we're dealing with here is a failure case, I think there is one upgrade we can make to the Proposal:

We specify a client-side mapping -- when a resolver encounters a did:web DID, it will first convert it to a URL (performing percent-decoding as specified in this PR), and then always know to append did.json to it. And that's the file it will request.

If we first attempt to make the request as currently described in the PR, but in cases of failure append the did.json and run the request again as an additional fallback, then we will have covered both the common success case and the infrequent failures while preserving our path forward in compliance with the core specification.
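For illustration, the fallback described above might look like the following Python sketch. The `fetch` callable is a hypothetical stand-in for whatever HTTP client a resolver uses; it is assumed to return the response body on success and `None` on failure.

```python
from typing import Callable, Optional

# Hypothetical fetcher type: returns the body, or None if the request failed.
Fetch = Callable[[str], Optional[bytes]]


def resolve_with_fallback(base_url: str, fetch: Fetch) -> Optional[bytes]:
    """Try the directory URL first, then did.json as a second request."""
    directory_url = base_url.rstrip("/") + "/"
    # First attempt: rely on the server's directory index handling.
    body = fetch(directory_url)
    if body is not None:
        return body
    # Fallback: request the hardcoded file name explicitly.
    return fetch(directory_url + "did.json")
```

The cost is visible in the code: a server without index handling always pays for two network requests per resolution.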

Does that work for everyone?

OR13 commented 2 years ago

@gribneau thanks for pushing on this... :)

I would prefer not to need to make 2 network requests to resolve a DID, ever.

These edge cases are complexity we should factor out imo.

In a world where there is only 1 did content type, accept header can be ignored, and a file with a .json extension can be produced from the did to url conversion.

In a world where there is more than 1 did content type, accept header MUST be understood, and a file with a MATCHING file extension can be produced from the DID and Accept parameter conversion.
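A rough Python sketch of that conversion, done on the client so that only one request is needed. Everything named here is an assumption for illustration: the extension table, the `did` file stem, and the function `document_url` are hypothetical, the DID-specific media types were not yet registered with IANA at the time of this thread, and q-values in the Accept header are ignored for brevity.

```python
from typing import Optional

# Hypothetical media type to extension table (illustrative only).
EXTENSION_BY_MEDIA_TYPE = {
    "application/json": ".json",
    "application/did+json": ".didjson",
    "application/did+ld+json": ".didjsonld",
}
DEFAULT_EXTENSION = ".json"


def document_url(base_url: str, accept: Optional[str] = None) -> str:
    """Pick the file to request from the base URL and an Accept value."""
    base = base_url.rstrip("/")
    if not accept:
        # Single content type world: the Accept header can be ignored.
        return f"{base}/did{DEFAULT_EXTENSION}"
    for item in accept.split(","):
        # Drop parameters such as ";q=0.8"; ordering alone decides here.
        media_type = item.split(";", 1)[0].strip().lower()
        if media_type == "*/*":
            return f"{base}/did{DEFAULT_EXTENSION}"
        if media_type in EXTENSION_BY_MEDIA_TYPE:
            return f"{base}/did{EXTENSION_BY_MEDIA_TYPE[media_type]}"
    raise ValueError(f"no supported representation in Accept: {accept!r}")
```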

I don't see asking web servers to handle this for us as helping simplify the standard.

dmitrizagidulin commented 2 years ago

+1, I agree, I'd prefer not to make 2 network requests, if at all possible. What would supporting directory index functionality actually buy us, anyway?

gribneau commented 2 years ago

Further Consideration

I would prefer not to need to make 2 network requests to resolve a DID, ever.

+1 I thought about this last night, and it seems to me that a second request as a fallback is a poor replacement for simply supporting full filenames in the spec itself.

These edge cases are complexity we should factor out imo.

If we revert f18ed93bc519e4573e10ed1d71a998f143a7de92 then implementations that suffer from difficulty with index files or content negotiation still have a path enabling a simple resolution to a single specified DID representation by specifying the full filename. I prefer that as a solution to second requests in cases of failure.

Representation selection is explicit when full filenames are used - the filename extension specifies the representation. This effectively factors out edge cases by removing any ambiguity.

In a world where there is only 1 did content type, accept header can be ignored...

Consider these cases:

did:web:example.com:alice.json
--> https://example.com/alice.json

did:web:example.com:alice.didjson
--> https://example.com/alice.didjson

did:web:example.com:alice.didjsonld
--> https://example.com/alice.didjsonld

did:web:example.com:alice.didcbor
--> https://example.com/alice.didcbor

Each of those maps to exactly one file, and the filename extension controls the content-type header that the server returns. Using this scheme, there is no need for content negotiation or index file selection - the webserver simply returns the file, which is as basic and simple as it gets on the server side.
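The cases above can be sketched in Python as follows. The function names are hypothetical, and the media types paired with the `.didjson`, `.didjsonld`, and `.didcbor` extensions are assumptions for illustration; the thread itself only names the extensions.

```python
import posixpath
from typing import Optional
from urllib.parse import unquote

# Assumed extension to media type pairing, as a server might configure it.
CONTENT_TYPE_BY_EXTENSION = {
    ".json": "application/json",
    ".didjson": "application/did+json",
    ".didjsonld": "application/did+ld+json",
    ".didcbor": "application/did+cbor",
}


def full_filename_url(did: str) -> str:
    """Map a did:web DID that names a file directly to its HTTPS URL."""
    prefix = "did:web:"
    if not did.startswith(prefix):
        raise ValueError(f"not a did:web DID: {did!r}")
    segments = [unquote(s) for s in did[len(prefix):].split(":")]
    # No index file is appended: the last segment already is the file name.
    return "https://" + "/".join(segments)


def content_type_for(url: str) -> Optional[str]:
    """Content-Type a server would return, chosen by extension alone."""
    extension = posixpath.splitext(url)[1]
    return CONTENT_TYPE_BY_EXTENSION.get(extension)
```

Because the extension is part of the DID itself, the same subject reached through two representations ends up with two different identifiers.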

Dmitri's upgrade on full filenames solved this problem before it was called out. Very nice.

Benefits

What would supporting directory index functionality actually buy us, anyway?

The core spec envisions multiple representations for a DID, and there are different filename extensions and media types associated with each. We've been resolving to a file named did.json since the earliest versions of this method when it was serving a single DID for a domain to act as an anchor for that entire domain.

Now that we are able to serve multiple DIDs on a given domain, the method can support significantly more use cases. This PR takes the next step to support more of the core specification by making it possible to return various representations. The use of index files removes the filename (and filename extension) from the HTTPS URL to enable webservers to return a DID document with the appropriate media type in the content-type header, in compliance with very long standing best practices.

The use of directory URLs, index files, and Accept header analysis is a very direct mapping between the existing HTTP world and the IANA Considerations section of the core spec.

To summarize the above, directory index functionality and content negotiation get us a considerably more complete method for resolving DIDs with simple webservers.

The Bigger Picture

That's nice, as far as it goes, but there is a higher order benefit to be had here.

There is tremendous overlap between the core spec and existing HTTP content negotiation practices. This will all be very familiar territory for systems administrators and software developers, and the similarity should be comforting to those tasked with implementing both client and server side applications to handle this method.

Given that we are presently facing formal objections on the core spec, I think we should move forward with this PR, inclusive of full filename resolution to avoid edge cases, to provide one more concrete example that addresses at least some of the stated concerns.

dmitrizagidulin commented 2 years ago

@gribneau - You're right that allowing did:web DIDs to link to individual files would allow multiple representations without the need for server-side index file handling or content negotiation.

However, it greatly increases the risk of going against guiding principle 2 from above -- I strongly believe (as I think does @OR13) that we must not allow multiple file versions of the same DID document.

So, I would support linking to individual files only if we also add normative language in the spec that only one such representation file must exist, for any given DID. (Meaning, if you have did:web:example.com:did.yaml, there must not exist any other file for that did with any other extension.)

While I can live with that, I would personally prefer we simplify the did:web method to having just one representation. (Greatly simplifies the developer ask.)

tplooker commented 2 years ago

While I can live with that, I would personally prefer we simplify the did:web method to having just one representation. (Greatly simplifies the developer ask.)

+1, I may still be misunderstanding the ask being made in your latest proposal here @gribneau as I am just getting up to speed on this thread.

Supporting multiple representations of the same DID document, and furthermore having these resolved through different identifiers (DIDs) (e.g. did:web:example.com:did.yaml vs did:web:example.com:did.json), feels like problematic coupling (identifier, data model and data representation used) and would be liable to create security issues through the opportunity for consistency problems across the different representations.

So, I would support linking to individual files only if we also add normative language in the spec that only one such representation file must exist, for any given DID. (Meaning, if you have did:web:example.com:did.yaml, there must not exist any other file for that did with any other extension.)

At this stage I'm a -1 to any proposal that involves requiring the file extension to be leaked into the resulting did:web identifier. As I said before, though, I may still be misunderstanding this ask?

gribneau commented 2 years ago

The consensus seems to be that we should not support multiple representations or content negotiation within did:web as laid out in did:core. Upon reflection, I think it would be counterproductive to move forward with this PR if we aren't actually going to follow through to support standard Web Architecture upon which those sections of did:core are clearly based.

If there will be a more complete HTTP method for DID resolution, that method should

resolve to directory URLs,
handle Accept headers,
and return the appropriate representation together with the appropriate media type in the content-type header.

This is rather common functionality in the HTTP world and is described in the IANA Considerations and DID Resolution Options sections of the core spec. Resolving to directory URLs in did:web is unnecessary to return a single representation and would introduce confusion by hijacking the HTTP content negotiation capabilities without supporting content negotiation.

Absent objections in the next few days, I'll simply revert and close.

OR13 commented 2 years ago

@gribneau there appear to still be changes in here that might be accepted in isolation... but you might find it easier to open new PRs with smaller change sets to get them in.

Thanks for helping drive these issues forward.

I am in favor of closing this PR, and opening smaller ones focused on 1 item at a time, so long as they do not:

add any new representations
remove any existing resolution rules for did web

I do think we should continue to debate removing the .well-known resolution rule, but it's probably better to stick to issues for now.

OR13 commented 2 years ago

I opened https://github.com/w3c-ccg/did-method-web/issues/49 to discuss .well-known resolution rule simplification, let's discuss there.

kdenhartog commented 2 years ago

Yeah splitting this PR into a few different ones would be good. For example the abstract change could be addressed independently

gribneau commented 2 years ago

Yeah splitting this PR into a few different ones would be good. For example the abstract change could be addressed independently

I agree. There are many cosmetic improvements that should be made.

msporny commented 2 years ago

I opened #52 to discuss general simplification for DID URL resolution -- we're making this way harder than it needs to be.

msporny commented 2 years ago

At this stage I'm a -1 to any proposal that involves requiring the file extension to be leaked into the resulting did:web identifier. As I said before, though, I may still be misunderstanding this ask?

What if a did:web identifier file extension was a part of the identifier? URLs are meant to be opaque. Once you go did.json, you are stuck with that as your identifier... you /can't/ serve anything else... we have people objecting to conneg in this same thread... one of those things is going to have to give.

gribneau commented 2 years ago

At this stage I'm a -1 to any proposal that involves requiring the file extension to be leaked into the resulting did:web identifier. As I said before, though, I may still be misunderstanding this ask?

What if a did:web identifier file extension was a part of the identifier? URLs are meant to be opaque. Once you go did.json, you are stuck with that as your identifier... you /can't/ serve anything else... we have people objecting to conneg in this same thread... one of those things is going to have to give.

Before removing the full filename option, this PR offered a path to both content negotiation and full file paths, leaving the choice in the hands of the implementation.

It is now at an impasse and has been superseded by multiple more granular PRs.
