Ambiguity regarding PATH and DID URLs

OR13 commented 2 years ago

This recent PR on the did web method highlights the challenges we introduced by reserving PATH in did URLs but not making any real use of it... in the core spec.

https://github.com/w3c-ccg/did-method-web/pull/51

^ This PR would be a breaking change for path based did webs (multi tenant did webs).

Transmute uses this feature today and is opposed to the proposed changes, but we can be convinced by good arguments.

Here is an example of how to use the current format:

https://github.com/OR13/signor

cc @mprorock @tplooker @selfissued

peacekeeper commented 2 years ago

That PR seems to affect the method-specific-id in DID web. Can you explain what that has to do with a path in a DID URL?

gribneau commented 2 years ago

The handling of both path and query in the DID core specification is inconsistent with the handling of these elements in the underlying RFC that defines URIs generally.

Specifically, in a DID URL, the path and query elements are reserved for locating items within a given DID, which must be entirely addressed within the method-specific-id, whereas RFC 3986 uses both path and query to locate a given resource, and fragment as an index location within that resource. As applied here, the resource is the DID document, meaning that path and query behave inconsistently with the global URI path and query elements.

This is the reason that DID:WEB replaces the slashes in an HTTPS URL with colons, which is clumsy at best.

For reference:

https://datatracker.ietf.org/doc/html/rfc3986#section-3.3

https://datatracker.ietf.org/doc/html/rfc3986#section-3.4

https://datatracker.ietf.org/doc/html/rfc3986#section-3.5

clehner commented 2 years ago

I think it might have been accidental that the referenced PR touched on DID path: https://github.com/w3c-ccg/did-method-web/pull/51/files#r855476705

Specifically, in a DID URL, the path and query elements are reserved for locating items within a given DID

Where do you see such a reservation? Do you mean locating items "within a resolved DID document", rather than "within a DID"?

DID Core says the following about the path component of a DID URL (https://www.w3.org/TR/2021/PR-did-core-20210803/#path):

As with URIs, path semantics can be specified by DID Methods, which in turn might enable DID controllers to further specialize those semantics.

CCG's DID Resolution draft says the following about dereferencing a DID URL containing a DID path and/or DID query (source) - which I think is consistent with DID Core:

3.1) The applicable DID method MAY specify how to dereference the input DID URL.

3.2) The client MAY be able to dereference the input DID URL in an application-specific way.

I suppose that DID paths could be useful to enable applications that expect path-like semantics (e.g. web pages, or rsync) to use DIDs, where a URL would use a DID as the authority component rather than a hostname; or to just enable general-purpose resource storage by a DID controller. But I don't think I've yet seen any DID methods specify such usage. (Edit: @peacekeeper pointed out did:indy uses DID path: https://github.com/w3c-ccg/did-method-web/pull/51#issuecomment-1105753743)

peacekeeper commented 2 years ago

Specifically, in a DID URL, the path and query elements are reserved for locating items within a given DID

This isn't true, also see https://github.com/w3c-ccg/did-method-web/pull/51#issuecomment-1105744334. The standard URI components path, query, and fragment in DID URLs are completely aligned with RFC 3986.

gribneau commented 2 years ago

Right here:

https://www.w3.org/TR/did-core/#did-url-syntax

A DID URL is a network location identifier for a specific resource. It can be used to retrieve things like representations of DID subjects, verification methods, services, specific parts of a DID document, or other resources.

Every example provided there can be reasonably described as a "secondary resource" as described in RFC 3986.

https://www.rfc-editor.org/rfc/rfc3986#section-3.5

3.5. Fragment

The fragment identifier component of a URI allows indirect identification of a secondary resource by reference to a primary resource and additional identifying information.

gribneau commented 2 years ago

The history of the did:web revisions for handling URL paths can be viewed here:

https://github.com/w3c/did-core/issues/183

Initially, the DID path was treated as interchangeable with the URL path, consistently with RFC 3986.

A discussion followed about whether a full DID URL (including a path component) could be a DID subject, which resulted in the situation we have today.

https://github.com/w3c/did-core/issues/183#issuecomment-593703715

If the id element could not contain a "DID URL", including a path, then a "DID URL" clearly cannot be the primary resource.

I also needed content negotiation, so I interpreted the core specification less stringently and added conneg in did:psqr, which is what I am using going forward.

gribneau commented 2 years ago

Perhaps a visual will help.

clehner commented 2 years ago

https://github.com/w3c/did-core/issues/821#issuecomment-1105884534

A DID URL is a network location identifier for a specific resource. It can be used to retrieve things like representations of DID subjects, verification methods, services, specific parts of a DID document, or other resources.

Every example provided there can be reasonably described as a "secondary resource" as described in RFC 3986.

Those listed things do look like secondary resources, and I've seen examples in discussion threads of such - except the last thing "or other resources" would seem to cover the needed case, i.e. a resource that is not required to be a secondary resource. Also, I only see RFC 3986 refer to primary and secondary resources in the context of use of fragment components (https://www.rfc-editor.org/rfc/rfc3986#section-3.5).

In DID Resolution I also see "secondary resource" only mentioned when there is a fragment part: https://w3c-ccg.github.io/did-resolution/#dereferencing-algorithm

So I don't see DID Core (or DID Resolution) requiring secondary-resource-like behavior for DID URLs with paths like with fragments. But DID method(s) and/or applications could specify/implement it as such.

gribneau commented 2 years ago

So I don't see DID Core (or DID Resolution) requiring secondary-resource-like behavior for DID URLs with paths like with fragments. But DID method(s) and/or applications could specify/implement it as such.

Is it possible to construct a valid DID document residing at a DID URL, including a path?

Is it possible to construct two valid and unique DID documents at different DID URLs in which only the path varies?

Note:

https://www.w3.org/TR/did-core/#did-subject

jandrieu commented 2 years ago

DID documents don't "reside" at DID-URLs. The DID alone resolves to a DID Document. DID-URLs dereference to specific resources (as may be specified between the method and the DID Document for that DID).

It is generally understood, but not specified in did-core, that dereferencing a "naked" DID returns the DID Document. Fragment semantics are consistent with that. did:ex:abc#name is taken to refer to a node in the graph of the DID Document.

Is it possible to construct a valid DID document residing at a DID URL, including a path?

Technically, that depends on the DID method, as the meaning of the path, query, and fragment parts is up to the method to define. For practical purposes I would advise my clients to treat DID-URLs with path or query parts as referring to a different resource than the DID Document. If you want the DID Document, use an empty path.

Is it possible to construct two valid and unique DID documents at different DID URLs in which only the path varies?

Only through the technicality that different DID URLs can point to different resources: did:ex:abc/thing1 and did:ex:abc/thing2 can dereference to different resources, AND the resources they point to could be DID Documents. However, neither would be expected to be the DID Document of the DID part of the DID URL. Nothing prevents you from publishing the DID Document for the DID at a DID URL; it just would not generally be understood to be the DID Document for that DID.

So, if you mean "Is it possible to construct two valid and unique DID documents at different DID URLs in which only the path varies, each of which is the definitive DID Document for the DID in the DID URL?", I would say, generally, no. Although a DID method could define the meaning of path part to allow that. I'm not sure how that would be useful, but there is no more a requirement that every DID Method be smart than there is that every website be useful.

gribneau commented 2 years ago

So, if you mean "Is it possible to construct two valid and unique DID documents at different DID URLs in which only the path varies, each of which is the definitive DID Document for the DID in the DID URL?", I would say, generally, no.

That is my understanding of the current Subject requirement. This has been discussed within the context of a DID method called "web", and it has been decided that we should keep this limit, which effectively means that mapping DID to HTTPS URIs means that the HTTPS URI authority (domain) identifies the primary resource, which yields exactly one identity per domain. A site with a million users still has only one DID with a simple, clean mapping.

Although a DID method could define the meaning of path part to allow that. I'm not sure how that would be useful...

The did:web method was originally written to bootstrap trust from an established web domain into other DID methods. It supported a single DID document at /.well-known/did.json via https and the method specific identifier was the domain name.

One of the first public issues created on that method asked whether it would be possible to support multiple identities on a single domain, given that a great many websites of different flavors support large numbers of users. Stuffing a URL path into the method specific identifier by replacing slashes with colons made this possible, if clumsy, without violating the core specification.
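To make the colon-for-slash mapping concrete, here is a minimal Python sketch of how a did:web identifier is turned into an HTTPS URL. It assumes the behaviour of the current did:web draft as I understand it: a bare domain maps to /.well-known/did.json, extra colon-delimited segments become path segments followed by did.json, and a port is percent-encoded in the identifier. The function name `did_web_to_url` is made up for illustration.

```python
from urllib.parse import unquote


def did_web_to_url(did: str) -> str:
    """Map a did:web identifier to the HTTPS URL of its DID document (sketch)."""
    prefix = "did:web:"
    if not did.startswith(prefix):
        raise ValueError("not a did:web identifier")
    # Colons inside the method-specific-id stand in for URL slashes.
    segments = did[len(prefix):].split(":")
    # A port arrives percent-encoded, e.g. example.com%3A3000.
    host = unquote(segments[0])
    path_segments = segments[1:]
    if path_segments:
        return f"https://{host}/{'/'.join(path_segments)}/did.json"
    # No path segments: fall back to the well-known location.
    return f"https://{host}/.well-known/did.json"


print(did_web_to_url("did:web:example.com"))
print(did_web_to_url("did:web:example.com:user:alice"))
```

The second call is the multi-tenant case: the URL path /user/alice is carried inside the method-specific-id rather than in the DID URL path.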

Whether a thing has value depends on the objective. It's plainly obvious that hundreds of millions or billions of web pages intend to publish some flavor of identity. Examples include social media profiles, individual pages on professional networking sites, and author profiles on blogs. If our objective is to maximize the functional capacity of the web, making it possible to extend this massive quantity of de-facto decentralized identities to implement interoperability with the emerging DID ecosystem strikes me as obviously valuable, and I am not alone in this position.

It is easily possible to encode a DID document within HTML markup, and it is easily possible to leverage content negotiation to return a pure DID document on the same HTTPS URL that carries the human readable identity, and content negotiation is directly referenced in DID core for such a purpose.
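As a rough illustration of the content negotiation idea, a server could choose between the human-readable page and the DID document based on the Accept header. This is only a sketch: the function name `pick_representation` is hypothetical, the Accept parsing is deliberately simplified (no wildcards), and application/did+json is the JSON media type registered by DID Core.

```python
def pick_representation(accept_header: str) -> str:
    """Return the media type to serve for one profile URL (simplified sketch)."""
    # The first entry is the default when nothing supported is requested.
    supported = ["text/html", "application/did+json"]
    best, best_q = supported[0], 0.0
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        q = 1.0  # default quality value when no q parameter is given
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        if media_type in supported and q > best_q:
            best, best_q = media_type, q
    return best


print(pick_representation("application/did+json"))
print(pick_representation("text/html,application/xhtml+xml;q=0.9"))
```

A browser keeps getting HTML, while a DID-aware client asking for application/did+json gets the DID document from the same URL.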

The only thing preventing us from easily embracing and enabling the clear and obvious existing efforts at decentralized identity is an interpretation of RFC 3986 that removes the path element as the primary means of locating a primary resource.

I find it bizarre. I think it is a self-inflicted injury, and I think it is unnecessarily inconsistent with a stated mission to maximize web functionality.

jandrieu commented 2 years ago

That is my understanding of the current Subject requirement. This has been discussed within the context of a DID method called "web", and it has been decided that we should keep this limit, which effectively means that mapping DID to HTTPS URIs means that the HTTPS URI authority (domain) identifies the primary resource, which yields exactly one identity per domain. A site with a million users still has only one DID with a simple, clean mapping.

I believe you are conflating the authority part with the resource.

In a traditional http URL, http://example.com/index.html the authority part, example.com is neither the resource nor the primary resource.

In the same vein in a DID-URL, the DID is the authority part, and NOT the resource. In the DID URL, did:ex:abc/index.html the DID did:ex:abc is not the resource. It takes dereferencing the entire DID URL to interact with the resource, just as with HTTP.

So when you say

HTTPS URI authority (domain) identifies the primary resource

This is incorrect.

As with HTTP, with DIDs, the authority part (the DID) does NOT identify the primary resource any more than "example.com" identifies a "primary resource" for the URL http://example.com/index.html

NOTE: The DID Document is not the primary resource. Full stop. The DID Document is a meta-data intermediary, like a DNS record, that helps you dereference the actual resources identified by DID URLs.

Each DID URL defines a different resource within the namespace defined by the DID, just as path, query, and fragment are used to identify different resources within the namespace defined by the authority part of an HTTP URL.

which yields exactly one identity per domain. A site with a million users still has only one DID with a simple, clean mapping.

I don't understand how you are using identity here. A DID (and its DID URLs) can be used to instantiate any number of identities. A DID itself is an identifier--not an identity--through which we can resolve the means to cryptographically verify particular interactions with, or taken on behalf of, that identifier. With DIDs and DID URLs, you have a scalable identifier architecture that is essentially unbounded, just as with Web URLs.

A site with a million users could/should/would still have a million DIDs, one for each of those users. Both Twitter and Facebook use this architecture with Web URLs. Those resources are not "secondary resources"; they are the resource pointed to. For example, the primary resource of the URL http://twitter.com/JoeAndrieu is my profile page, NOT twitter.com. The authority part of the URL does not define the primary resource. You need the entire URL for that (with both DID and Web URLs).

The only thing preventing us from easily embracing and enabling the clear and obvious existing efforts at decentralized identity is an interpretation of RFC 3986 that removes the path element as the primary means of locating a primary resource.

That's a rather expansive claim. Unfortunately, the path element is exactly how you identify the resource within the namespace of the authority. Just like with all RFC3986 conformant URLs.

Can you unpack how removing a term clearly defined in RFC3986 could possibly make us more conformant?

msporny commented 2 years ago

Let's take an example from a DID Method that's based on top of HTTPS:

RESOLVE did:web:subject.example/people/jane

Plug that into a resolver and you might get a DID Document that looks like this:

{
  "id": "did:web:subject.example/people/jane"
}

That's one subject... but try this and you might get nothing (jane is missing, it's just now a random directory on the Web):

RESOLVE did:web:subject.example/people

... but try this and you might get the authority (aka DNS domain) DID Document:

RESOLVE did:web:subject.example

{
  "id": "did:web:subject.example"
}

DID Core does not allow for that fairly sane thing to happen today... that's what @gribneau's point is... that's the error we made in the DID WG.

gribneau commented 2 years ago

Upon reflection, it is possible to interpret DID core differently.

5.1.1 requires the DID Subject to conform with 3.1, which in turn asserts that RFC3986 controls.

RFC3986 3.3 provides that:

A path is always defined for a URI, though the defined path may be empty (zero length).

It seems, then, that these are equivalent:

did:example:123456789abcdefghijk

did:example:123456789abcdefghijk/

Given that DID Subjects are URIs conforming to RFC 3986, and given there is no such thing as a URI without a path under RFC 3986, it seems that DID Subjects must necessarily have a path.
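As a side check, here is what a generic RFC 3986 style splitter reports for these two strings. This sketch uses Python's `urllib.parse.urlsplit`, which is not DID-aware; the helper name `generic_parts` is made up for illustration. It only shows the generic URI view and does not settle whether the two strings are equivalent as DIDs.

```python
from urllib.parse import urlsplit


def generic_parts(uri: str):
    """Return (scheme, authority, path) as seen by a generic URI splitter."""
    parts = urlsplit(uri)
    return parts.scheme, parts.netloc, parts.path


for uri in ("did:example:123456789abcdefghijk",
            "did:example:123456789abcdefghijk/"):
    print(generic_parts(uri))
# ('did', '', 'example:123456789abcdefghijk')
# ('did', '', 'example:123456789abcdefghijk/')
```

Because a DID has no `//` authority component, the generic splitter treats everything after `did:` as the path, so in that sense both strings already have a non-empty path, and the two paths differ by the trailing slash.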

talltree commented 2 years ago

Thank you @msporny for finally describing the issue in a way that I could understand what this discussion was all about.

I don't believe we made an error, however. There is a fundamental difference between DIDs as identifiers and URLs: the DID namespace is not and was never intended to be hierarchical. Each DID method is a flat global namespace.

That's what enables it to be decentralized.

That's also why the raw DID itself (the part before any forward slash, question mark, or hash sign) always identifies a DID subject. A DID URL (i.e., a DID plus a non-empty path or query) can of course identify other resources, but the raw DID identifies the DID subject.
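The "part before any forward slash, question mark, or hash sign" rule can be sketched in a few lines of Python. This is an illustration only: the function name `split_did_url` is hypothetical, and it does not validate the DID against the DID Core ABNF.

```python
import re

# raw DID, then optional path, optional query, optional fragment
_DID_URL = re.compile(r"([^/?#]*)(/[^?#]*)?(?:\?([^#]*))?(?:#(.*))?", re.DOTALL)


def split_did_url(did_url: str):
    """Split a DID URL into (did, path, query, fragment) without validating it."""
    did, path, query, fragment = _DID_URL.fullmatch(did_url).groups()
    # An absent path is reported as the empty string; query and fragment stay None.
    return did, path or "", query, fragment


print(split_did_url("did:ex:abc/index.html"))
# ('did:ex:abc', '/index.html', None, None)
print(split_did_url("did:ex:abc#name"))
# ('did:ex:abc', '', None, 'name')
```

Under this reading, the first element is what identifies the DID subject, and everything after it selects some other resource.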

So, although I wasn't part of creating it, I believe that's why the did:web method is structured the way it is.

gribneau commented 2 years ago

Thank you for the clarification @talltree.

I do have one question:

Would supporting the existing HTTPS URL use of path as an element in identifying a DID subject in did:web interfere with any of the other methods that do use a flat namespace?

There are examples of flat namespaces that allow textual delimiters in the key names. Amazon S3 comes to mind. It appears as a hierarchical file system because the key names include the / character, but it is in fact a single flat namespace.

talltree commented 2 years ago

@gribneau:

Would supporting the existing HTTPS URL use of path as an element in identifying a DID subject in did:web interfere with any of the other methods that do use a flat namespace?

I don't see that it would create any problem for any other DID method. That use of the path in did:web could be defined in the did:web DID method specification. However consumers of DID URLs that use the did:web method would need to understand that the full DID subject identifier is not just the raw DID (the portion before the first forward slash), but includes the path. And that would apply to all DID URLs based on the did:web method (unless the did:web spec defined some special syntax at the start of the path that differentiated between a path that identifies a DID subject and a path that does not).

There are examples of flat namespaces that allow textual delimiters in the key names. Amazon S3 comes to mind. It appears as a hierarchical file system because the key names include the / character, but it is in fact a single flat namespace.

That pattern is in fact explicitly supported in the ABNF for a raw DID. We included several chars that can be used as "internal" namespace delimiters within the flat DID namespace. Colons and hyphens are the most commonly used. For example the new did:indy method (not done yet) uses an internal colon-delimited namespace to identify the Indy network (and subnetwork if applicable). For example:

did:indy:sovrin:test

did:indy:sovrin:staging

did:indy:idunion

gribneau commented 2 years ago

I don't see that it would create any problem for any other DID method. That use of the path in did:web could be defined in the did:web DID method specification.

I agree.

We included several chars that can be used as "internal" namespace delimiters within the flat DID namespace. Colons and hyphens are the most commonly used.

Colons are used as internal delimiters in did:web today to encode the HTTPS URL path into the method specific identifier. It is still a draft and there are gaps, but a number of implementations have been rolled out using this scheme.

Reusing the HTTPS URL path as the DID URL path, and recognizing it as the DID subject would significantly streamline the method.

w3c / did-core

Ambiguity regarding PATH and DID URLs #821