Inconsistent ABNF and related definitions

jandrieu commented 5 years ago

TLDR: The ABNF and related logical elements of a DID are inconsistently defined.

The current pull request refactoring the ABNF rules does not address the issue I'm raising here. https://github.com/w3c-ccg/did-spec/pull/168

My first point is that the charter isn't about defining a DID reference. It's about defining a DID. The current spec defines a DID as merely a component in a DID reference.

My second point is that a DID should include the entirety of the string that is presented, just as a URL includes the query part and the fragment part (https://tools.ietf.org/html/rfc3986). Because the current spec defines a DID as a component of a did-reference, a URI of the following form would NOT be a DID per the spec:

did:example:123456789abcdefghi#keys-1

Per the current spec (and the PR) this would be a DID-reference, not a DID. Only the first part is a DID. This is stated explicitly:

The generic DID scheme is a URI scheme conformant with [RFC3986]. It consists of a DID followed by an optional path and/or fragment. The term DID refers only to the identifier conforming to the did rule in the ABNF below; when used alone, it does not include a path or fragment. A DID that may optionally include a path and/or fragment is called a DID reference.

Even if one prefers to follow the notion that the DID is "only the identifier", this statement might be correct if rewritten as

The generic DID-reference scheme is a URI scheme conformant with [RFC3986]. It consists of a DID followed by an optional path and/or fragment.

However, this seems inconsistent with the last statement in that paragraph:

A DID that may optionally include a path and/or fragment is called a DID reference.

This sentence is logically impossible. A "DID" that includes a path or fragment is NOT a DID. It is a did-reference, not merely something called a did-reference. Further, by the previous statements, DIDs cannot optionally include a path and/or fragment.

This is also out of sync with current consensus in the way we speak of DIDs. As illustrated in the last paragraph, we consistently refer to did-references as DIDs. From Section 4.3:

It is desirable that we enable tree-based processing of DIDs that include DID fragments (which resolve directly within the DID document) to locate metadata contained directly in the DID document or the service resource given by the target URL without needing to rely on graph-based processing.

Again, if "DIDs" can contain DID fragments, then by the above (problematic) language and the ABNF, such a "DID" is actually a DID reference, and under the current ABNF a DID reference is, by definition, not a DID.

My third point is that the did-reference spec looks like the URI spec, but is fundamentally different. From the current spec:

did-reference      = did [ "/" did-path ] [ "#" did-fragment ]
did                = "did:" method ":" specific-idstring
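As a minimal Python sketch of the point above, the two current rules can be encoded as regular expressions. The spec's did-path and did-fragment rules are not quoted here, so the sketch assumes (hypothetically) that did-path is a run of RFC3986 segments and did-fragment follows the RFC3986 fragment rule; the method and idstring character sets are assumed to match the definitions quoted in the proposal further down.

```python
import re

# ASSUMED character sets, taken from the methodchar/idchar rules quoted later.
METHOD = r"[a-z0-9]+"
IDSTRING = r"[A-Za-z0-9.\-]+"
SPECIFIC_ID = rf"{IDSTRING}(?::{IDSTRING})*"

# RFC 3986 pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
PCHAR = r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})"

# did = "did:" method ":" specific-idstring
DID = rf"did:{METHOD}:{SPECIFIC_ID}"
# did-reference = did [ "/" did-path ] [ "#" did-fragment ]
# ASSUMPTION: did-path ~ segments of pchar; did-fragment ~ RFC 3986 fragment.
DID_PATH = rf"{PCHAR}*(?:/{PCHAR}*)*"
DID_FRAGMENT = rf"(?:{PCHAR}|[/?])*"
DID_REFERENCE = rf"{DID}(?:/{DID_PATH})?(?:#{DID_FRAGMENT})?"


def classify(s: str) -> str:
    """Return the most specific current-spec rule that matches s in full."""
    if re.fullmatch(DID, s):
        return "did"
    if re.fullmatch(DID_REFERENCE, s):
        return "did-reference"
    return "neither"


print(classify("did:example:123456789abcdefghi"))         # did
print(classify("did:example:123456789abcdefghi#keys-1"))  # did-reference
```

Under these assumptions, "did:example:123456789abcdefghi#keys-1" matches only the did-reference rule, which is exactly the mismatch described above.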

However, this differs significantly from the generic URI syntax in RFC3986:

URI         = scheme ":" hier-part [ "?" query ] [ "#" fragment ]

It further breaks with the URI-reference rule in RFC3986, making DID-references something other than URI-references, which in turn makes DIDs something other than URIs.

URI-reference = URI / relative-ref

A URI-reference is either a URI or a relative-ref, and a relative-ref has the following syntax:

      relative-ref  = relative-part [ "?" query ] [ "#" fragment ]

      relative-part = "//" authority path-abempty
                    / path-absolute
                    / path-noscheme
                    / path-empty
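RFC3986 Appendix B gives a regular expression for breaking any URI-reference into its five components. The Python sketch below applies that regex, copied from the RFC, to an absolute DID and to relative references of the kind discussed in this thread (the sample strings are illustrative). A generic parser already treats everything after "did:" as the path and splits off query and fragment.

```python
import re

# Regular expression from RFC 3986, Appendix B (parses any URI-reference).
URI_REF = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def components(ref: str) -> dict:
    """Split a URI-reference into scheme, authority, path, query, fragment."""
    m = URI_REF.match(ref)  # always matches; absent parts come back as None
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


for ref in ["did:example:123456789abcdefghi?openid#x", "?openid", "#keys-1"]:
    print(ref, components(ref))
```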

If we attempt to follow that pattern more closely (yet combining the query & fragment parts to remove redundancy), I propose:

did               = "did:" method ":" specific-idstring relative-part
method            = 1*methodchar
methodchar        = %x61-7A / DIGIT
specific-idstring = idstring *( ":" idstring )
idstring          = 1*idchar
idchar            = ALPHA / DIGIT / "." / "-"
relative-part     = path-part [ "?" query ] [ "#" fragment ]
path-part         = path-absolute
                  / path-empty
did-reference     = did / relative-part

[Yes, this is incomplete. I leave query and fragment as an exercise for the editors; pull them in from RFC3986.]
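The proposal can be checked mechanically. Below is a rough Python regex transcription, a sketch only: query and fragment are pulled in from RFC3986 as the note suggests, path-part is read as path-absolute / path-empty, and a ":" is assumed between method and specific-idstring (as in the current did rule).

```python
import re

# RFC 3986 building blocks: pchar, segment, segment-nz.
PCHAR = r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})"
SEGMENT = rf"{PCHAR}*"
SEGMENT_NZ = rf"{PCHAR}+"

# path-part = path-absolute / path-empty
#   path-absolute = "/" [ segment-nz *( "/" segment ) ]
PATH_PART = rf"(?:/(?:{SEGMENT_NZ}(?:/{SEGMENT})*)?)?"
# query and fragment pulled in from RFC 3986: *( pchar / "/" / "?" )
QUERY = rf"(?:{PCHAR}|[/?])*"
FRAGMENT = rf"(?:{PCHAR}|[/?])*"
RELATIVE_PART = rf"{PATH_PART}(?:\?{QUERY})?(?:#{FRAGMENT})?"

# method, specific-idstring and idchar as in the proposal.
METHOD = r"[a-z0-9]+"
IDSTRING = r"[A-Za-z0-9.\-]+"
SPECIFIC_ID = rf"{IDSTRING}(?::{IDSTRING})*"

# ASSUMPTION: ":" separates method and specific-idstring.
DID = rf"did:{METHOD}:{SPECIFIC_ID}{RELATIVE_PART}"
DID_REFERENCE = rf"(?:{DID}|{RELATIVE_PART})"


def is_did(s: str) -> bool:
    return re.fullmatch(DID, s) is not None


def is_did_reference(s: str) -> bool:
    return re.fullmatch(DID_REFERENCE, s) is not None


print(is_did("did:example:123456789abcdefghi#keys-1"))  # True
print(is_did("did:example:123456789abcdefghi?openid"))  # True
print(is_did_reference("#openid"))                      # True
print(is_did("did:example:123456789abcdefghi#a#b"))     # False
```

Note that "did:example:123456789abcdefghi;openid" fails under this transcription, since ";" appears neither in idchar nor in the path-part rule.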

With this definition, a DID-reference is a URI-reference where the query term is the name of a service used for endpoint discovery by the user-agent. It is aligned with how URI-references are used. This implies that fragments should NOT be used to indicate dereferencing a DID to the resource at the end of a service endpoint, but rather to refer to the element itself (within the DID Document). Only the query term is interpreted as implying the further dereference.

Unfortunately, even if service (from did-spec) is reasonably replaced with query (from the URI spec), we still have the question of what the path part does. The current spec says

A DID path SHOULD be used to address resources available via a DID service endpoint.

If I follow the ideas of Sam Smith and DADs, then the path part would allow hierarchical indexing to provide resolution to a subset of a DID Document rather than the root of the DID Document. This would be distinct from the query/service component, which is for discovering the named service endpoint for a given DID id/path reference. In other words, I think this remains a point of significant ambiguity in the feature space (what is the path part for?), and not just a grammatical error. I'm not sure I have a recommendation in this area. I don't know of any implementations that use such hierarchies other than what I recall from DADs, and I believe those may be addressed on a method-specific basis with colon-delimited (":") identifiers.

Additionally, the section on fragments has this gem:

It is desirable that we enable tree-based processing of DIDs that include DID fragments (which resolve directly within the DID document) to locate metadata contained directly in the DID document or the service resource given by the target URL without needing to rely on graph-based processing.

This paragraph is ambiguous as to whether a fragment is a reference within the context of the document, as suggested by "(which resolve directly within the DID document)", or whether it can reference something within the service endpoint, as suggested by "to locate metadata contained directly in the DID document or the service resource". I take this to mean that if the DID-reference (aka DID) contains a service (aka query), then any fragment reference is to be appended to the service endpoint for dereferencing in the context of that service endpoint. I can see the appeal of this. However, it presumes a certain understanding and stability of the structure of the named endpoint, and it could easily result in an invalid URI if the service endpoint already has a fragment. Further, it is doubly ambiguous when a fragment is used to refer to a service endpoint, as the PR seems to propose.

Rather, I would use the query part to look up the service; when the service endpoint is then dereferenced, applying the fragment in the context of that endpoint at least yields a valid URL (with just one fragment).
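One possible reading of the paragraph above, sketched in Python: once the query has selected a service and its endpoint URL is known, the DID-reference fragment is attached to that URL. The sketch refuses endpoints that already carry a fragment rather than producing a URL with two "#" characters. The helper name apply_fragment is hypothetical; urldefrag is from Python's standard urllib.parse.

```python
from urllib.parse import urldefrag


def apply_fragment(endpoint: str, fragment: str) -> str:
    """Attach a DID-reference fragment to a dereferenced service endpoint URL.

    HYPOTHETICAL helper: naive string concatenation would yield two "#"
    characters when the endpoint already has a fragment, so this sketch
    rejects that case instead of guessing which fragment should win.
    """
    base, existing = urldefrag(endpoint)
    if existing:
        raise ValueError(f"endpoint already has a fragment: #{existing}")
    return f"{base}#{fragment}"


print(apply_fragment("https://openid.example.com/", "profile"))
```

Rejecting the conflicting case is a design choice for the sketch; a spec could instead define which fragment takes precedence.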

For example, using Example 8 from the spec https://w3c-ccg.github.io/did-spec/#example-8-various-service-endpoints

"service": [{
    "id": "did:example:123456789abcdefghi;openid",
    "type": "OpenIdConnectVersion1.0Service",
    "serviceEndpoint": "https://openid.example.com/"
  },

A relative part (query and fragment) of ?did:example:123456789abcdefghi;openid would dereference to the OpenID endpoint.

This is different both from the current spec, with its ambiguity around what the ";" is for (it appears in the service section but not in the ABNF), and from the PR mentioned above, which creates some new elements that IMO are unnecessary and confusing, and which uses fragments to refer to services.

Further, the current spec and proposed PR both use absolute URIs for service endpoint ids, but the PR says "the service path [is] a URI path and MUST conform to the ABNF of the path-rootless ABNF rule in [[RFC3986]]". Unfortunately, this is logically impossible. It is either a URI path (which can be a path-abempty, path-absolute, path-noscheme, path-rootless, or path-empty), or it is a path-rootless.
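To illustrate why the two requirements cannot both hold, here is a Python sketch of several RFC3986 path rules as regexes (path-noscheme is omitted for brevity). A leading slash is enough to make a valid URI path fail path-rootless.

```python
import re

PCHAR = r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})"
SEGMENT = rf"{PCHAR}*"
SEGMENT_NZ = rf"{PCHAR}+"

# RFC 3986 path alternatives (path-noscheme omitted for brevity).
PATHS = {
    "path-abempty": rf"(?:/{SEGMENT})*",
    "path-absolute": rf"/(?:{SEGMENT_NZ}(?:/{SEGMENT})*)?",
    "path-rootless": rf"{SEGMENT_NZ}(?:/{SEGMENT})*",
    "path-empty": r"",
}


def matching_path_rules(path: str) -> list:
    """Names of the RFC 3986 path rules that match path in full."""
    return [name for name, rx in PATHS.items() if re.fullmatch(rx, path)]


print(matching_path_rules("/openid"))  # ['path-abempty', 'path-absolute']
print(matching_path_rules("openid"))   # ['path-rootless']
```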

I may have mis-parsed the ABNF, but the current use of the service endpoints is inconsistent with how fragments work in other URIs, especially in URLs. For the sake of this example, let's revisit the above example (from Section 8. Examples) and just focus on the id of the first service endpoint:

"id": "did:example:123456789abcdefghi;openid"

which, in the PR is changed to

"id": "did:example:123456789abcdefghi#openid"

If we were to use a standard URL to refer to an element with that id in an HTML page (http://example.com), we would use either the absolute URL http://example.com#did:example:123456789abcdefghi;openid from the spec, or http://example.com#did:example:123456789abcdefghi#openid from the PR.

If we apply that pattern to a service identifier in the DID Document resolved by did:example:123456789abcdefghi, then the analogous URL would be did:example:123456789abcdefghi#did:example:123456789abcdefghi;openid for the original spec and did:example:123456789abcdefghi#did:example:123456789abcdefghi#openid for the PR.

Unfortunately, the two URIs derived from the PR are invalid because each contains two fragment delimiters ("#"), and all four are needlessly redundant.
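The validity argument can be checked against the RFC3986 fragment rule, fragment = *( pchar / "/" / "?" ), which does not allow "#". This Python sketch checks only the fragment component of each of the four URIs above; it is not a full URI validator.

```python
import re

# RFC 3986: fragment = *( pchar / "/" / "?" ); "#" itself is not allowed.
FRAGMENT = re.compile(r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})*")


def fragment_is_valid(uri: str) -> bool:
    """True if everything after the first "#" satisfies the fragment rule."""
    _, sep, frag = uri.partition("#")
    return not sep or FRAGMENT.fullmatch(frag) is not None


candidates = [
    "http://example.com#did:example:123456789abcdefghi;openid",
    "http://example.com#did:example:123456789abcdefghi#openid",
    "did:example:123456789abcdefghi#did:example:123456789abcdefghi;openid",
    "did:example:123456789abcdefghi#did:example:123456789abcdefghi#openid",
]
for uri in candidates:
    print(fragment_is_valid(uri), uri)
```

The two spec-style URIs pass (":" and ";" are allowed in a fragment) and the two PR-style URIs fail.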

In contrast, I believe the intended DID is something like

did:example:123456789abcdefghi#openid

Which would refer to the openid service endpoint declaration in the DID Document and, based on my proposal,

did:example:123456789abcdefghi?openid

Would actually dereference the service at the endpoint.

This is not only a valid URI; it's concise and straightforward. For that URI to be consistent with how other URI fragments map to the ids in a document, the service endpoint in Example 8 should be:

"service": [{
    "id": "openid",
    "type": "OpenIdConnectVersion1.0Service",
    "serviceEndpoint": "https://openid.example.com/"
  },
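A minimal Python sketch of how the two forms above might be dereferenced against a DID document that uses the relative service id "openid". The document literal and the helpers find_by_id and dereference are hypothetical; the sketch only searches the "service" array, performs no network retrieval, and does not handle a query and fragment together.

```python
import json

# HYPOTHETICAL DID document using the relative service id proposed above.
DID_DOCUMENT = json.loads("""
{
  "id": "did:example:123456789abcdefghi",
  "service": [{
    "id": "openid",
    "type": "OpenIdConnectVersion1.0Service",
    "serviceEndpoint": "https://openid.example.com/"
  }]
}
""")


def find_by_id(doc, local_id):
    """Return the service entry whose (relative) id equals local_id, or None."""
    return next((s for s in doc.get("service", []) if s.get("id") == local_id), None)


def dereference(did_ref, doc):
    """Map "?name" to that service's endpoint URL and "#name" to the entry."""
    base, _, fragment = did_ref.partition("#")  # RFC 3986: fragment split first
    did, _, query = base.partition("?")
    if did != doc["id"]:
        raise ValueError(f"{did!r} is not the subject of this document")
    if query:
        entry = find_by_id(doc, query)
        return entry["serviceEndpoint"] if entry else None
    if fragment:
        return find_by_id(doc, fragment)
    return doc


print(dereference("did:example:123456789abcdefghi?openid", DID_DOCUMENT))
print(dereference("did:example:123456789abcdefghi#openid", DID_DOCUMENT)["type"])
```

Keeping the id relative ("openid") is what lets the fragment in did:example:123456789abcdefghi#openid match the document entry directly, the same way an HTML fragment matches an element id.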

My comments therefore are in two parts:

1. The grammar of the current ABNF is inconsistent with itself, with URIs, and with how we talk about DIDs. This lack of rigor is especially problematic in the ABNF, which exists precisely to be rigorous where the rest of the document suffers from the inherent ambiguity of human language.
2. The logic of the current DID components is inconsistent with their similarly named parts in the URI spec. I propose that "path", "query", and "fragment" are better terms if they have slightly different definitions:
   a. path: like a URI path, a hierarchical index into the DID document
   b. query: the name of a service endpoint for discovery (and eventual dereferencing)
   c. fragment: a named index into the DID document, hierarchically constrained by the path

It seems like the query part should follow the RFC3986 definition of queries, not path-rootless.

My apologies that this is so long.

I'd like @burnburn and @ChristopherA to comment on this.

jonnycrunch commented 5 years ago

So, why not just use "/" as a path to the inside of the DID document?

If I understand the ABNF correctly, the DID URL would be separated from the naked DID by any character other than ":", ".", or "-", so ";" or "/". While "#" would work, I'd like to reserve it for client-side use.
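A minimal Python sketch of that reading: the naked DID is the longest prefix the DID rule can consume, and whatever follows (";", "/", "?" or "#") starts the rest of the DID URL. It assumes the method and idchar character sets quoted earlier in this thread; split_did_url is a hypothetical helper.

```python
import re
from typing import Tuple

# ASSUMPTION: naked DID = "did:" method ":" specific-idstring, where
# idchar = ALPHA / DIGIT / "." / "-" and ":" separates idstrings.
NAKED_DID = re.compile(r"did:[a-z0-9]+:[A-Za-z0-9.\-]+(?::[A-Za-z0-9.\-]+)*")


def split_did_url(did_url: str) -> Tuple[str, str]:
    """Split into (naked DID, remainder starting at the first character
    the DID rule cannot consume)."""
    m = NAKED_DID.match(did_url)
    if m is None:
        raise ValueError(f"not a DID: {did_url!r}")
    return did_url[: m.end()], did_url[m.end():]


print(split_did_url("did:example:123456789abcdefghi;openid"))
print(split_did_url("did:example:123456789abcdefghi/path"))
```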

In the IPID DID method, we are reserving the '#' for a decryption key because, as @ChristopherA mentioned above and as with URIs on the web, it is NOT supposed to be sent to the server and is instead parsed locally.

According to https://www.ietf.org/rfc/rfc3986.txt:

3.5. Fragment

The fragment identifier component of a URI allows indirect identification of a secondary resource by reference to a primary resource and additional identifying information. The identified secondary resource may be some portion or subset of the primary resource, some view on representations of the primary resource, or some other resource defined or described by those representations. A fragment identifier component is indicated by the presence of a number sign ("#") character and terminated by the end of the URI.

fragment = *( pchar / "/" / "?" )

**The semantics of a fragment identifier are defined by the set of representations that might result from a retrieval action on the primary resource. The fragment's format and resolution is therefore dependent on the media type [RFC2046] of a potentially retrieved representation, even though such a retrieval is only performed if the URI is dereferenced.** If no such representation exists, then the semantics of the fragment are considered unknown and are effectively unconstrained. Fragment identifier semantics are independent of the URI scheme and thus cannot be redefined by scheme specifications.

Individual media types may define their own restrictions on or structures within the fragment identifier syntax for specifying different types of subsets, views, or external references that are identifiable as secondary resources by that media type. If the primary resource has multiple representations, as is often the case for resources whose representation is selected based on attributes of the retrieval request (a.k.a., content negotiation), then whatever is identified by the fragment should be consistent across all of those representations. Each representation should either define the fragment so that it corresponds to the same secondary resource, regardless of how it is represented, or should leave the fragment undefined (i.e., not found).

mwherman2000 commented 5 years ago

@jonnycrunch Can you provide some examples? ....perhaps related to the use cases in https://github.com/w3c-ccg/did-resolution/issues/32?

w3c-ccg/did-spec, issue #170: Inconsistent ABNF and related definitions