w3c-ccg / did-method-web

DRAFT: did:web Decentralized Identifier Method Specification
https://w3c-ccg.github.io/did-method-web/

Percent-decode each path fragment. #51

Closed dmitrizagidulin closed 2 years ago

dmitrizagidulin commented 2 years ago

Extracted from the discussion in PR https://github.com/w3c-ccg/did-method-web/pull/47.

The current spec contains an edge case that prevents round-trip conversion between did:web URIs and HTTPS URIs.

Currently:

https://example.com/example:subdirectory/ -> did:web:example.com:example:subdirectory  ->  https://example.com/example/subdirectory/

With this PR:

https://example.com/example:subdirectory/ -> did:web:example.com:example%3Asubdirectory  ->  https://example.com/example:subdirectory/

Note that using : characters in your web URL paths is not at all recommended by the authors of this spec. However, since it is allowed by current URL rules, it's important that we address this corner case.
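
To make the two arrows above concrete, here is a minimal Python sketch of both behaviors. The function names are hypothetical, and the sketch deliberately ignores ports, already percent-encoded segments, and the did.json suffix; it only mirrors the directory-style URLs used in the example above.

```python
from urllib.parse import unquote, urlsplit

PREFIX = "did:web:"


def url_to_did_current(url: str) -> str:
    """Current behavior as described above: path separators become colons."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    return PREFIX + ":".join([parts.netloc, *segments])


def did_to_url_current(did: str) -> str:
    """Current behavior: every colon in the identifier becomes a slash."""
    return "https://" + "/".join(did[len(PREFIX):].split(":")) + "/"


def url_to_did_proposed(url: str) -> str:
    """Proposed behavior: a literal colon inside a segment is written as %3A."""
    parts = urlsplit(url)
    segments = [s.replace(":", "%3A") for s in parts.path.split("/") if s]
    return PREFIX + ":".join([parts.netloc, *segments])


def did_to_url_proposed(did: str) -> str:
    """Proposed behavior: split on colons, then percent-decode each segment."""
    host, *segments = did[len(PREFIX):].split(":")
    return "https://" + "/".join([host, *(unquote(s) for s in segments)]) + "/"


url = "https://example.com/example:subdirectory/"
print(did_to_url_current(url_to_did_current(url)))    # colon is lost
print(did_to_url_proposed(url_to_did_proposed(url)))  # colon survives
```

Under these assumptions, only the second pair of functions returns the URL it started from.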

dmitrizagidulin commented 2 years ago

@msporny - we went with the colon syntax so that we could enable DID relative fragments, as described in DID Core in Example 6. (Although I personally don't see the usefulness of relative DID dereferenced URLs, I do want to respect the spec.)

dlongley commented 2 years ago

@dmitrizagidulin,

Looks like you found a typo in that example (and I saw at least one more). The value MUST be percent encoded, so there would be no slashes there. The normative definition for relativeRef is correct though:

If present, the associated value MUST be an ASCII string and MUST use percent-encoding for certain characters as specified in RFC3986 Section 2.1.

https://www.w3.org/TR/did-core/#did-parameters

In other words, using / in your DID should have nothing to do with whether or not you can use relativeRef; the value of relativeRef MUST be percent encoded.

OR13 commented 2 years ago

Ping for reviews / feedback otherwise will close in a week or 6 months.

dmitrizagidulin commented 2 years ago

@OR13 - any objections to this PR?

gribneau commented 2 years ago

This PR results in asymmetric handling of URL paths with percent-encoded characters.

To avoid breaking those URLs, those would need double-encoding.

I think we're better off scrapping the colon-delimited scheme.
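
One possible reading of the asymmetry described above, sketched in Python. The helper names are hypothetical, and the "only colons are escaped" rule is an assumption about how the PR would be implemented, not text taken from the PR itself.

```python
from urllib.parse import unquote


def segment_to_did(segment: str) -> str:
    """Assumed PR rule: only a literal colon is escaped on the way in."""
    return segment.replace(":", "%3A")


def segment_to_did_double(segment: str) -> str:
    """Double-encoding variant: escape existing '%' first, then the colon."""
    return segment.replace("%", "%25").replace(":", "%3A")


def segment_from_did(component: str) -> str:
    """Resolution side: each component is percent-decoded once."""
    return unquote(component)


# A URL path segment that already carries a percent-encoded slash.
original = "a%2Fb"

# Single encoding: the %2F is decoded on the way out and becomes a real slash.
print(segment_from_did(segment_to_did(original)))

# Double encoding: the original segment comes back unchanged.
print(segment_from_did(segment_to_did_double(original)))
```

The first call changes the segment, which is the breakage described above; the second call shows why double-encoding would be needed to keep such URLs intact.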

mprorock commented 2 years ago

This syntax breaks interop with the PATH parameter in a DID URL, and needlessly introduces breaking changes. Unless a substantial number of existing implementers come forward planning to support this, I suggest we close.

Similarly, mesur.io does not intend to implement this or other breaking changes

OR13 commented 2 years ago

Cross posted to did core, https://github.com/w3c/did-core/issues/821

removing pending close label.

peacekeeper commented 2 years ago

This syntax breaks interop with the PATH parameter in a DID URL

the value of relativeRef MUST be percent encoded.

I don't understand what this PR has to do with either the path in a DID URL, or with relativeRef.

This only seems to be about the method-specific-id of the DID, not anything else in a DID URL.

I think the PR is fine.

quartzjer commented 2 years ago

I don't think addressing this issue requires a breaking change to the current did:web method. IMO it could be addressed by simply adding some language saying something like "Colon characters MUST NOT be used in path elements for the target HTTPS URL".

gribneau commented 2 years ago

@peacekeeper wrote:

I don't understand what this PR has to do with either the path in a DID URL

DID:WEB rewrites https URL paths to fit them into the method-specific-identifier. This is necessary because the DID core spec reserves DID URL path elements for navigation within a single DID document, and this is the reason there is confusion around percent encoding and percent decoding.

It would be preferable to handle DID URL path elements the same way that RFC 3986 handles generic URI path elements - as information locating the resource (DID document) rather than something inside the resource. URI fragment elements are used to locate information inside a resource.

The same logic applies to query elements.

dwaite commented 2 years ago

For the most part, the logic should align with RFC3986, and special behavior does not need to be defined. Instead, you would be better off referencing the relevant parts of that RFC and defining test vectors.

E.g. on Percent-Encoding:

A percent-encoding mechanism is used to represent a data octet in a component when that octet's corresponding character is outside the allowed set or is being used as a delimiter of, or within, the component.

Typically a percent-encoded character has no distinct semantic meaning if the encoded characters did not meet this requirement and could have been representable without encoding. However, this determination is typically done as part of the interpretation of the URL, possibly with some language library help.

For example, the use of application/x-www-form-urlencoded data within a query or fragment is not defined by the base URI spec or the definition of the HTTP URI scheme, but is so common that people may assume so.

As such, I would recommend the following behavior for converting a did:web to a https URL.

  1. Separate the DID method-specific components by colon (:) into individual components.
  2. For the first part, percent-decode the value.
  3. If this decoded part contains a colon, separate on that value into a potential host and an optional port
  4. If the potential host portion contains IDNA (Internationalized domain name) characters, first punycode that value to obtain the actual URI host.
  5. If there are no additional method-specific components, the path is '/.well-known/did.json'
  6. Otherwise
    1. for each additional method-specific component:
      1. Percent-decode the value into a series of unicode scalars
      2. Percent-encode the value based on the delimiters for a path segment (e.g. characters outside pchar as defined in RFC3986)
    2. Add an additional path segment of did.json
    3. Combine all path segments into a path
    4. Your resulting URI has a https scheme, the determined host, the optional port, and the path, along with any originally specified fragment.
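
The steps above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not a reference implementation: the function names are hypothetical, the host conversion applies raw punycode per label and skips IDNA normalization (nameprep/UTS 46), and the pchar set is written out by hand from RFC 3986.

```python
from urllib.parse import quote, unquote

# sub-delims plus ":" and "@"; unreserved characters are always left alone by quote().
PCHAR_SAFE = "!$&'()*+,;=:@"


def host_to_ascii(host: str) -> str:
    """Punycode each non-ASCII label (simplified; no IDNA normalization)."""
    labels = []
    for label in host.split("."):
        if label.isascii():
            labels.append(label)
        else:
            labels.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(labels)


def did_web_to_https(did_url: str) -> str:
    did, has_fragment, fragment = did_url.partition("#")
    if not did.startswith("did:web:"):
        raise ValueError("not a did:web identifier")

    # Step 1: split the method-specific id on colons.
    components = did[len("did:web:"):].split(":")

    # Steps 2-4: decode the first component, split off the port, punycode the host.
    host, _, port = unquote(components[0]).partition(":")
    host = host_to_ascii(host)

    if len(components) == 1:
        # Step 5: no path components.
        path = "/.well-known/did.json"
    else:
        # Step 6: decode, then re-encode anything outside pchar, and add did.json.
        segments = [quote(unquote(c), safe=PCHAR_SAFE) for c in components[1:]]
        segments.append("did.json")
        path = "/" + "/".join(segments)

    url = "https://" + host + (":" + port if port else "") + path
    return url + "#" + fragment if has_fragment else url


print(did_web_to_https("did:web:example.com:user:alice"))
```

Decoding and then re-encoding each component is what keeps an escaped slash (%2F) inside a single path segment instead of letting it become a path separator.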

For the most convoluted example I can manufacture did:web:%F0%9F%92%A9.la%3A8443:foo%3Abar%2Fbaz#x:

Note: This includes punycode, which may very well be something that is explicitly not supported.

  1. Components are broken into:
    %F0%9F%92%A9.la%3A8443
    foo%3Abar%2Fbaz
  2. Percent-decode the first value to unicode scalars:
    💩.la:8443
  3. Separate the port from the potential hostname:
    💩.la
    8443
  4. Punycode the hostname:
    xn--ls8h.la
  5. For our single additional component, foo%3Abar%2Fbaz:
    1. Percent-decode the value:
      foo:bar/baz
    2. Percent-encode characters that are not part of pchar:
      foo:bar%2Fbaz
  6. Add an additional segment for the actual JSON doc:
    foo:bar%2Fbaz
    did.json
  7. Combine segments to path:
    foo:bar%2Fbaz/did.json
  8. Create HTTPS URL from constituent components:
    https://xn--ls8h.la:8443/foo:bar%2Fbaz/did.json#x

peacekeeper commented 2 years ago

@gribneau

This is necessary because the DID core spec reserves DID URL path elements for navigation within a single DID document

No it doesn't. Do you have a reference where you found this information?

It would be preferable to handle DID URL path elements the same way that RFC 3986 handles generic URI path elements - as information locating the resource

That's actually how it is. A DID URL with a path can be used to locate any type of resource, even an image, arbitrary JSON data, PDF, etc. And the fragment is used to reference a secondary resource that is part of, or related to, the primary resource.

peacekeeper commented 2 years ago

did:indy is an interesting method that uses DID URLs with paths to locate resources such as schemas, credential definitions, etc:

https://hyperledger.github.io/indy-did-method/#did-urls-for-indy-object-identifiers

msporny commented 2 years ago

@OR13 wrote:

As of today, Transmute does not intend to implement this breaking change.

@mprorock wrote:

Similarly, mesur.io does not intend to implement this or other breaking changes

path isn't defined in Section 2.3 Method-specific identifier -- you can't make a breaking change to something that was never defined. When are the spec editors going to fix this bug? :)

I also want to make sure everyone understands the consequences of keeping the specification, which is broken when it comes to colon processing (and URL processing, in general), as-is. Here are places where colons can legitimately be placed that will break did:web implementations if the spec doesn't change in a breaking way:

image

As you can see above... path includes the colon character... and slash characters. Colons are used to delimit user accounts on Mediawiki and Wikipedia:

https://en.wikipedia.org/wiki/Wikipedia:Wikipedians

https://commons.wikimedia.org/wiki/User:Ser_Amantio_di_Nicolao

Are we starting to see the problem here? This PR is at least attempting to fix the "colons in segments" issue... but, unless I'm missing something, the problem runs much deeper than that. What am I missing? Where is path defined?

OR13 commented 2 years ago

I'm in favor of implementing @dwaite 's suggestion, I would approve a PR that implements it and provides test vectors.

I don't think we should address each encoding issue that deviates from RFC 3986 in a one-off section of text in a CCG draft. That will quickly lead to a painful spec that does not make good use of existing normative references or provide concrete test vectors for proving interop.

msporny commented 2 years ago

I'm in favor of implementing @dwaite 's suggestion, I would approve a PR that implements it and provides test vectors.

Or we could replace that complex algorithm (which, don't get me wrong is impressive, @dwaite) with this:

Convert to did:web:

  1. Replace "https://" with "did:web:"

Convert to https:

  1. Replace "did:web:" with "https://"

... and get rid of all of the unnecessary complexity and deviation from RFC 3986. :)
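
For comparison, the prefix-swap suggestion above fits in a few lines of Python. The function names are hypothetical, and the sketch only covers what the two steps say; it makes no assumption about where did.json would be appended during resolution.

```python
HTTPS_PREFIX = "https://"
DID_WEB_PREFIX = "did:web:"


def https_to_did_web(url: str) -> str:
    """Convert to did:web: replace the leading "https://" with "did:web:"."""
    if not url.startswith(HTTPS_PREFIX):
        raise ValueError("expected an https URL")
    return DID_WEB_PREFIX + url[len(HTTPS_PREFIX):]


def did_web_to_https(did_url: str) -> str:
    """Convert to https: replace the leading "did:web:" with "https://"."""
    if not did_url.startswith(DID_WEB_PREFIX):
        raise ValueError("expected a did:web identifier")
    return HTTPS_PREFIX + did_url[len(DID_WEB_PREFIX):]


url = "https://foo.example/users:jane?timestamp=2022-04-22T19:55:27.730Z#keys:1"
print(https_to_did_web(url))
print(did_web_to_https(https_to_did_web(url)) == url)
```

Because nothing after the prefix is rewritten, colons, slashes, queries, and fragments all survive the round trip unchanged.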

peacekeeper commented 2 years ago

Or we could replace that complex algorithm

I never really understood why people invented did:web in the first place instead of just using https://

msporny commented 2 years ago

I never really understood why people invented did:web in the first place instead of just using https://

Because we needed:

  1. Something universal to indicate that "There is a DID Document that lives at the end of this URL" (technical),
  2. Something to reassure people that we weren't trying to kill the Web (political),
  3. Something to enable all resolution to go through a common interface -- DID Resolution (technical).

In theory, we could:

  1. Try to address 1 today via the VC2WG Data Integrity work (by specifying how HTTP-based URLs work and noting that the media type when fetching a verificationMethod MUST be application/did+*) -- but then the people that don't like content negotiation might get cranky -- and the people using did:web today will /definitely/ get cranky. :)
  2. Try to address 2 by saying we did 1, we use https-urls, so we're not trying to kill the web.
  3. Splitting resolution into either 1) you're working with an HTTPS URL, or 2) you're working with a DID URL.

I expect trying those things will be messier than just getting did:web right. :)

OR13 commented 2 years ago

Perhaps you should register did https?

gribneau commented 2 years ago

@gribneau

This is necessary because the DID core spec reserves DID URL path elements for navigation within a single DID document

No it doesn't. Do you have a reference where you found this information?

It would be preferable to handle DID URL path elements the same way that RFC 3986 handles generic URI path elements - as information locating the resource

That's actually how it is. A DID URL with a path can be used to locate any type of resource, even an image, arbitrary JSON data, PDF, etc. And the fragment is used to reference a secondary resource that is part of, or related to, the primary resource.

DID URLs are not supported in the id element of the DID document. Only a DID (and not a DID URL) can represent the entire document.

jandrieu commented 2 years ago

DID:WEB rewrites https URL paths to fit them into the method-specific-identifier. This is necessary because the DID core spec reserves DID URL path elements for navigation within a single DID document, and this is the reason there is confusion around percent encoding and percent decoding.

As @msporny pointed out, the path part of a DID URL is not a part of the method-specific ID. There's a conflation going on here that appears to be confusing some contributors. I thought Manu was confused at first, but his smiley face convinced me he was trying humor to point out the distinction.

It would be preferable to handle DID URL path elements the same way that RFC 3986 handles generic URI path elements - as information locating the resource (DID document) rather than something inside the resource. URI fragment elements are used to locate information inside a resource.

DID-URL path elements are handled the same way that RFC3986 handles generic path elements. It is just that the DID-URL != DID. And the method-specific-id is part of the DID--and only by inclusion is it part of the DID-URL. By which I mean the path part of the DID-URL is a completely different component than the URL path of did:web that gets encoded into the method-specific identifier. You could have a did:web DID-URL with a path part above and beyond the path encoded in the DID itself. The interpretation of that path part is entirely up to the did:web spec.

What I think is causing some confusion is that the resource involved in a DID URL is NOT the DID Document (as implied by @gribneau).

For example,

The meaning of fragment, path, and query parts is up to the DID Method to define, as long as their representation of those parts in the DID-URL itself is consistent with RFC3986.

To wit, with did:cosmos we use the path part to identify downloadable or interactive resources defined in the linkedResource property of the DID Document (an IID Resource) and use the fragment part to identify addressable entities within the namespace of the DID (called an IID Reference). These are based on the Interchain Identifier Specification at https://github.com/EarthProgram/Identifiers/blob/main/index.md.

did:cosmos DID URLs for IID resources are "locating the resource", which you might think of as "in the DID (namespace)", in the same way that normal web resources are "in the website's namespace".

Note that "normal" URLs point to different resources by having different paths. http://example.com/file1.png and http://example.com/file2.png are different resources, both mediated by the authority part of example.com (the actual resources may be on a server anywhere thanks to redirection).

DID URLs, especially how they are used in did:cosmos (and other IIDs) match this behavior exactly. Each DID-URL with a path part can separately point to different resources, just like regular URLs.

I believe Orie is just fixing a round-trip encoding algorithm problem with did:web in particular, which has nothing to do with did-core.

So let's not conflate the resource of a DID-URL with the Subject of the DID nor the DID Document. The resource of a DID-URL is defined within the context of the DID and likely declared or presented within the DID Document, but they can refer to ANY resource, not just the DID Document.

gribneau commented 2 years ago

I am not confused @jandrieu, I simply disagree with the core specification's interpretation of RFC 3986.

@msporny wrote:

Note that there will be a PR to deprecate the colon syntax and prefer just straight HTTP URL translation in time.

This cannot currently happen because the DID Subjects of both of these would be did:web:example.com, which is inappropriate for obvious reasons:

did:web:example.com/alice

did:web:example.com/bob

The limitation is imposed by section 5.1.1.

In the presence of that limitation, the handling of path and query in DID URLs can only be seen as consistent with the fragment in RFC 3986, which is distinguished from path and query by virtue of the secondary resource reference:

The fragment identifier component of a URI allows indirect identification of a secondary resource by reference to a primary resource and additional identifying information.

In contrast, both path and query "serve to identify a resource", while the authority preceding them does not identify a resource at all.

It is unfortunate that the path and query sections of the RFC do not use the "primary resource" terminology. This confusion might have been avoided if they had.

msporny commented 2 years ago

@gribneau wrote:

This cannot currently happen because the DID Subjects of both of these would be did:web:example.com, which is inappropriate for obvious reasons

Yep, @gribneau is correct... several technical issues:

  1. DID Core is too restrictive when it comes to the id field in a DID Document. We should've supported a DID URL instead of just a DID in that position -- we can still fix this in DID v2.0 (since that would just be an expansion of the current normative statement).
  2. did:web has yet to properly define what it meant by path... which led some to interpret it as path per RFC 3986, while others interpreted it as "some complex new way of encoding URL paths using colon syntax", which creates multiple incompatibilities with RFC 3986 when it comes to round-tripping these values. All of this would be much easier if we could all just agree that did:web's method-specific-identifier ABNF is just plain incomplete/wrong.
  3. This PR tries to fix the latter to make colon-path encoding work in a way that could be more interoperable, but it might be that this whole endeavor is misguided to begin with (and did:web is thoroughly broken in its current state).

In other words, use the rules in the did:web spec to transform these HTTPS URLs into did:web DID URLs and back out again:

https://foo.example/users:jane
https://foo.example/users:jane#keys:1
https://foo.example/users:jane?timestamp=2022-04-22T19:55:27.730Z#keys:1

I suggest that there are no rules in the did:web spec that tell you how to round trip those URLs. If someone knows of any, please point them out to me 'cause I can't find them in the spec.

One interpretation is that method-specific-id was always meant to be just the URL-encoded authority in did:web -- so, for the above, the "method-specific-id" is foo.example... everything up to the first slash... and the subject identifier in the DID Document was everything up to the first ? or # or the end of the URL -- for example did:web:foo.example/users:jane... where you could state something like the following and it would make sense:

"verificationMethod": "did:web:foo.example/users:jane#keys:1"
OR
"verificationMethod": "did:web:foo.example/users:jane?timestamp=2022-04-22T19:55:27.730Z#keys:1"

... but it seems like some are saying (without this current PR), no, the proper encoding of those URLs is actually this:

"verificationMethod": "did:web:foo.example:users:jane#keys:1"
OR
"verificationMethod": "did:web:foo.example:users:jane?timestamp=2022-04-22T19:55:27.730Z#keys:1"

which, when going back through round tripping (per the rules in the current did:web specification) would be turned into these HTTPS URLs:

"verificationMethod": "https://foo.example/users/jane#keys/1/did.json"
OR
"verificationMethod": "https://foo.example/users/jane?timestamp=2022-04-22T19/55/27.730Z#keys/1/did.json"

The former approach seems to work (and is undocumented); the latter approach is broken (note all the crazy slashes that exist in the URL that shouldn't be there... this seems to be what some in this thread are suggesting the current spec states). Or there are multiple variations of different interpretations in between. What am I missing? Can someone round-trip those URLs for me in a way that is consistent?
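
The broken output above can be reproduced with a short Python sketch. Both function names are hypothetical. The first applies the colon-to-slash rule to the whole string, which is one reading of the current spec; the second is an assumed alternative that sets the query and fragment aside before converting.

```python
PREFIX = "did:web:"


def to_https_whole_string(did_url: str) -> str:
    """Replace every colon after the prefix, then append did.json."""
    rest = did_url[len(PREFIX):]
    return "https://" + rest.replace(":", "/") + "/did.json"


def to_https_split_first(did_url: str) -> str:
    """Convert only the part before '?' or '#'; carry the rest over untouched."""
    rest = did_url[len(PREFIX):]
    cut = min((i for i in (rest.find("?"), rest.find("#")) if i != -1),
              default=len(rest))
    method_specific_id, suffix = rest[:cut], rest[cut:]
    return "https://" + method_specific_id.replace(":", "/") + "/did.json" + suffix


did_url = "did:web:foo.example:users:jane#keys:1"
print(to_https_whole_string(did_url))  # slashes leak into the fragment
print(to_https_split_first(did_url))   # fragment is preserved
```

Even the second variant cannot recover the original https://foo.example/users:jane path, because the colon inside the segment has already been treated as a separator.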

dwaite commented 2 years ago

DID Core is too restrictive when it comes to the id field in a DID Document. We should've supported a DID URL instead of just a DID in that position -- we can still fix this in DID v2.0 (since that would just be an expansion of the current normative statement).

IMHO, the problem is that the usage of DID path is not restrictive enough - there isn't currently a way to differentiate a DID URL path that conforms to the described behavior of the method on DID "CRUD" operations. The resource might instead be hosted content or a service, and may support additional interactions outside the DID resolution definition.

gribneau commented 2 years ago

@msporny wrote:

  1. DID Core is too restrictive when it comes to the id field in a DID Document. We should've supported a DID URL instead of just a DID in that position -- we can still fix this in DID v2.0 (since that would just be an expansion of the current normative statement).

I agree. This is an easy fix. It was, however, discussed and rejected prompting some of what we have now.

  1. did:web has yet to properly define what it meant by path... which led some to interpret it as path per RFC 3986, while others interpreted it as "some complex new way of encoding URL paths using colon syntax", which creates multiple incompatibilities with RFC 3986 when it comes to round-tripping these values.

The did:web method is currently broken for URL paths containing colons. It becomes more broken if IPv6 addresses are allowed, or when one attempts simple authentication.

The create section is clearly inadequate as well. Specific steps should be provided in addition to the handful of examples provided.

All of this would be much easier if we could all just agree that did:web's method-specific-identifier ABNF is just plain incomplete/wrong.

I don't disagree. It is, at best, an 80% solution today.

msporny commented 2 years ago

I agree. This is an easy fix. It was, however, discussed and rejected prompting some of what we have now.

Hrm, rejected by whom? Speaking as the lead DID Core spec editor, I don't recall that we've made any consensus decisions of the sort. That idea is still very much alive and well, IMHO.

The best way to address the problem in DID Core, however, is to get folks to admit it's a problem here and then bring that problem back to DID Core (with a fairly simple errata fix to the spec noting that we plan to expand didDocument.id's range to did-url in the future).

The create section is clearly inadequate as well.

Agreed.

It is, at best, an 80% solution today.

You're being generous. :)

Shipping specs with known bugs, especially ones as big as this, is a standards anti-pattern.

@dwaite wrote:

IMHO, the problem is that the usage of DID path is not restrictive enough - there isn't currently a way to differentiate a DID URL path that conforms to the described behavior of the method on DID "CRUD" operations.

Can you explain this a bit more, @dwaite?

The resource might instead be hosted content or a service, and may support additional interactions outside the DID resolution definition.

... and a bit more detail on this one as well, please?

gribneau commented 2 years ago

@msporny wrote:

Shipping specs with known bugs, especially ones as big as this, is a standards anti-pattern.

Is that a reference to did:web or the core?

The only required elements for a URI are path and scheme (and even scheme can be omitted for relative references). The path element always exists, and when it is not explicitly included, it exists as a zero length string.

Given that a URI identifies a resource, and given that scheme does not identify a resource, and given that path is the only other required element, I have no idea how we conclude that path can be interpreted as identifying a secondary resource relative to the primary resource.

peacekeeper commented 2 years ago

The best way to address the problem in DID Core, however, is to get folks to admit it's a problem here and then bring that problem back to DID Core (with a fairly simple errata fix to the spec noting that we plan to expand didDocument.id's range to did-url in the future).

I think it would be okay to change this, i.e. also allow DID URLs for "id", but I wouldn't go as far as calling it "errata". We did have some discussions about this in the WG, and I think there were also legitimate opinions for not allowing it. Found the following older issues that could be relevant:

jandrieu commented 2 years ago

The best way to address the problem in DID Core, however, is to get folks to admit it's a problem here and then bring that problem back to DID Core (with a fairly simple errata fix to the spec noting that we plan to expand didDocument.id's range to did-url in the future).

I think this notion would fundamentally break the semantic architecture of DIDs.

The DID Document represents the verification relationships (and methods) and service endpoints for an identifier, the DID.

The DID represents the authority part of the RFC3986 standard URL syntax. As such, the DID Document is the metadata that represents assertions by the authority for interacting with that DID, including identifiers within that DID namespace as delineated by DID URLs.

It makes for a straightforward resolution process that parallels DNS quite nicely. You have a DID or DID URL

  1. Resolve the DID (not the DID URL)
  2. Get a DID Document for that DID.
    a. This DID Document's fundamental identifier is a DID, as the DID Document represents the metadata for that entire DID, not just a singular resource within that DID namespace.
  3. Interpret the path/query/fragment parts according to the Method to then dereference the full DID URL to arbitrary resources.
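
The first step of this process depends on separating the DID from the rest of the DID URL. A minimal Python sketch of that split follows; the function name is hypothetical, and the splitting rule (the fragment starts at the first "#", the query at the first "?", the path at the first "/") is a simplification of the DID URL syntax.

```python
from typing import Tuple


def split_did_url(did_url: str) -> Tuple[str, str, str, str]:
    """Return (did, path, query, fragment) for a DID URL."""
    rest, _, fragment = did_url.partition("#")
    rest, _, query = rest.partition("?")
    slash = rest.find("/")
    if slash == -1:
        return rest, "", query, fragment
    return rest[:slash], rest[slash:], query, fragment


did, path, query, fragment = split_did_url("did:ex:abc/resource1")
print(did)   # the part that gets resolved to a DID Document
print(path)  # the part left for the DID Method to interpret
```

Under this process, did:ex:abc/resource1 and did:ex:abc/resource2 both yield did:ex:abc as the DID to resolve, and only the path differs.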

If you allow a DID Document's ID to be for a particular DID URL, then how do you look up the DID Document for the DID part of that DID URL?

In other words, if did:ex:abc/resource1 somehow resolves to DID Document A with an id of did:ex:abc/resource1 and did:ex:abc/resource2 resolves to a different DID Document B (with did:ex:abc/resource2 as an ID), then how do we change the above algorithm? Because both DIDs will first resolve to a DID Document for did:ex:abc, whose id MUST be did:ex:abc. You haven't really fixed anything with the insistence that the DID Document's id be able to be a DID URL.

It may be that the way you are hoping to use these path parts is an instance of 'turtles all the way down'. The fact is, the turtles have to stop somewhere. That somewhere is the authority part. That's where the buck stops. That authority part presents the necessary metadata as the DID Document for that authority. It is NOT the metadata for the DID URL resource. Just the metadata for the DID.

The way we (with IIDs) use DID URLs to reference IID Resources and IID References, as defined in the DID Document, allows us to add per-resource meta-data as appropriate. This may be a better way for you to think through your solution. http://w3id.org/earth/identifiers

IMO, it would be a colossal error to violate this separation of responsibilities and allow DID Documents to have IDs that are DID URLs.

Can you describe the use case that requires this feature? What's the value-adding interaction a user would get out of this feature?

msporny commented 2 years ago

Can you describe the use case that requires this feature?

The Web. :)

What's the value-adding interaction a user would get out of this feature?

The ability to serve two different resources from the same authority -- which is what the Web does.

If you allow a DID Document's ID to be for a particular DID URL, then how do you look up the DID Document for the DID part of that DID URL?

You use a resolver and use whatever it gives back to you. Remember, DID Methods are what determine what you get back when you resolve something.

Let's take an example:

RESOLVE did:web:subject.example/people/jane

Plug that into a resolver and you might get a DID Document that looks like this:

{
  "id": "did:web:subject.example/people/jane"
}

That's one subject... but try this and you might get nothing (jane is missing, it's just now a random directory on the Web):

RESOLVE did:web:subject.example/people

... but try this and you might get the authority (aka DNS domain) DID Document:

RESOLVE did:web:subject.example

{
  "id": "did:web:subject.example"
}

DID Core does not allow for that fairly sane thing to happen today... that's the error we made in the DID WG.

jandrieu commented 2 years ago

Can you describe the use case that requires this feature?

The Web. :)

That's not a use case. That's a platform. Which already works great.

We aren't recreating the web. We are creating something different, or at least expanding the web in new directions.

Again, what's the value-added use case? What user does what to get what value?

did:web:subject.example/people/jane

Why on earth would resolving that give you a DID Document with that as the id? It wouldn't. It would give you a DID Document with did:web:subject.example as the ID.

DID URLs do not and never have resolved to DID Documents. They dereference to resources.

Now, I can, as I described in a different answer, define did:web:subject.example/people/jane so that it dereferences to a DID Document, but that DID Document is not the DID Document for that DID URL, because there is no such thing.

DIDs resolve to DID Documents. DID URLs do not.

Full stop.

msporny commented 2 years ago

That's not a use case. That's a platform. Which already works great.

We aren't recreating the web. We are creating something different, or at least expanding the web in new directions.

If I were to accept your interpretation of DID Core, then we have created something that is incompatible with large portions of the Web. :)

At this point, I expect that you haven't actually read the algorithms in the DID Web Method spec... knowing you (at some level), I expect you'd be just as confused as I am if you were to read the text in the method specific id and the Read section of the spec. You would probably see that the method specific id is defined incorrectly (as it allows two different path-abempty definitions to happen, which conflicts with DID Core in a way that cannot be round-tripped). You would probably also understand that the Read section uses an algorithm that's not round-trippable when coupled with RFC 3986. There are just factual errors there. Can you please at least confirm those two things so I know that we're at least on the same page wrt. the technical issues with the current did:web specification?

Again, what's the value-added use case? What user does what to get what value?

The ability to publish multiple DID Documents on a single DNS domain without using this weird/broken colon-path syntax that did:web uses (that is clearly broken and not round-trippable in the spec, per the comments above).

did:web:subject.example/people/jane

Why on earth would resolving that give you a DID Document with that as the id? It wouldn't. It would give you a DID Document with did:web:subject.example as the ID.

Weird, I have never thought that that's where we were going with DID Core. :P

DID URLs do not and never have resolved to DID Documents.

Where does DID Core state that DID URLs can never resolve to DID Documents?

They dereference to resources.

... and those resources might be DID Documents themselves. :)

Now, I can, as I described in a different answer, define did:web:subject.example/people/jane so that it dereferences to a DID Document, but that DID Document is not the DID Document for that DID URL, because there is no such thing.

DIDs resolve to DID Documents. DID URLs do not.

Full stop.

Citation required. :)

Here are the citations that back up my point, which is that a DID URL can be resolved to a DID Document, like Section 7.2 DID URL Dereferencing:

contentStream ... The contentStream MAY be a resource such as a DID document that is serializable in one of the conformant representations, a Verification Method, a service, or any other resource format that can be identified via a Media Type and obtained through the resolution process.

So, the DID Core spec is either internally inconsistent or tragically limiting -- the "tragically limiting" perspective says that you can use a DID URL to get a DID Document, but when you get that DID Document, the identifier isn't going to be for the resource you fetched!

gribneau commented 2 years ago

Interestingly, there is a path forward here without changing anything.

5.1.1 requires the DID Subject to conform with 3.1, which in turn asserts that RFC3986 controls.

RFC3986 3.3 provides that:

A path is always defined for a URI, though the defined path may be empty (zero length).

It seems, then, that these are equivalent:

did:example:123456789abcdefghijk

did:example:123456789abcdefghijk/

jandrieu commented 2 years ago

That's not a use case. That's a platform. Which already works great. We aren't recreating the web. We are creating something different, or at least expanding the web in new directions.

If I were to accept your interpretation of DID Core, then we have created something that is incompatible with large portions of the Web. :)

Unfortunately, that's a hyperbolic and disingenuous response. On the one hand, of course DIDs are incompatible with large portions of the web: not a single browser supports them. On the other hand, you provide no explanation of what this means. It is an empty attack without foundation. What parts of the web are now broken?

At this point, I expect that you haven't actually read the algorithms in the DID Web Method spec... knowing you (at some level), I expect you'd be just as confused as I am if you were to read the text in the method specific id and the Read section of the spec. You would probably see that the method specific id is defined incorrectly (as it allows two different path-abempty definitions to happen, which conflicts with DID Core in a way that cannot be round-tripped). You would probably also understand that the Read section uses an algorithm that's not round-trippable when coupled with RFC 3986. There are just factual errors there. Can you please at least confirm those two things so I know that we're at least on the same page wrt. the technical issues with the current did:web specification?

That's funny. I would expect the opposite, in that none of the examples you've used use the current syntax for did:web. There are not two different path-abempty definitions. There is a path that is encoded into the method-specific-id and the path part of the DID URL itself. Note that the did:web spec defines NO path-abempty. In fact, it provides no ABNF whatsoever. So, your confusion is understandable, but it's not a problem in did-core, it's just a gap in did:web.

I understand the round-trip problems Orie is attempting to fix and to my initial analysis, he is correct that %encoding colons is the simple fix.

None of that has anything to do with did-core. Yes. We should fix did:web. But did-core draws a particular and distinct differentiation between DIDs and DID URLs.

You may recall that I warned you, @talltree, and @peacekeeper that the term DID URL is going to confuse people. People will see DID URLs and expect them to be DIDs. I don't have a better term for DID URLs, but I believe your argument is an excellent example of the problem I raised back then: even an editor of the DID Core Specification is confusing the two.

Again, what's the value-added use case? What user does what to get what value?

The ability to publish multiple DID Documents on a single DNS domain without using this weird/broken colon-path syntax that did:web uses (that is clearly broken and not round-trippable in the spec, per the comments above).

I'm sorry, but that still isn't a use case. It's just a broken round-trip algorithm for a particular DID method. Once did:web fixes it with encoding, we are good to go. You may be frustrated to have to encode your colons in did:web, but that's no more relevant than the frustration I've had debugging web apps and having to figure out when to use URL encoding and when not to, especially when the different parts of URLs have different encoding rules. It's sometimes complicated. But "not doing the rigorous thing you need to do to make it work" is not itself a use case.

If you percent encode your colons, did:web works just fine.

did:web:subject.example/people/jane

Why on earth would resolving that give you a DID Document with that as the id? It wouldn't. It would give you a DID Document with did:web:subject.example as the ID.

Weird, I have never thought that that's where we were going with DID Core. :P

Weird, I would have thought you understood DID Core.

DID URLs do not and never have resolved to DID Documents.

Where does DID Core state that DID URLs can never resolve to DID Documents?

DID Core never states that DID URLs resolve to anything.

They dereference to resources.

... and those resources might be DID Documents themselves. :)

Yes. But they are not the DID documents of the DID in the DID URL. They could refer to anything, any resource. But that basically doesn't mean anything in this context. What we care about is the DID document that is returned from resolution of a DID.

There is no resolution defined for a DID URL.

Now, I can, as I described in a different answer, define did:web:subject.example/people/jane so that it dereferences to a DID Document, but that DID Document is not the DID Document for that DID URL, because there is no such thing. DIDs resolve to DID Documents. DID URLs do not. Full stop.

Citation required. :)

Here you go: In the Section 1.3 Architecture Overview https://www.w3.org/TR/did-core/#architecture-overview

DIDs are resolvable to DID documents. A DID URL extends the syntax of a basic DID to incorporate other standard URI components such as path, query, and fragment in order to locate a particular resource—for example, a cryptographic public key inside a DID document, or a resource external to the DID document.

Note that DIDs are resolvable. In contrast, DID URLs locate particular resources.

Other statements about DID URLs:

DID URL dereferencers and DID URL dereferencing A DID URL dereferencer is a system component that takes a DID URL as input and produces a resource as output. This process is called DID URL dereferencing. The process of DID URL dereferencing is elaborated upon in § 7.2 DID URL Dereferencing.

DID fragment The portion of a DID URL that follows the first hash sign character (#). DID fragment syntax is identical to URI fragment syntax.

DID path The portion of a DID URL that begins with and includes the first forward slash (/) character and ends with either a question mark (?) character, a fragment hash sign (#) character, or the end of the DID URL. DID path syntax is identical to URI path syntax. See § Path.

DID query The portion of a DID URL that follows and includes the first question mark character (?). DID query syntax is identical to URI query syntax. See § Query.

Note that fragment, path, and query are ONLY defined as part of the DID URL. Not part of the DID.

DID URL dereferencing The process that takes as its input a DID URL and a set of input metadata, and returns a resource. This resource might be a DID document plus additional metadata, a secondary resource contained within the DID document, or a resource entirely external to the DID document. The process uses DID resolution to fetch a DID document indicated by the DID contained within the DID URL. The dereferencing process can then perform additional processing on the DID document to return the dereferenced resource indicated by the DID URL. The inputs and outputs of this process are defined in § 7.2 DID URL Dereferencing.

Section 3.2 DID URL syntax https://www.w3.org/TR/did-core/#did-url-syntax

A DID URL is a network location identifier for a specific resource. It can be used to retrieve things like representations of DID subjects, verification methods, services, specific parts of a DID document, or other resources.

Section 3.2.1 DID Parameters

Adding a DID parameter to a DID URL means that the parameter becomes part of the identifier for a resource.

Note that the parameter affects the resource identifier, the DID URL, not the DID.

TL;DR:

After an exhaustive search through the spec, only DIDs "resolve". DID URLs are "dereferenced".

Here are the citations that back up my point, which is that a DID URL can be resolved to a DID Document, like Section 7.2 DID URL Dereferencing:

This is not a statement about resolution. This is a statement about dereferencing. I think we are all agreed that a DID URL dereferences to a specific resource. It doesn't resolve to that resource. Rather, the DID part of the DID URL is resolved to a DID Document which can then be used to dereference to the actual resource:

This process depends on DID resolution of the DID contained in the DID URL.

It's the DID that is resolved. Not the DID URL.

contentStream ... The contentStream MAY be a resource such as a DID document that is serializable in one of the conformant representations, a Verification Method, a service, or any other resource format that can be identified via a Media Type and obtained through the resolution process.

So, the DID Core spec is either internally inconsistent or tragically limiting -- the "tragically limiting" perspective says that you can use a DID URL to get a DID Document, but when you get that DID Document, the identifier isn't going to be for the resource you fetched!

It is neither.

The DID Core spec is exceptionally consistent on this issue. The problem is the unfortunate conflation of DIDs & DID URLs on the one hand and resolving & dereferencing on the other. Fortunately, the specification is consistent on this, but it is, understandably, a challenge to keep all of this straight.

To return to your initial example did:web:subject.example/people/jane, I expect that you likely mis-generated the did:web DID because you are conflating the path encoded in the method-specific-id of did:web with the path part in the DID URL.

That DID URL has the following parts

{
  "scheme": "did",
  "method": "web",
  "method-specific-id": "subject.example",
  "path": "/people/jane"
}

The DID for this DID URL is did:web:subject.example
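The split above can be sketched in Python. This is only an illustration under stated assumptions: the names `DidUrlParts`, `split_did_url`, and `did_of` are hypothetical and not taken from DID Core or the did:web spec, and the sketch simply peels off fragment, query, and path in the usual RFC 3986 order without validating against the DID ABNF.

```python
from typing import NamedTuple


class DidUrlParts(NamedTuple):
    """Hypothetical container for the parts of a DID URL."""
    scheme: str
    method: str
    method_specific_id: str
    path: str
    query: str
    fragment: str


def split_did_url(did_url: str) -> DidUrlParts:
    # Peel off the fragment first, then the query, then the path.
    rest, _, fragment = did_url.partition("#")
    rest, _, query = rest.partition("?")
    slash = rest.find("/")
    did, path = (rest, "") if slash == -1 else (rest[:slash], rest[slash:])
    # Whatever is left is the DID: "did", the method, and the method-specific-id.
    # Only the first two colons separate; later colons belong to the id.
    scheme, method, method_specific_id = did.split(":", 2)
    if scheme != "did" or not method or not method_specific_id:
        raise ValueError(f"not a DID URL: {did_url!r}")
    return DidUrlParts(scheme, method, method_specific_id, path, query, fragment)


def did_of(did_url: str) -> str:
    """Return only the DID portion, i.e. the part that is resolved."""
    p = split_did_url(did_url)
    return f"{p.scheme}:{p.method}:{p.method_specific_id}"


print(did_of("did:web:subject.example/people/jane"))  # did:web:subject.example
```

Note that for did:web:subject.example:people:jane the whole of subject.example:people:jane stays in the method-specific-id and the DID URL path is empty, which is the distinction being drawn here.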

Which will resolve to a DID document at https://subject.example/.well-known/did.json.

However, what I think you probably wanted to do was to resolve to a DID document at https://subject.example/people/jane/did.json. To achieve that result, the DID would be did:web:subject.example:people:jane

See Example 4 https://w3c-ccg.github.io/did-method-web/#example-creating-the-did-with-optional-path as well as Section 2.5.4 Optional Path Considerations https://w3c-ccg.github.io/did-method-web/#optional-path-considerations
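As a rough Python sketch of the mapping just described, assuming the rules as the current did:web draft states them (colons in the method-specific-id separate path segments, and a %3A in the first segment marks a port): the function name `did_web_to_url` is hypothetical, and path segments are deliberately not percent-decoded here, since that is the change under discussion in this PR.

```python
from urllib.parse import unquote


def did_web_to_url(did: str) -> str:
    """Map a did:web DID (not a DID URL) to the HTTPS location of its DID document."""
    prefix = "did:web:"
    if not did.startswith(prefix):
        raise ValueError(f"not a did:web DID: {did!r}")
    segments = did[len(prefix):].split(":")
    # The first segment is the host; a percent-encoded colon (%3A) carries the port.
    host = unquote(segments[0])
    path_segments = segments[1:]
    if not path_segments:
        # No path encoded in the method-specific-id: use the well-known location.
        return f"https://{host}/.well-known/did.json"
    return f"https://{host}/" + "/".join(path_segments) + "/did.json"


print(did_web_to_url("did:web:subject.example"))
# https://subject.example/.well-known/did.json
print(did_web_to_url("did:web:subject.example:people:jane"))
# https://subject.example/people/jane/did.json
```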

The resource referred to by did:web:subject.example/people/jane is ambiguous. Hence my earlier comment that maybe you aren't familiar with the current DID generation algorithm in did:web. The did:web specification is completely silent on how to interpret the path part in a did:web DID URL. It does state clearly how to decode the method-specific-id to get a path for retrieving the DID Document, but is completely silent on how a path in the DID URL should be interpreted.

My advocacy is to use the linkedResource property from IIDs and did:cosmos. Paths in did:cosmos DID URLs refer to resources defined in a linkedResource section. However, this is an exceptionally new property. I've been hoping to get a demonstrable implementation in place before adding it to the DID Spec Registries, but it is in use in the IID spec and did:cosmos and I think the IID Reference and IID Resource approach taken by the IID spec is a superior pattern for avoiding the type of confusion this Github issue has illuminated.

It is also worth noting that colons are already escaped in the did:web method-specific-id for port specification in the authority part of the encoded web URL. See Example 5 https://w3c-ccg.github.io/did-method-web/#example-creating-the-did-with-optional-path-and-port

So, Orie's solution is minimal, effective, and in line with existing processes for other restricted characters. It's unfortunate that the percent encoding was restricted to the colon used for port specification, but it's an easy fix. Which this PR does.
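To make the round trip concrete, here is a minimal Python sketch of the percent-encoding approach, using the example URL from the PR description. The function names `url_to_did_web` and `did_web_to_dir_url` are hypothetical and this is not the spec's normative algorithm; it also assumes path segments contain no encoded slashes (%2F), which would need separate handling.

```python
from urllib.parse import quote, unquote, urlsplit


def url_to_did_web(url: str) -> str:
    """Encode an HTTPS directory URL as a did:web DID, percent-encoding colons."""
    parts = urlsplit(url)
    # quote(..., safe="") turns the ":" in host:port into %3A.
    host = quote(parts.netloc, safe="")
    segments = [s for s in parts.path.split("/") if s]
    # Normalize each segment, then encode it so a literal ":" becomes %3A.
    encoded = [quote(unquote(s), safe="") for s in segments]
    return ":".join(["did:web", host, *encoded])


def did_web_to_dir_url(did: str) -> str:
    """Decode a did:web DID back to the HTTPS directory it was built from."""
    host, *segments = did[len("did:web:"):].split(":")
    # Percent-decode each segment AFTER splitting on ":".
    decoded = [unquote(s) for s in segments]
    return "https://" + unquote(host) + "/" + "/".join(decoded + [""])


did = url_to_did_web("https://example.com/example:subdirectory/")
print(did)                      # did:web:example.com:example%3Asubdirectory
print(did_web_to_dir_url(did))  # https://example.com/example:subdirectory/

# Without the encoding step, the colon is mistaken for a path separator:
print(did_web_to_dir_url("did:web:example.com:example:subdirectory"))
# https://example.com/example/subdirectory/
```

The key design point is the order of operations on the way back: split on colons first, then percent-decode each segment, so an encoded colon is never treated as a separator.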

No changes needed to did-core. Just some upskilling on the distinctions between DIDs / DID URLs and resolving / dereferencing.

OR13 commented 2 years ago

Please move "changes to resolution discussions" to issues, and keep this PR focused on adding percent encoding.

gribneau commented 2 years ago

Is this moving forward?

The reason we need to interpolate the path into the method specific identifier with colons, which then requires percent-encoding in some cases, is because DID Core violates RFC3986 by asserting that a URI path cannot be used to identify the primary resource.

We would be better off recognizing that as errata and leaving the handling of path under the control of the method, including the decision of whether to use it to identify the primary resource.

OR13 commented 2 years ago

@gribneau This PR has not moved forward... that's why I am trying to help it along.

Let's not conflate rule changes to percent encoding of port with rule changes for percent encoding of path...

Let's continue to discuss the path issue, on a separate issue, so we can reduce the complexity of this PR (and a subsequent PR that might be raised for path).

I suggest you comment on DID Core repo regarding errata for that spec, feel free to cross link here.

dmitrizagidulin commented 2 years ago

Looking at the amount of pushback to this PR (with regards to percent-decoding the path part), and the fact that it's addressing a very niche case (not actually a problem, in other words), I'd like to close it. We can continue the path discussion in issue https://github.com/w3c-ccg/did-method-web/issues/52