w3c-ccg / did-method-web

DRAFT: did:web Decentralized Identifier Method Specification
https://w3c-ccg.github.io/did-method-web/

How to handle hostnames with ports? #7

Open dmitrizagidulin opened 4 years ago

dmitrizagidulin commented 4 years ago

Now that we support paths (as of PR #5) in our did:web URLs, and we're using the : to encode the / characters for paths, this poses another challenge.

Since we're using the : character for paths, what do we do about the actual most common intended purpose of that character, which is to specify a port number?

So, specifically, say I am a developer who has just fired up a test server on my local machine, running at https://localhost:8443 (the https is deliberate of course - I made a self-signed cert for it and everything). This kind of thing happens all the time (it's happening to me right now :) ).

If there's a did:web document residing on that domain (say in /.well-known/did.json), what will that URL look like? According to our rules so far:

did:web:localhost:8443

Except now we're using : (in the method-specific identifier portion of the DID) to encode path segments. So that DID would "decode" to https://localhost/8443. Not what we want.
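
To make the ambiguity concrete, here is a minimal sketch of a resolver that treats every colon as a path separator. The function name `naive_did_web_to_url` is hypothetical, and the rule of appending `/did.json` to a path (or `/.well-known/did.json` to a bare host) is assumed from the examples in this thread rather than taken from the spec text.

```python
def naive_did_web_to_url(did: str) -> str:
    """Resolve a did:web DID by treating every ':' as a path separator."""
    prefix = "did:web:"
    if not did.startswith(prefix):
        raise ValueError(f"not a did:web identifier: {did!r}")
    # Every colon after the prefix is read as '/', so a port number is
    # indistinguishable from the first path segment.
    host, *segments = did[len(prefix):].split(":")
    if segments:
        return f"https://{host}/{'/'.join(segments)}/did.json"
    return f"https://{host}/.well-known/did.json"


# The port 8443 ends up in the path instead of the authority.
print(naive_did_web_to_url("did:web:localhost:8443"))
```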

So, what are people's thoughts on how to best handle this? @awoie, @OR13 ?

OR13 commented 4 years ago

retry logic I suppose.

dmitrizagidulin commented 4 years ago

The other option we have is - we can require the hostname portion to be URL-encoded. So, https://localhost:8443 would encode as did:web:localhost%3A8443.

OR13 commented 4 years ago

^ much better idea.

felixwatts commented 4 years ago

Do we have a timeline for a resolution of this issue? As far as I can see this prevents actual adoption in a real life scenario.

OR13 commented 4 years ago

@felixwatts you could just use a URL instead of a DID, and return a did document with the same URL everywhere the DID would be.

dmitrizagidulin commented 3 years ago

@felixwatts - apologies, I did the thing where I implemented the proposed solution and thought I updated the spec but didn't. Will be making a PR shortly.

llorllale commented 3 years ago

@dmitrizagidulin is your PR arriving any time soon? Could you explain the proposed solution?

llorllale commented 3 years ago

I guess the proposed solution is base64url-encoding the host portion as per https://github.com/w3c-ccg/did-method-web/issues/7#issuecomment-623485232 and what @felixwatts did.

llorllale commented 3 years ago

We are going to adopt the base64url-encoding proposal in aries-framework-go.

dmitrizagidulin commented 3 years ago

@llorllale sure. I'll make the PR today.

The proposed solution is - URL-encoding (as in, encodeURIComponent) both the host portion and each path portion. So, the test vectors would be:

  1. did:web:localhost%3A8080 -> https://localhost:8080/.well-known/did.json
  2. https://example.com/path/some+subpath -> did:web:example.com:path:some%2Bsubpath
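
The two test vectors above can be sketched in Python as follows. This is an illustration under assumptions, not the spec's algorithm: the helper names `url_to_did_web` and `did_web_to_url` are hypothetical, and `urllib.parse.quote(..., safe="")` stands in for JavaScript's `encodeURIComponent`. The two are not identical (`quote` also escapes `!*'()`), but they agree on `:` and `+`.

```python
from urllib.parse import quote, unquote, urlsplit

PREFIX = "did:web:"


def url_to_did_web(url: str) -> str:
    """Percent-encode the host (with port) and each path segment, join with ':'."""
    parts = urlsplit(url)
    # netloc keeps "host:port" together; safe="" turns its ':' into %3A.
    ids = [quote(parts.netloc, safe="")]
    # Segments are encoded as-is; already-encoded input is not decoded first.
    ids += [quote(seg, safe="") for seg in parts.path.split("/") if seg]
    return PREFIX + ":".join(ids)


def did_web_to_url(did: str) -> str:
    """Split on ':', percent-decode each part, and rebuild the HTTPS URL."""
    if not did.startswith(PREFIX):
        raise ValueError(f"not a did:web identifier: {did!r}")
    host, *segments = (unquote(p) for p in did[len(PREFIX):].split(":"))
    if segments:
        return f"https://{host}/{'/'.join(segments)}/did.json"
    return f"https://{host}/.well-known/did.json"


print(did_web_to_url("did:web:localhost%3A8080"))
print(url_to_did_web("https://example.com/path/some+subpath"))
```

Because the split on `:` happens before decoding, an encoded `%3A` inside the host survives as a port separator instead of being read as a path delimiter.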

dmitrizagidulin commented 3 years ago

@llorllale -1 to base64url-encoding did:web URLs, though. (Since base64url-encoding removes one of the nice properties of did:web DIDs, which is, readability / recognition of the domain name.)

So for example, https://localhost:8080 would base64url-encode as did:web:bG9jYWxob3N0OjgwODA=, which is opaque to human eyes.
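
For comparison, a short sketch that produces both encodings of the same host with Python's standard library. The variable names are illustrative only.

```python
import base64
from urllib.parse import quote

host = "localhost:8080"

# base64url hides the domain name entirely.
b64 = base64.urlsafe_b64encode(host.encode("utf-8")).decode("ascii")

# Percent-encoding only rewrites the reserved ':' and leaves the name readable.
pct = quote(host, safe="")

print(f"did:web:{b64}")
print(f"did:web:{pct}")
```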

llorllale commented 3 years ago

@dmitrizagidulin

> @llorllale -1 to base64url-encoding did:web URLs, though. (Since base64url-encoding removes one of the nice properties of did:web DIDs, which is, readability / recognition of the domain name.)
>
> So for example, https://localhost:8080 would base64url-encode as did:web:bG9jYWxob3N0OjgwODA=, which is opaque to human eyes.

Fully agree - I mixed them up this morning before coffee somehow.

llorllale commented 3 years ago

> @llorllale sure. I'll make the PR today.
>
> The proposed solution is - URL-encoding (as in, encodeUriComponent) both the host portion, and each path portion. So, the test vectors would be:
>
> 1. did:web:localhost%3A8080 -> https://localhost:8080/.well-known/did.json
> 2. https://example.com/path/some+subpath -> did:web:example.com:path:some%2Bsubpath

+1

llorllale commented 3 years ago

@dmitrizagidulin @OR13 I just realized url-encoding the path components results in a non-compliant DID as per the syntax: https://www.w3.org/TR/did-core/#did-syntax

llorllale commented 3 years ago

Nevermind: https://tools.ietf.org/html/rfc3986#section-2.4

So in summary, if I understand it right:

llorllale commented 3 years ago

URI recommendations: https://www.w3.org/Addressing/URL/4_URI_Recommentations.html

The percent sign ("%", ASCII 25 hex) is used as the escape character in the encoding scheme and is never allowed for anything else.

Some test vectors for percent-encoding: https://www.w3.org/2004/04/uri-rel-test.html#reg-percent

sk91 commented 3 years ago

@llorllale Hi,

> I just realized url-encoding the path components results in a non-compliant DID as per the syntax: https://www.w3.org/TR/did-core/#did-syntax
>
> Nevermind: https://tools.ietf.org/html/rfc3986#section-2.4

Can you please clarify whether we should incorporate encodeURIComponent into implementations or not?

OR13 commented 3 years ago

It sounds like did:web does not currently support percent-encoding or ports... and folks should assume that remains true until this issue is closed after the spec is updated.

sk91 commented 3 years ago

Yeah, I'm incorporating ngrok to use did:web in development. That seems like the easiest way to stay compliant in development.

OR13 commented 3 years ago

hah, nice i <3 ngrok.... thats an awesome idea.

mirceanis commented 3 years ago

I'm not sure if this should be a separate issue or not, but it's related to encoding 😅. The spec does not mention how to deal with non-ASCII domain names.

Punycode is an option for that, but it won't cover the port issue and is not as easily available as encodeURIComponent. However, the did-core spec does not allow the % character in the method-specific-id, which is currently a blocker for encodeURIComponent.

OR13 commented 3 years ago

@mirceanis non ascii domain names cannot be DIDs.... if the spec needs to be updated to support them we need URL safe bidirectional transformations.

dmitrizagidulin commented 3 years ago

Update: The DID Core spec is being updated to accept % characters as part of the DID URI ABNF, as of PR https://github.com/w3c/did-core/pull/703. So, we'll go with percent url encoding, since that is now valid.

kdenhartog commented 3 years ago

Updated the text for this in #38

letmaik commented 2 years ago

https://example.com/foo:bar is also an interesting case, since : is a valid path character in URIs. Would that become did:web:example.com:foo%3Abar? What if a URL already has encoded characters, are they decoded first and then re-encoded per path segment? For example, https://example.com/foo%3Abar -> https://example.com/foo:bar -> did:web:example.com:foo%3Abar. In this case, what URL does that DID refer to? https://example.com/foo:bar or https://example.com/foo%3Abar?
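
The round-trip question can be illustrated with a small sketch. The helper `encode_segment` and its `decode_first` flag are hypothetical; the thread does not settle which behavior the spec should mandate.

```python
from urllib.parse import quote, unquote


def encode_segment(segment: str, decode_first: bool) -> str:
    """Percent-encode one URL path segment for use inside a did:web DID."""
    if decode_first:
        # Normalize any existing escapes before re-encoding.
        segment = unquote(segment)
    return quote(segment, safe="")


# Decoding first maps both URL spellings to the same DID segment,
# so the DID can no longer tell them apart.
print(encode_segment("foo:bar", decode_first=True))
print(encode_segment("foo%3Abar", decode_first=True))

# Encoding as-is keeps them distinct, at the cost of a double-escaped '%'.
print(encode_segment("foo%3Abar", decode_first=False))
```

With decode-first, `foo:bar` and `foo%3Abar` both become `foo%3Abar`, which is exactly the ambiguity raised above. Without it, the already-encoded form becomes `foo%253Abar`, which round-trips but is harder to read.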

dmitrizagidulin commented 2 years ago

@letmaik - good point, about : characters being allowed in the path segment of URLs. I'll update the proposal with that in mind.

OR13 commented 2 years ago

imo, did:web:example.com/foo:bar -> https://example.com/foo:bar/did.json... no problem here.

letmaik commented 2 years ago

> imo, did:web:example.com/foo:bar -> https://example.com/foo:bar/did.json... no problem here.

That's a DID URL, not a DID. I think this has to be solved for the DID itself, right?

OR13 commented 2 years ago

yes, that example was a did url with a path that contained a colon.

today:

The method specific identifier MUST match the common name used in the SSL/TLS certificate, and it MUST NOT include IP addresses or port numbers.

if we wanted to allow the identifier to use ports:

did:web:localhost%3A3000 -> https://localhost:3000/did.json

and for completeness, here is a did url that uses ports and colons in paths:

did:web:example.com%3A1337/foo:bar -> https://example.com:1337/foo:bar/did.json

letmaik commented 2 years ago

> yes, that example was a did url with a path that contained a colon.
>
> today:
>
> The method specific identifier MUST match the common name used in the SSL/TLS certificate, and it MUST NOT include IP addresses or port numbers.

But what follows after that is important:

> Directories and subdirectories MAY optionally be included, delimited by colons rather than slashes.

This applies to DIDs, not just DID URLs. Or are you suggesting to remove that part?

dmitrizagidulin commented 2 years ago

@OR13

> imo, did:web:example.com/foo:bar -> https://example.com/foo:bar/did.json... no problem here.

The issue mentioned is in the other direction:

https://example.com/foo:bar/ to did:web:example.com:foo%3Abar

But that should still be covered by the overall algorithm (that requires percent-encoding of path fragments when encoding from URL to DID).

OR13 commented 2 years ago

ahh yes, thanks for clarifying!

gribneau commented 2 years ago

Thinking in terms of the changes for #42 , the percent encoding would work like this:

did:web:example.com%3A1337:foo --> https://example.com:1337/foo/

and percent encoding in the path as well would work like this:

did:web:example.com%3A1337:foo%3Abar --> https://example.com:1337/foo:bar/

Is that correct?

dmitrizagidulin commented 2 years ago

@gribneau -- that looks right, +1

OR13 commented 2 years ago

^ yes, i think thats the case we needed

letmaik commented 2 years ago

> and percent encoding in the path as well would work like this:
>
> did:web:example.com%3A1337:foo%3Abar --> https://example.com:1337/foo:bar/
>
> Is that correct?

What about IDNs? For the domain räksmörgås.josefsson.org, what would the did:web look like? Are the Unicode characters percent-encoded? I guess so. How does resolution work then? Since this is an IDN and the resolution algorithm assumes URLs, not IRIs, it would have to undo the percent-encoding and then convert via Punycode to a URL: https://xn--rksmrgs-5wao1o.josefsson.org.

I think this complicates things too much, and I would probably simplify it to URL-decoding each segment in full and putting the parts together to form an IRI. So, did:web:r%C3%A4ksm%C3%B6rg%C3%A5s.josefsson.org%3A1337:r%C3%A4ks becomes https://räksmörgås.josefsson.org:1337/räks (plus /did.json appended). If you need a URI, that's an application concern, and most libraries accept UTF-8 encoded IRIs anyway these days. Doing it this way also covers the port automatically, so it doesn't have to be mentioned specifically in the spec.
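
A minimal sketch of this "decode every segment, build an IRI" approach follows. The names `did_web_to_iri` and `authority_to_ascii` are hypothetical, and the Punycode step uses Python's built-in `idna` codec (IDNA 2003) purely to show what an application could do if it needs an ASCII host. IPv6 literals are not handled.

```python
from urllib.parse import unquote

PREFIX = "did:web:"


def did_web_to_iri(did: str) -> str:
    """Percent-decode each ':'-separated part as UTF-8 and assemble an IRI."""
    if not did.startswith(PREFIX):
        raise ValueError(f"not a did:web identifier: {did!r}")
    authority, *segments = (unquote(p) for p in did[len(PREFIX):].split(":"))
    if segments:
        path = "/".join(segments) + "/did.json"
    else:
        path = ".well-known/did.json"
    return f"https://{authority}/{path}"


def authority_to_ascii(authority: str) -> str:
    """Optional application-level step: Punycode the host, keep the port."""
    host, sep, port = authority.partition(":")
    return host.encode("idna").decode("ascii") + sep + port


did = "did:web:r%C3%A4ksm%C3%B6rg%C3%A5s.josefsson.org%3A1337:r%C3%A4ks"
print(did_web_to_iri(did))
print(authority_to_ascii("räksmörgås.josefsson.org:1337"))
```

The port needs no special handling in `did_web_to_iri`: it reappears when `%3A` is decoded along with everything else.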

dmitrizagidulin commented 2 years ago

@letmaik - Great point. I'm reopening this issue as a reminder for us to address this in the next PR.