Service endpoint identification matrix parameter

davidlehn commented 6 years ago

The endpoint identification format of did:example:123456789abcdefghi;openid was introduced in https://github.com/w3c-ccg/did-spec/pull/75. The ; usage may cause URI parsers to expect matrix parameters. I think matrix parameters are still a proposal [1] rather than an actual spec. The somewhat rare usage in the wild does seem to use key=value pairs [2][3][... many search results ...]. The 1996 proposal includes an example where a relative matrix param consisting of just a key blanks out that key. It may make sense to use key=value syntax in DIDs to align with other usage. Could be as simple as service=openid.

https://github.com/w3c-ccg/did-spec/issues/72 mentions the need to describe the ; syntax, but that has not been addressed.

[1] https://www.w3.org/DesignIssues/MatrixURIs.html
[2] https://stackoverflow.com/questions/401981/when-to-use-query-parameters-versus-matrix-parameters
[3] https://github.com/medialize/URI.js/issues/181

Fak3 commented 6 years ago

I couldn't find a rationale for the semicolon syntax in #72. Can the ;openid part be replaced with just a fragment, #service-openid?

msporny commented 6 years ago

Could be as simple as service=openid

This is what I had counter-proposed, but folks pushed back for reasons I can't remember anymore. I think the push back was that it would be impossible to tell when the DID-based URL ends and the HTTP-based URL begins.

@talltree @mikelodder7 @peacekeeper -- any thoughts on this?

msporny commented 6 years ago

Can the ;openid part be replaced with just fragment: #service-openid

In that scenario, how do you know when the DID-based URL ends and the HTTP-based URL begins? For example:

did:example:234786;vcrepository=/foo/bar/public-credentials.jsonld

Would you do something like this?

did:example:234786?service=vcrepository&servicepath=/foo/bar/public-credentials.jsonld

If yes, then what happens if there are two path parameters... one that's consumed by the DID Service Resolver and another that's meant to be consumed by the service endpoint?

Fak3 commented 6 years ago

In that scenario, how do you know when the DID-based URL ends and the HTTP-based URL begins.

Why would someone need to construct a URL like this, embedding one URL into another?

Fak3 commented 6 years ago

Isn't the "serviceEndpoint" property supposed to carry the URL of the service? Why put it in the "@id"?

dmitrizagidulin commented 6 years ago

Have we come to any consensus on this issue? (Would be helpful to have this resolved, so we can validate service endpoint IDs in our code.)

mikelodder7 commented 6 years ago

I have an open PR (#95) that I thought addressed this, or am I misunderstanding something?

kimdhamilton commented 6 years ago

Hey @mikelodder7, it looks like @msporny had some open questions on the PR.

dmitrizagidulin commented 6 years ago

Context

So, here's the context for this conversation, for those joining in.

The value proposition for DIDs is that they provide stable user-controlled identifiers that can help with two main things: management of cryptographic materials, and portability of service endpoints.

With crypto material management, the idea is that a user's identifier can stay the same while underneath, keys and the like can change -- be added, rotated, revoked, and so on.

Similarly, with portability of service endpoints, the idea is that some sort of DID-based URI stays the same for a given service, and meanwhile the user can migrate from one service provider to another (and that stable DID-based URI for a service does not have to change).

So for a trivial example, say that a user has a service that stores their user profile picture. And currently, they're storing it at: https://facebook.com/img/userpic.jpg (not a real url, btw). The moment that they want to change providers for that service, all of their existing URLs (for example, those stored in contacts lists) that point to that userpic break.

What you want to be able to do instead, is to use a service URI based on a DID, so the URL above now becomes something like:

<some DID-based URI for service of type #user_pictures>/img/userpic.jpg

And then inside that DID Document, in the service: section, the user can swap out the actual endpoint for a particular service, without changing the service URI.

So for example, if Alice's DID was did:example:123, and she started out with the following DID Doc section:

"service": [
  {
    "id": "did:example:123#user_pictures",
    "serviceEndpoint": "https://facebook.com"
  }
]

The userpic URI would be something like <something based on did:example:123><service with ID #user_pictures>/img/userpic.jpg.

And when this URI would be passed to a DID Resolver, it would automatically be resolved to: https://facebook.com/img/userpic.jpg (because the resolver would fetch the DID did:example:123, look in the service section, find the service with the id #user_pictures, and look at its serviceEndpoint property).

And later, when she migrated to another userpic service, the contents of the DID Document would change to:

"service": [
  {
    "id": "#user_pictures",  // <- still the same as previously
    "serviceEndpoint": "https://new-friendster.com"
  }
]

The picture's URI would remain the same, <something based on did:example:123><service with ID #user_pictures>/img/userpic.jpg

But now, a resolver would translate that URI to: https://new-friendster.com/img/userpic.jpg
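To make the mechanics concrete, here is a minimal Python sketch of the resolver's lookup step, assuming the DID Document has already been fetched and parsed into a dict. The helper `resolve_service_url` is hypothetical, and because the spelling of the DID-based service URI is exactly what this issue is about, the sketch takes the DID, the service fragment, and the path as separate arguments instead of parsing a URI.

```python
def resolve_service_url(did: str, did_doc: dict, fragment: str, path: str) -> str:
    """Hypothetical helper: map (DID, service fragment, path) to a concrete URL.

    Accepts both absolute ("did:example:123#user_pictures") and relative
    ("#user_pictures") service ids, since both forms appear above.
    """
    wanted = {f"{did}#{fragment}", f"#{fragment}"}
    for service in did_doc.get("service", []):
        if service.get("id") in wanted:
            # Append the stable path to whatever endpoint is currently configured.
            return service["serviceEndpoint"].rstrip("/") + path
    raise LookupError(f"no service #{fragment} in the DID Document for {did}")


before = {"service": [{"id": "did:example:123#user_pictures",
                       "serviceEndpoint": "https://facebook.com"}]}
after = {"service": [{"id": "#user_pictures",
                      "serviceEndpoint": "https://new-friendster.com"}]}

print(resolve_service_url("did:example:123", before, "user_pictures", "/img/userpic.jpg"))
print(resolve_service_url("did:example:123", after, "user_pictures", "/img/userpic.jpg"))
```

Only the DID Document differs between the two calls; the (DID, fragment, path) triple the caller holds stays the same.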

So that's the general idea - to enable DID-aware apps to use stable URIs for well known service types, and those URIs would remain the same even if the user migrated to a different service.

(And yes, this whole setup does depend on the path part of the URI staying the same from service to service. But there's actually a surprising amount of use cases where that would be true.)

The Arguments So Far

The argument is about two things:

How do we form the ids of each entry in the DID Doc's service section (this is actually issue #97), and
How do we define the structure of the service URIs, so that we can pass them to resolvers that translate them to specific service endpoints (that's this issue).

Note that these are two separate topics. Let's examine the second one, the format of service URIs, because I will argue that the solution to the first issue should remain the same, regardless of what we decide.

Service URI Format

Each proposed approach needs to address two main things:

How do you signal to the resolver that this is a Service URI? In other words, how difficult will it be for a DID Resolver to tell that a given URL is a service URL that should be handed off to a different code path?
How do you specify the service id or type and the service path? (This assumes the Service URI has already been parsed and the DID part separated from the service part.)

Option 1 - Semicolon based service URIs (current consensus)

The group consensus so far is to use one of the reserved URI delimiter characters, ; (the semicolon), to separate the DID part from the rest of the service URI. So, the structure is: <did>;<service id and path>.

Benefits:

A semicolon is a valid reserved character (a sub-delimiter) in a URI (see Section 2.2 of the URI RFC), specifically created to separate different components.
There is precedent for it -- semicolons are used as separators both in the Data URI Scheme and in the ni:/RFC6920 scheme.
It would be fairly easy for a DID Resolver library to test for the presence of a ; character (since, unlike with HTTP URLs, only the ? and # characters are allowed in the DID scheme).

Downsides/Implications:

Using a ; as a separator between a DID and a service component means that it should be reserved (made an illegal character) in all Specific DID Method Schemes.
As @davidlehn points out in the original description of this issue, semicolons are also used as "Matrix parameters" in existing URL paths. That is, they're a valid alternative to & in query params, to delimit key/value combinations. However, they are not a valid delimiter in the DID URI scheme, so this point might be moot.

Open Questions (Stuff that's being argued):

If the ; separates the DID part from the Service part, how do we specify the format for the Service ID or Service Type and the Service Path?

Option 1a: - service or type keyword, then Path (with no separator)

(This is the option currently proposed in PR #95 by @mikelodder7)

For example: did:example:123;service=user_pictures/img/userpic.jpg or did:example:123;type=UserPicService/img/userpic.jpg

The implications (at first glance) of this option are that the Resolver:

Splits on ;, first item is the DID, the rest is the Service component
Parses the Service component up to the first / - this is a key/value pair denoting the Service Locator (an id or a type)
Starting with the first /, everything that follows is part of the service path
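A literal Python sketch of these three steps (a hypothetical helper, not the code from PR #95), accepting only the service and type keywords used in the examples:

```python
def parse_option_1a(did_url: str) -> tuple[str, str, str, str]:
    """Naive Option 1a parser for <did>;<key>=<value><path>.

    Returns (did, key, value, path). Hypothetical helper for illustration.
    """
    did, sep, service_part = did_url.partition(";")
    if not sep:
        raise ValueError("not a service URI: no ';' present")
    # The locator runs up to the first '/', and the path is everything from it on.
    locator, slash, rest = service_part.partition("/")
    key, eq, value = locator.partition("=")
    if not eq or key not in ("service", "type"):
        raise ValueError(f"expected service=... or type=..., got {locator!r}")
    return did, key, value, slash + rest


print(parse_option_1a("did:example:123;service=user_pictures/img/userpic.jpg"))
print(parse_option_1a("did:example:123;type=https://schema.org/UserPicService/img/userpic.jpg"))
```

The first call splits cleanly. The second shows the weakness of "up to the first /" once the type is a URL: the split lands inside `https:/`, so the value comes back as `https:` and the rest of the type URL ends up in the path.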

Except here's the problem. Although the examples in the spec and the PR give a single camel-cased string as a type (like UserPicService), in reality it's going to be a Linked Data URL. So it'll actually be something like type=https://schema.org/UserPicService, and the full URI will be:

did:example:123;type=https://schema.org/UserPicService/img/userpic.jpg

Notice what that does to the parsing algorithm. It's no longer possible to just parse up to the first / and assume that's the service locator key/value pair, and it becomes almost impossible to tell where the locator ends and path begins.

So the actual Option 1a would require URL-encoding the service locator, like this:

did:example:123;type=urlEncode(https://schema.org/UserPicService)/img/userpic.jpg
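One way to realise the urlEncode() step, sketched in Python under the assumption that percent-encoding with `urllib.parse.quote(..., safe="")` is what urlEncode means (the comment doesn't pin down an encoding). Both helper names are hypothetical.

```python
from urllib.parse import quote, unquote


def build_option_1a(did: str, key: str, value: str, path: str) -> str:
    # safe="" makes quote() percent-encode ':' and '/' as well, so the
    # encoded locator can no longer contain the '/' that starts the path.
    return f"{did};{key}={quote(value, safe='')}{path}"


def parse_option_1a_encoded(did_url: str) -> tuple[str, str, str, str]:
    did, _, service_part = did_url.partition(";")
    # With the locator encoded, the first '/' really is the start of the path.
    locator, slash, rest = service_part.partition("/")
    key, _, encoded_value = locator.partition("=")
    return did, key, unquote(encoded_value), slash + rest


url = build_option_1a("did:example:123", "type",
                      "https://schema.org/UserPicService", "/img/userpic.jpg")
print(url)
print(parse_option_1a_encoded(url))
```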

Benefits:

Clear whether the service should be looked up by id or type

Drawbacks:

No separator between service and path, so you have to parse up to the first / (and make sure to URL-encode the locator)
Slightly wordier than alternatives below

Option 1b: - Service or Type directly, no keyword, then Path (no separator)

Example: did:example:123;user_pictures/img/userpic.jpg or did:example:123;urlEncode(https://schema.org/UserPicService)/img/userpic.jpg

The implication with this option is that the Resolver:

Splits on ;, first item is the DID, the rest is the Service component
Parses the Service component up to the first / (which means, slashes in service IDs or types are illegal) - this is serviceIdOrType
Starting with the first /, everything that follows is part of the service path
Resolver fetches the DID Doc, and first tries the parsed serviceIdOrType as a service id, and failing that, tries it as a service type. Because the set of services is un-ordered, in case there are multiple services for a given type, the resolver would return whichever one it got to first.
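A Python sketch of the Option 1b resolver steps above. It assumes each service entry may carry a `type` property holding the type URL (the service entries shown earlier in this thread only have `id` and `serviceEndpoint`), percent-decodes the locator, and tries it as an id before trying it as a type. `resolve_option_1b` is a hypothetical name.

```python
from urllib.parse import unquote


def resolve_option_1b(did_url: str, did_doc: dict) -> str:
    did, sep, service_part = did_url.partition(";")
    if not sep:
        raise ValueError("not a service URI: no ';' present")
    # Slashes are illegal in the locator, so the first '/' starts the path.
    locator, slash, rest = service_part.partition("/")
    id_or_type = unquote(locator)
    path = slash + rest
    services = did_doc.get("service", [])
    # First try the locator as a service id (absolute or relative form)...
    for svc in services:
        if svc.get("id") in (f"{did}#{id_or_type}", f"#{id_or_type}"):
            return svc["serviceEndpoint"].rstrip("/") + path
    # ...and only then as a service type; the first match in document order wins.
    for svc in services:
        if svc.get("type") == id_or_type:
            return svc["serviceEndpoint"].rstrip("/") + path
    raise LookupError(f"no service with id or type {id_or_type!r}")


doc = {"service": [{"id": "did:example:123#user_pictures",
                    "type": "https://schema.org/UserPicService",
                    "serviceEndpoint": "https://new-friendster.com"}]}
print(resolve_option_1b("did:example:123;user_pictures/img/userpic.jpg", doc))
print(resolve_option_1b(
    "did:example:123;https%3A%2F%2Fschema.org%2FUserPicService/img/userpic.jpg", doc))
```

Iterating the list in document order stands in for "whichever one it got to first"; since the set of services is unordered, a real resolver could not rely on that order being stable.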

Option 1c: - Service or Type directly, no keyword, : separator, then Path

This is same as 1b, but uses a : (colon) to separate the service id or type and the service path, scp style.

Example: did:example:123;user_pictures:/img/userpic.jpg or did:example:123;urlEncode(https://schema.org/UserPicService):/img/userpic.jpg

The idea being, this makes it slightly easier to parse and separate service locator and path. Except just like with the previous options, you still end up having to URL-encode the service type (because of the colon in the http URL).
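A sketch of Option 1c in Python, again assuming `urllib.parse.quote(..., safe="")` as the urlEncode() step. Percent-encoding turns every `:` in the locator into `%3A`, so the first `:` after the `;` can only be the separator (the DID's own colons all sit before the `;`). Helper names are hypothetical.

```python
from urllib.parse import quote, unquote


def build_option_1c(did: str, locator: str, path: str) -> str:
    # Encode the locator so any ':' it contains (e.g. in "https:") becomes %3A.
    return f"{did};{quote(locator, safe='')}:{path}"


def parse_option_1c(did_url: str) -> tuple[str, str, str]:
    did, _, service_part = did_url.partition(";")
    # The first ':' after the ';' is the locator/path separator.
    locator, _, path = service_part.partition(":")
    return did, unquote(locator), path


print(parse_option_1c("did:example:123;user_pictures:/img/userpic.jpg"))
print(parse_option_1c(build_option_1c(
    "did:example:123", "https://schema.org/UserPicService", "/img/userpic.jpg")))
```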

Option 1d: - Service or Type directly, no keyword, ; separator, then Path

If we switch to re-using the ; (semicolon) separator, we can now get away with not URL-encoding the service type:

did:example:123;https://schema.org/UserPicService;/img/userpic.jpg

This makes the parsing easier - split on ;, first item is DID, second is the Service Locator, and the rest are part of the service path.
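A sketch of the Option 1d split in Python (hypothetical helper). It assumes neither the DID nor the service locator contains a `;`; any further semicolons are kept as part of the path.

```python
def parse_option_1d(did_url: str) -> tuple[str, str, str]:
    # maxsplit=2: DID, then service locator, then the path, which keeps
    # any further ';' characters it may contain.
    parts = did_url.split(";", 2)
    if len(parts) != 3:
        raise ValueError("expected <did>;<locator>;<path>")
    did, locator, path = parts
    return did, locator, path


print(parse_option_1d("did:example:123;https://schema.org/UserPicService;/img/userpic.jpg"))
```

No encoding step is needed because `/` and `:` in the locator are never mistaken for separators; only `;` is.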

Option 2 - Query Params or Hash Fragment Query Params (considered and rejected)

As @Fak3 and others have asked, why not specify the service id/type and the service path as either query parameters or hash fragment query parameters? In other words, why not do:

<did with path>?service=<service id>&path=<service path>&...<all the other DID query params>

or

<did>#service=<service id>&path=<service path>&... (similar to how OAuth2 Implicit flow passes back the token and state and so on, in the callback redirect).

Reasons not to go with this approach:

Not as easy for a DID Resolver to tell that this is a Service URI -- it would have to first parse the DID and then examine the query params for reserved keywords like service (which would not be allowed to be used in any other context), and only then pass it on to the code path that handled resolving service URIs.
More importantly, this approach goes against the URI spec. Query parameters (see Section 3.4 of the URI spec) are scoped to a particular scheme and naming authority. In other words, query parameters are only meaningful to a given server, and have no universal meaning across different URIs (even within the same URI scheme). Similarly with hash fragments: per Section 3.5, the fragment's "format and resolution is ... dependent on the media type", and "fragment identifier semantics are independent of the URI scheme and thus cannot be redefined by scheme specifications."
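To illustrate the first objection, here is a Python sketch of what a resolver would have to do with the query-parameter form, using the `service` and `servicepath` keywords from the earlier example in this thread. Treating them as reserved keywords is an assumption, as is the helper name.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def parse_query_style(did_url: str) -> Optional[dict]:
    """Return service parameters if did_url uses the reserved keywords, else None."""
    # e.g. scheme="did", path="example:234786", query="service=...&servicepath=..."
    parts = urlsplit(did_url)
    params = parse_qs(parts.query)
    if "service" not in params:
        return None  # an ordinary DID URL; its query means something else
    return {
        "did": f"{parts.scheme}:{parts.path}",
        "service": params["service"][0],
        "servicepath": params.get("servicepath", [""])[0],
    }


print(parse_query_style(
    "did:example:234786?service=vcrepository&servicepath=/foo/bar/public-credentials.jsonld"))
```

Every DID URL has to be split and its query decoded before the resolver even knows which code path applies, and the keywords cannot be used for anything else.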

Option 3 - Magnet URI style (not yet considered?)

I'm not sure if this has already been discussed, but @cwebber brought up the possibility of using Magnet URIs for this (and other) use cases.

Magnet URIs basically consist of nothing but query parameters. And, unlike regular URIs, it is expected that query params will be registered as well-known keywords. So for example, the xt query param always stands for 'Exact Topic' (the hash of the target file).

So our Service URI would be something like:

magnet:?as=did:example:123&service_id=user_pictures&service_path=/img/userpic.jpg

This option would require us to define and reserve the query parameters (like service_id, service_type, service_path) and their semantics.
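A Python sketch of building and reading such a magnet URI with the standard library. The parameter names `as`, `service_id`, and `service_path` come from the example above; they are not claimed to be registered magnet keywords, and both helpers are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_magnet(did: str, service_id: str, service_path: str) -> str:
    query = {"as": did, "service_id": service_id, "service_path": service_path}
    # safe=":/" keeps the DID and the path readable instead of percent-encoding them.
    return "magnet:?" + urlencode(query, safe=":/")


def parse_magnet(uri: str) -> dict:
    parts = urlsplit(uri)
    if parts.scheme != "magnet":
        raise ValueError("not a magnet URI")
    # parse_qs returns a list per key; this sketch assumes each key appears once.
    return {key: values[0] for key, values in parse_qs(parts.query).items()}


uri = build_magnet("did:example:123", "user_pictures", "/img/userpic.jpg")
print(uri)
print(parse_magnet(uri))
```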

Summary

So, we still need to determine which of these options we should take. If we go with the current consensus (; as separator between DID and service component), we need to decide whether to use a separator between the service locator and service path, and decide whether the service locator should be key/value style (type=<ServiceType>) or not.

My personal vote is either Option 1d, or Option 3.

Fak3 commented 6 years ago

Not as easy for a DID Resolver to tell that this is a Service URI

There must be some unvoiced requirement that I am not aware of, which is the cornerstone of what the URI should look like, and I'm wondering what it is. Are you saying that a DID resolver must be able to tell whether a DID-based URL is a service URL without resolving it to a document and looking at the contents? Why is that? Why doesn't this requirement apply to key material URLs like did:123#myRsaKey1?

peacekeeper commented 6 years ago

Great summary, I think Option 1b is what we've been assuming for a while; but others have advantages too. Maybe we should revisit this topic on an upcoming CCG call. My preference would be either Option 1a or 1b.

msporny commented 6 years ago

Hmm, this is missing an option 1e that we also discussed (I think), which I'll ask @dlongley to elaborate upon since he has the details.

dlongley commented 6 years ago

@msporny, actually the "option 1e" you mention would fall under "option 1b" (just imagine the ID for the service that gets url encoded has a # char in it), so I think everything is covered here.

dlongley commented 6 years ago

I also want to mention that we should specify a "proxy" (full name TBD) service type to help assist with potential GDPR issues (we don't know yet what the issues will be but we should plan ahead).

A DID resolver that sees this "proxy" service type in a DID document could follow its service endpoint to resolve to a final URL. This would allow a non-PII id and serviceEndpoint to be added to a DID Document where the endpoint points at an external service that manages endpoint mappings in a way that enables deletion.

dlongley commented 6 years ago

@msporny, actually maybe "option 1e" could still be a bit different from 1b. Here's "option 1e":

Consider this service entry in a DID Document:

"service": [
  {
    "id": "did:example:123#user_pictures",
    "serviceEndpoint": "https://new-friendster.com"
  }
]

Remember that here, if we want to return the information from the DID Document graph for the service itself, we look up did:example:123#user_pictures, just as we do when finding, for example, a key in the DID Document graph.

So, now if we want to add the DID resolution feature, the way it works is we change the hash # to a semicolon ;, like this:

did:example:123#user_pictures => did:example:123;user_pictures

So, to resolve a full path we would do...

Example: did:example:123;user_pictures/img/userpic.jpg or did:example:123;urlEncode(user_pictures)/img/userpic.jpg

In other words:

Example: did:example:123;<part after hash>/img/userpic.jpg or did:example:123;urlEncode(<part after hash>)/img/userpic.jpg

This option could also be mixed with some of the other options to use an extra colon : or semicolon ; should we go that route for easier parsing or whatever.
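A Python sketch of the Option 1e mapping from a service id to a service URL (hypothetical helper). It percent-encodes the part after the hash, matching the urlEncode() variant in the examples, so a fragment containing `/` or `#` cannot be confused with the path. It only handles absolute service ids.

```python
from urllib.parse import quote


def service_id_to_service_url(service_id: str, path: str = "") -> str:
    """Option 1e: swap the '#' in an absolute service id for ';' and append a path."""
    did, sep, fragment = service_id.partition("#")
    if not sep or not did:
        raise ValueError("expected an absolute service id like did:...#fragment")
    return f"{did};{quote(fragment, safe='')}{path}"


print(service_id_to_service_url("did:example:123#user_pictures", "/img/userpic.jpg"))
```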

mrinalwadhwa commented 6 years ago

Among 1b, 1c and 1d, 1d feels superior because it doesn't require the urlEncode.

The key benefit of 1a:

Clear whether the service should be looked up by id or type

This could be achieved in 1d as well:

did:example:123;type=https://schema.org/UserPicService;/img/userpic.jpg
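A sketch of parsing this keyed variant in Python, assuming `id`/`type`-style keys and the same `<did>;<locator>;<path>` layout as Option 1d (hypothetical helper). Splitting the locator on its first `=` keeps an unencoded type URL intact:

```python
def parse_keyed_1d(did_url: str) -> tuple[str, str, str, str]:
    parts = did_url.split(";", 2)
    if len(parts) != 3:
        raise ValueError("expected <did>;<key>=<value>;<path>")
    did, locator, path = parts
    # Only the first '=' separates key from value; the value may contain more.
    key, eq, value = locator.partition("=")
    if not eq:
        raise ValueError("expected a key=value service locator")
    return did, key, value, path


print(parse_keyed_1d("did:example:123;type=https://schema.org/UserPicService;/img/userpic.jpg"))
```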

SmithSamuelM commented 5 years ago

I think #85 fixes this. Simply put:

The presence of the DID query has two main syntactical interpretations.

When a DID path component is present the DID query is a modifier on the did path resource in the same way that it works on a URL.

When a DID path component is not present, the DID Query is a modifier on the DID metadata obtained from the DDO and narrowed by the DID fragment, if present.

In the latter case, the DID query parameters operate on the metadata in the DDO, so having reserved query parameters that correspond to metadata keys such as “service” is unambiguous and does not impede the use of query parameters in the former case.
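A Python sketch of this semantic switch, assuming the generic `did:method:idstring[/path][?query][#fragment]` shape. The regex and the `query_target` helper are hypothetical and far looser than the normative DID syntax:

```python
import re
from urllib.parse import parse_qs

DID_URL = re.compile(
    r"^(?P<did>did:[^/?#]+)"       # did:method:idstring
    r"(?P<path>/[^?#]*)?"          # optional DID path
    r"(?:\?(?P<query>[^#]*))?"     # optional DID query
    r"(?:#(?P<fragment>.*))?$"     # optional DID fragment
)


def query_target(did_url: str) -> tuple[str, dict]:
    m = DID_URL.match(did_url)
    if m is None:
        raise ValueError(f"not a DID URL: {did_url!r}")
    params = parse_qs(m.group("query") or "")
    # Path present: the query modifies the path resource, as with an ordinary URL.
    # Path absent: the query modifies the DID Document metadata.
    target = "resource" if m.group("path") else "metadata"
    return target, params


print(query_target("did:example:123?service=user_pictures"))
print(query_target("did:example:123/img/userpic.jpg?size=small"))
```

`size` is a made-up resource query parameter, used only to show the path case.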

SmithSamuelM commented 5 years ago

I think that we should accept #85, since DID query was left out as an oversight, and any discussion of service endpoints should assume that DID query is an existing constraint. It would be bad not to support query in general; otherwise, much of the tooling that makes the URL-like syntax of the DID so convenient gets thrown away. The semantic switch based on the presence of the path syntax is a simple way of assigning what the query is modifying. Adding semantics to metadata queries is then decoupled from generic query semantics.

SmithSamuelM commented 5 years ago

An even cleaner approach is to use JSON PTR #86 to narrow the context. #86 uses the same semantic switch: if the DID Path is present, the DID Fragment JSON PTR identifies an element of the resource identified by the path; if the DID Path is NOT present, the DID Fragment JSON PTR identifies an element of the metadata in the DDO. Combining the two gives the cleanest way of specifying operations without encumbering the query parameters (i.e. reserved query key names).

If the DID Path is not present and the DID Fragment is present and is a JSON PTR, then the DID Query operates on the DDO element identified by the DID Fragment JSON PTR. In this narrowed context, the Query parameters can have specified meaning without encumbering their meaning in any other context. Because we have already reserved certain element names for the DDO, reserving DID Query param names for those same reserved DDO elements is the minimally encumbering approach.

This way we keep the URL syntax clean (ie do not specially redefine ";") We just use existing syntactical elements in an unambiguous way.

From a DID resolver parsing logic perspective:

If the DID Path is present, the method, idstring, and any service endpoints are resolved in a default metadata context. The remainder of the DID (path, query, fragment) is applied to the resulting resolved resource, not to the metadata.

If the DID Path is NOT present the method, idstring, and any service endpoints are resolved as modified by the query and fragment with json ptr in the fragment allowing high specificity of which metadata element(s) are to be modified.

This allows very fine granularity in how the metadata is interpreted without impeding the use of the query and fragment on non metadata resources. We can then specify contextual semantics for the DID query without having to reserve query names in general.
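A Python sketch of the fragment-as-JSON-Pointer idea for the no-path case, following RFC 6901 (its `~1`/`~0` unescaping, and percent-decoding of the URI fragment form). The helper names, and the assumption that the fragment holds a JSON Pointer into the DID Document, are illustrative:

```python
from urllib.parse import unquote


def resolve_json_pointer(doc, pointer: str):
    """Resolve an RFC 6901 JSON Pointer such as "/service/0/serviceEndpoint"."""
    if pointer == "":
        return doc  # the empty pointer identifies the whole document
    if not pointer.startswith("/"):
        raise ValueError("a JSON Pointer must be empty or start with '/'")
    node = doc
    for token in pointer[1:].split("/"):
        # Unescape in this order, per RFC 6901: '~1' -> '/', then '~0' -> '~'.
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


def resolve_did_fragment(did_url: str, did_doc: dict):
    # Hypothetical: no path, so the (percent-encoded) fragment points into the DDO.
    _, sep, fragment = did_url.partition("#")
    return resolve_json_pointer(did_doc, unquote(fragment)) if sep else did_doc


ddo = {"service": [{"id": "did:example:123#user_pictures",
                    "serviceEndpoint": "https://new-friendster.com"}]}
print(resolve_did_fragment("did:example:123#/service/0/serviceEndpoint", ddo))
```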

SmithSamuelM commented 5 years ago

This approach nicely future-proofs the specification and allows stable syntax, as we now have a contextual way of creating new semantics without ever changing the syntax.

SmithSamuelM commented 5 years ago

Alternatively, since the default context is DID Method specific, each method could define custom reserved query parameter names like "service". DID resolvers would just need a lookup table per method.

Or, alternatively, the idstring ":" separator could be used on a method-specific basis to identify different service endpoints, where the last component of a multi-part idstring specifies that it is a service endpoint, e.g. did:mymethod:idstringfirst:service
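A Python sketch combining the per-method lookup table with the multi-part idstring rule. The `SERVICE_SUFFIX_METHODS` table and the `split_method_did` helper are hypothetical; "mymethod" is the placeholder method from the comment above:

```python
from typing import Optional

# Hypothetical per-method table: methods whose last idstring component names a service.
SERVICE_SUFFIX_METHODS = {"mymethod"}


def split_method_did(did: str) -> tuple[str, str, Optional[str]]:
    scheme, method, idstring = did.split(":", 2)
    if scheme != "did":
        raise ValueError("not a DID")
    if method in SERVICE_SUFFIX_METHODS and ":" in idstring:
        # The last ':'-separated component is the service name.
        base, service = idstring.rsplit(":", 1)
        return method, base, service
    return method, idstring, None


print(split_method_did("did:mymethod:idstringfirst:service"))
print(split_method_did("did:example:123"))
```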

msporny commented 5 years ago

Alternatively since the default context is DID Method specific.

I don't think this will continue to be true, due to a large number of discussions in the JSON-LD WG and the VCWG. We're trying to make it such that you don't need JSON-LD processors to use DID Documents, and one of the results of that is that all DID Methods MUST use the DID Context (as defined by the DID WG) as the first element in the @context array. They may then layer their own vocabulary terms on top of that. So, it would most likely look like this:

  "@context": ["https://w3.org/2019/did/v1", "https://example.org/method-specific/v1"]

It's pedantic, and I don't know if it affects your proposal; just noting it in case it does.

dmitrizagidulin commented 5 years ago

Update: the current consensus on the format of the Service endpoint reference is reflected in PR #168. In terms of this discussion, it's option 1b. It was clarified that service references only include the service id, not the type.
