w3c / did-core

W3C Decentralized Identifier Specification v1.0
https://www.w3.org/TR/did-core/

Remove matrix parameters from the DID specification #159

Closed selfissued closed 4 years ago

selfissued commented 4 years ago

We should enable the use of standard URI parsers by removing matrix parameters from the DID specification.

As I wrote at https://github.com/w3c/did-core/issues/137#issuecomment-566604194, in my view, the working group should make every attempt to not introduce the matrix parameters syntax at all. There are already two mechanisms for passing parameters in URLs - query parameters and fragments. One or the other should suffice in all cases.

Even if it's necessary to do something like dedicate particular query and/or fragment names for DID purposes, that would arguably be preferable to introducing yet a third parameter passing mechanism that requires non-standard URL parsing to use.

I'd taken an action item during a recent WG call to file an issue to drive discussion on removing or defining the use of matrix parameters. This is that issue.

I'll also note that the specification currently appears to be inconsistent on whether matrix parameters are actually supported or not. For instance, they're missing from the descriptions of DID portions in https://w3c.github.io/did-core/#terminology, but they're present in the Generic DID Syntax at https://w3c.github.io/did-core/#generic-did-syntax. The specification should eventually be made self-consistent in this regard.

peacekeeper commented 4 years ago

Thanks for creating this issue. To help with the discussion, I recently shared some materials about matrix parameters in DID URLs:

selfissued commented 4 years ago

Note that the specification does not uniformly recognize matrix parameters as a component of a DID URL. For instance, the definition of DID URL at https://w3c.github.io/did-core/#dfn-did-urls is:

A DID plus an optional DID path, optional ? character followed by a DID query, and optional # character followed by a DID fragment.

This implies that, if present, any matrix parameters are part of the DID, because they are not enumerated in the list of DID URL components above. Yet the DID definition at https://w3c.github.io/did-core/#dfn-decentralized-identifiers does not state that matrix parameters are part of the DID.

selfissued commented 4 years ago

The intended semantics of this text in the Generic DID Parameter Names section at https://w3c.github.io/did-core/#generic-did-parameter-names is unclear:

Some generic DID parameter names (for example, for service selection) are completely independent of any specific DID method and MUST always function the same way for all DIDs. Other DID parameter names (for example, for versioning) MAY be supported by certain DID methods, but MUST operate uniformly across those DID methods that do support them.

selfissued commented 4 years ago

This text means that the specification is not self-contained:

The exact processing rules for these parameters are specified in [DID-RESOLUTION].

It is unacceptable for the processing rules of parameters defined in this specification to live in another spec that is not normatively referenced by this specification and that is not being standardized by this working group.

peacekeeper commented 4 years ago

Just a quick note that this topic was discussed at the recent DID WG F2F meeting in Amsterdam.

@jandrieu and I have a task to describe

  1. at least one use case enabled by matrix parameters, and
  2. if/how that use case could be implemented without matrix parameters.

We will share a document with the WG once we have an initial draft of this. Also see the slides that were used at the F2F meeting on this topic.

peacekeeper commented 4 years ago

The following issues are dependent on this one: https://github.com/w3c/did-core/issues/35, https://github.com/w3c/did-core/issues/36

peacekeeper commented 4 years ago

As discussed during the F2F, I have created (and @jandrieu has reviewed) the following document: https://docs.google.com/document/d/1ttRWB2lwYSw7bZMRY6wTY9lzGaHcSvOKFYfNZaBBS_4/

In my opinion there are now numerous resources (the above document, plus the F2F slides, plus a RWoT#10 paper) that explain how matrix parameters enable the "Web Address Portability" use case as well as several other functionalities. I'd also like to note that this use case had strong support at the F2F meeting.

OR13 commented 4 years ago

@csuwildcat initial values use case for sidetree, https://github.com/w3c/did-core/issues/70

OR13 commented 4 years ago

Another example of potential use of matrix parameters: https://github.com/decentralized-identity/didcomm-messaging/issues/33

In transitioning from ephemeral to permanent dids.

In supporting immediate resolution of permanent / ledger anchored dids.

@msporny @dlongley interested to hear your thoughts on this issue.

OR13 commented 4 years ago

Related issue here: https://github.com/digitalbazaar/did-method-key/issues/5

awoie commented 4 years ago

Another example of potential use of matrix parameters: decentralized-identity/didcomm-messaging#33

In transitioning from ephemeral to permanent dids.

In supporting immediate resolution of permanent / ledger anchored dids.

@msporny @dlongley interested to hear your thoughts on this issue.

initial-state could help to make some DID methods more privacy-friendly by having less information about the DID subject on the Blockchain. In particular, onboarding of DIDs should have as little friction as possible, and requiring a Blockchain transaction is cumbersome for most users. For advanced features, which I don't need to detail here, this would still be beneficial though. initial-state could help to provide everything that is needed to establish a secure connection with DID controllers from the beginning without requiring a Blockchain transaction. So, I would also see value for matrix parameters.

philarcher commented 4 years ago

I've finally got around to looking into this topic and reviewing the arguments. They are very pertinent to me as they mirror a lot of the discussion around a standard with which I am intimately involved, GS1 Digital Link. I don't expect anyone here to read 145 pages of PDF but there is a lot in common between GS1 Digital Link and what DIDs are about, especially in terms of resolution and addressing service endpoints. So, here goes...

TL;DR - I do not believe that DID URLs need matrix parameters.

I'd go further - I am sceptical that we need anything other than HTTPS.

The use case for us at GS1 is making barcodes something you can look up on the Web. Whether that's the stripy barcode you see on just about everything or a QR code, a Data Matrix or whatever, what those things carry is one or more identifiers. There are many others but I'll stick with the best known - the GTIN (Global Trade Item Number). Let's use 9506000134352 as an example.

Imagine a product on the shelf with that number encoded in a 1D barcode.

You can use any number of apps to scan that and they'll take you to wherever the app developer thinks is a good idea - usually a proprietary, locked-down data store that they operate.

What GS1 Digital Link does is to provide a URI structure into which you can put that GTIN. Again, there are much more complex examples I could use but let's keep it as simple as it can be:

https://example.com/gtin/9506000134352

Now, I've used example.com there deliberately as I want to emphasise that there are two separate things there:

  1. The GTIN - an identifier for the product that is guaranteed to be unique within the GS1 system.
  2. The location of a resolver (in this case example.com, but it could be example.com/foo/bar/ before you get to the bit that says "here's a GS1 identifier" and you're into a structured string).

We completely separate out the resolver's location and the ID to be resolved.

You can look up that GTIN on any number of resolvers. Unlike, say, DOIs or ORCIDs, each resolver is free to return whatever information it wants. Each resolver is sovereign. In plain terms, you're asking each resolver "what can you tell me about the thing identified by this GTIN?" And you may not get the same answer from each one.

We want to be able to attach multiple resources/APIs to that GTIN - in DID language, call them service endpoints. Product master data, consumer information page, recall status API, promotions, instruction manuals and more.

If you're a member of staff in a retailer, you're likely to want something like the recall status of the item before you put it on the shelf. If you're a logistics company, you want to know where to record the fact that you picked the consignment up at time X from location Y. Those are specialised operations for which you want to use a service endpoint associated with that GTIN (or pallet identifier or whatever).

In DID world, the default is to return a DID document. The proposal is to use matrix parameters to bypass the DID doc and go straight to a service endpoint.

In GS1 Digital Link world, the default is that you go to wherever the brand owner decides is the default, most likely a consumer-facing product information page or, perhaps a current promotion. But specialist apps can go straight to the service endpoint they need.

Let's try that... this is a GS1 Digital Link URI that works

https://id.gs1.org/gtin/9506000134352

So it's using the GS1 Global Office resolver to look up GTIN 9506000134352.

You'll get a simple redirect to a product information page. That's the default. There's a different default if you happen to speak Vietnamese. Set your browser to that language or, for simplicity, add ?lang=vi and you'll get the page in Vietnamese.

Now try this:

https://id.gs1.org/gtin/9506000134352?linkType=gs1:recipeInfo

You'll see what just happened. The query string parameter linkType does what the DID WG is considering using matrix parameters for - i.e. providing the resolver with an instruction to process the request in a particular way, in this case to provide a recipe idea for the product. By the way, we don't have the recipe in Vietnamese so that https://id.gs1.org/gtin/9506000134352?linkType=gs1:recipeInfo&lang=vi still goes to the same English language default.

Notice that when you're redirected, the query parameter is passed on.

You'd get the same result with

https://id.gs1.org/gtin/9506000134352?linkType=gs1:recipeInfo&foo=bar

That is, the resolver passes on whatever is in the incoming query string - because it doesn't matter. Well, it shouldn't. Try any Web page - add on any old junk in the query string and it won't matter because those pages will ignore what they don't understand. At least, that's the idea. We have found examples where this doesn't hold and so we have a feature that you can suppress the query string if you need to, but the default is that it gets passed on.
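The pass-through behaviour described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `links` table, the `resolve` function and the `brand.example` targets are hypothetical and are not taken from the GS1 resolver implementation.

```javascript
// Hypothetical link table: linkType -> target URL (illustrative values only).
const links = new Map([
  ['default', 'https://brand.example/product'],
  ['gs1:recipeInfo', 'https://brand.example/recipe'],
]);

// Returns the redirect target for an incoming resolver URL.
function resolve(incoming) {
  const url = new URL(incoming);
  const linkType = url.searchParams.get('linkType');
  // Unknown or missing link types fall back to the default target.
  const target = new URL(links.get(linkType) ?? links.get('default'));
  // Forward the incoming query string unchanged, including parameters
  // the resolver does not understand (such as foo=bar).
  target.search = url.search;
  return target.href;
}

console.log(resolve('https://id.example/gtin/9506000134352?linkType=gs1:recipeInfo&foo=bar'));
// https://brand.example/recipe?linkType=gs1:recipeInfo&foo=bar
```

The target server is then free to ignore `linkType` and `foo`, which is the "ignore what you don't understand" convention the comment relies on.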

There are several ways of getting the full list of available links (service endpoints). Try this:

curl -I https://id.gs1.org/gtin/9506000134352

That is, a HEAD request - and note the (long) Link header.

If you want all that as JSON, try

curl -H "Accept: application/json" https://id.gs1.org/gtin/9506000134352?linkType=all

Or just click https://id.gs1.org/gtin/9506000134352?linkType=all for the HTML page with the JSON embedded.

That's our equivalent of the DID-doc. It's the full list of available service endpoints although of course there's none of the crypto authentication material that is so important in DIDs.

What about passing on the GTIN in a template? That is, imagine a service like

https://example.com/recallStatus?gtin={gtin}

We can do that on the resolver too - that is, provide a rewrite rule to take the GS1 identifiers from the incoming URL and put them into a different template. We haven't formally standardised that yet but we soon will.
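As a rough sketch of such a rewrite rule (the comment notes this is not yet standardised, so everything here is an assumption): the helper names `extractGtin` and `fillTemplate` are hypothetical, and the path pattern only handles the simple `/gtin/<digits>` form used in this thread.

```javascript
// Extract the GTIN from a path ending in /gtin/<digits>.
// Simplified: real GS1 Digital Link URIs allow other keys and qualifiers.
function extractGtin(incoming) {
  const match = new URL(incoming).pathname.match(/\/gtin\/(\d+)$/);
  return match ? match[1] : null;
}

// Replace each {name} placeholder with the URL-encoded value of that name.
// Placeholders without a matching value are left untouched.
function fillTemplate(template, values) {
  return template.replace(/\{(\w+)\}/g, (whole, name) =>
    Object.prototype.hasOwnProperty.call(values, name)
      ? encodeURIComponent(values[name])
      : whole);
}

const gtin = extractGtin('https://id.gs1.org/gtin/9506000134352');
console.log(fillTemplate('https://example.com/recallStatus?gtin={gtin}', { gtin }));
// https://example.com/recallStatus?gtin=9506000134352
```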

Yes, this is a grab on URL space. However, we state clearly that all URIs are dumb identifiers. Applications must be aware that https://example.com/gtin/9506000134352?linkType=all is a perfectly valid URL that may or may not end up at a GS1-conformant resolver. Deal with it. And if you find a QR code with a GS1 Digital Link URI in it, you're perfectly free to swap the embedded resolver for another of your choice. So there's no single point of failure (although we do define canonical URIs as being on id.gs1.org).

We do 'reserve' the linkType parameter and one or two more (again, recognising that in other contexts, those same params can be used for something else - we don't control the whole URL space).

Including instructions to the resolver in the query string is not a problem. Passing on params in a query string to your service endpoint shouldn't be a problem.

I don't claim credit for any of this, nor does anyone else at GS1. It's sort of HATEOAS and Linked Data and... well, it's the Web.

csuwildcat commented 4 years ago

Matrix parameters play a valuable role in the DID document processing phase, if properly scoped strictly to that phase, imo. Matrix parameters should:

  1. Be confined to the resolution phase of DID URI parsing to determine the correct DID Document.
  2. Be limited in function (beyond document resolution) to selecting a portion of the document and/or forming a URL in the process.
  3. Not be used for things outside of the two purposes above.

If we were to use URL params alone, you would absolutely need to define a namespace for DID-related parameters (e.g. _did-PARAM-NAME=). For generic reserved parameters, you would further need to create a subspace within the general DID param namespace to distinguish them, for example: _did__version=.

There also exists the strange question of what to do with DID-related and DID-reserve parameters after resolution? Are they passed after parse to downstream application-level code via a generated URL? Are they removed after processing?

Using URL params alone does not deliver you any off-the-shelf simplicity or ease of integration, as you will certainly need to:

  1. Define the custom namespacing and reserve param syntaxes I highlighted above
  2. Define how DID resolution-only params are processed
  3. Define whether or not certain DID resolution-only params are dropped from the generated URLs resolution may produce
  4. Take into account how user agents should represent a DID URI that contains a mixture of jumbled params, some of which may be active for DID resolution-only, and others that are meant for userland code for traditional URL param handling. Also something to consider: will ordering of DID resolution-only params ever require order-dependent processing? If so, what new DSL syntax will we need to invent to define this?

Could you do all this without matrix params? Sure, but it's a fantasy that we're going to do it without introducing a bunch of convoluted, specialized processing steps and library code that significantly diverges from how URL params are ordinarily handled.
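To make the extra processing concrete, here is a minimal sketch of the kind of step a resolver would need if DID parameters shared the query string with application parameters. The `_did-` prefix follows the example in this comment; the `splitParams` function and the decision to drop DID params from the forwarded query are assumptions for illustration only.

```javascript
// Hypothetical reserved prefix for DID resolution-only parameters.
const DID_PREFIX = '_did-';

// Split a DID URL's query into resolution-only params and "userland" params.
function splitParams(didUrl) {
  const didParams = {};
  const appParams = new URLSearchParams();
  for (const [name, value] of new URL(didUrl).searchParams) {
    if (name.startsWith(DID_PREFIX)) {
      // Consumed by the resolver, not forwarded (an assumption).
      didParams[name.slice(DID_PREFIX.length)] = value;
    } else {
      // Passed on to the service endpoint in original order.
      appParams.append(name, value);
    }
  }
  return { didParams, appQuery: appParams.toString() };
}

console.log(splitParams('did:ex:123?_did-service=files&_did-version=2&page=3'));
// { didParams: { service: 'files', version: '2' }, appQuery: 'page=3' }
```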

msporny commented 4 years ago

@csuwildcat wrote:

Be confined to the resolution phase of DID URI parsing to determine the correct DID Document.

If you're going to confine it to the resolution phase (which I do agree, is a good idea), then it should be an argument to the resolution process, possibly in the resolution request (instead of in the DID URL itself). It feels like this is an argument /against/ matrix parameters instead of for it.

At this point, I do think that we have consensus that DID parameters are used during the resolution phase or URL rewriting phase... do we have any use cases for using a DID parameter outside of those two phases?

msporny commented 4 years ago

As a data point, Digital Bazaar has never needed or used DID parameters or matrix parameters to date. Not in the Veres One implementation. Not with any of Digital Bazaar's customers' use cases. I do think that's instructive, as we do have a variety of very complex use cases and none of them require the use of matrix parameters. In fact, none of them need DID parameters encoded in the DID URL.

We do need to express what are now called DID parameters, but doing so during the resolution phase, in a resolution request is good enough for all of our use cases, IIRC.

peacekeeper commented 4 years ago

@philarcher thanks for this demo and very clear explanation! I especially like how you use the Link: header to be compatible with the basic concept of Web Linking.

Here are some comments:

I am sceptical that we need anything other than HTTPS.

Keep in mind that DID Resolution is an abstract function that can be bound to HTTP(S) but doesn't require it. You can resolve a DID / dereference a DID URL by calling an HTTP(S) endpoint, but you can also do that by invoking a library, command line tool, etc.

We completely separate out the resolver's location and the ID to be resolved.

Nice idea, this reminds me of some other persistent identifier (PID) concepts such as ARK ID which I learned about at a conference on this topic (see this blogpost if you're interested).

https://id.gs1.org/gtin/9506000134352?linkType=gs1:recipeInfo You'll see what just happened. The query string parameter linkType does what the DID WG is considering using matrix parameters for

That is, the resolver passes on whatever is in the incoming query string - because it doesn't matter.

Yes, this is a grab on URL space.

We do 'reserve' the linkType parameter and one or two more

I think this design is a mistake for some reasons outlined in the Google doc that describes the "Web Address Portability" use case. Here is a summary:

I think it's inherently dangerous to intermix two separate sets of parameters (parameters for resolution, parameters for service endpoints) into a single syntactical construct. Note that URNs (RFC8141) also have two separate syntactic constructs for this, for good reasons.

peacekeeper commented 4 years ago

@philarcher one more question, do GTINs support what is called "partial redirection" in PURLs?

E.g. could you do something like

https://id.gs1.org/gtin/9506000134352/photo.jpg?linkType=gs1:recipeInfo

and expect to get redirected to this? (note the path /photo.jpg that is added to the URL).

https://dalgiardino.com/mushroom-squash-risotto/photo.jpg?linkType=gs1:recipeInfo

philarcher commented 4 years ago

Thanks @peacekeeper.

If linkType is used for something else by another server, there's no problem since we only 'reserve' it for the GS1 ecosystem. Outside that, of course, all URLs are dumb strings. We make it plain that applications should be aware of this. How do you know that you're addressing a GS1 resolver? There MUST be a Resolver Description File at /.well-known/gs1resolver. No file there? Don't assume anything about the linkType parameter. That's not bullet proof, but it's a start.

And yes, other query string formats are usable, sure. We don't stop anyone using those. You can define a rule in the resolver that turns a conformant URI into whatever template your service endpoint needs. It makes no demand on the target. And if needs be, you can suppress the default behaviour of forwarding the full query string.

To your separate question, no, https://id.gs1.org/gtin/9506000134352/photo.jpg?linkType=gs1:recipeInfo is not a conformant GS1 Digital Link URI and would return a 400 Bad Request error (you can't add random stuff to the path segments, only the query string). This does not affect the behaviour of other servers which, of course, remain sovereign.

I don't expect to persuade you, Markus, but I wanted to record how we're doing it and thus show that alternatives are possible. There are factors at play for DIDs that are not relevant to us, but we do have a distributed system for resolving identifiers and discovering related resources.

OR13 commented 4 years ago

Why can't we reserve the query string param "matrix-parameters", and use encodeURIComponent on it?

That way people who don't want to use them can use that, and people who do can translate from that to matrix parameters if they encounter it.

Of course we will still have the problem of query parameter sorting...

https://support.cloudflare.com/hc/en-us/articles/360031777052-Caution-when-enabling-Query-String-Sort-with-WordPress-admin-pages

Can't I issue a 302 redirect based on processing query string params?

http://example.com/resolve/did:ex:123;service=gs1Resolver/gtin/9506000134352?linkType=all

becomes:

https://id.gs1.org/gtin/9506000134352?linkType=all

http://example.com/resolve/did:ex:123?matrix-params=encodeURIComponent(service=gs1Resolver/gtin/9506000134352)&linkType=all

becomes:

https://id.gs1.org/gtin/9506000134352?linkType=all

Does this work?
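A minimal sketch of that fallback, converting between the matrix form and a reserved query parameter. The parameter name `matrix-params` is taken from the example URL above; the function names are hypothetical, and the sketch assumes the DID URL carries no fragment.

```javascript
// did:ex:123;<matrix and path>?<query>  ->  did:ex:123?matrix-params=<encoded>&<query>
function toQueryForm(didUrl) {
  const semi = didUrl.indexOf(';');
  if (semi === -1) return didUrl; // no matrix parameters present
  const q = didUrl.indexOf('?', semi);
  const did = didUrl.slice(0, semi);
  const matrix = q === -1 ? didUrl.slice(semi + 1) : didUrl.slice(semi + 1, q);
  const rest = q === -1 ? '' : '&' + didUrl.slice(q + 1);
  return `${did}?matrix-params=${encodeURIComponent(matrix)}${rest}`;
}

// The reverse direction, using a standard URL parser.
function toMatrixForm(didUrl) {
  const url = new URL(didUrl);
  const matrix = url.searchParams.get('matrix-params'); // already decoded
  if (matrix === null) return didUrl;
  url.searchParams.delete('matrix-params');
  const rest = url.searchParams.toString();
  return `${url.protocol}${url.pathname};${matrix}${rest ? '?' + rest : ''}`;
}

console.log(toQueryForm('did:ex:123;service=gs1Resolver/gtin/9506000134352?linkType=all'));
// did:ex:123?matrix-params=service%3Dgs1Resolver%2Fgtin%2F9506000134352&linkType=all
```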

philarcher commented 4 years ago

I see no objection to those redirections @OR13. After all, they're performed by the example.com resolver which can do whatever it wants - all domains are sovereign. And your example shows the independence of the resolver from the GS1 identifier to be resolved. I'm all for that!

Reserving the matrix-parameters name for DID URLs might be OK, sure, but, as we did, you'll need to warn that it's only in the specific context of a DID resolver that the query parameter has any meaning and applications cannot assume that it means that everywhere - it can mean something quite different in any other context.

As for query string sorting? Really? No. No way, never. No. Stop that nonsense. Query params are un-ordered. If a param is repeated, the last value wins. If a server can't handle that then it's a mess and needs to be put out of its misery with the judicious use of the delete button.

You might want to define a canonical form of a URL so you can hash and sign it - OK (we might be about to do just that) - but as a URL it is isomorphic with any URL that has the same params in a random order. Or am I making an impetuous fool of myself here ;-) ?
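One possible canonical form, sketched with the standard `URLSearchParams.sort()` method, which sorts by parameter name and keeps the relative order of repeated names. The `canonicalize` function name is hypothetical and this is not a proposal from the thread, just an illustration of hashing-friendly normalisation.

```javascript
// Canonicalize a URL for hashing or signing by sorting its query parameters
// by name. The sort is stable, so repeated parameters keep their order.
function canonicalize(input) {
  const url = new URL(input);
  url.searchParams.sort();
  return url.href;
}

const a = canonicalize('https://id.gs1.org/gtin/9506000134352?linkType=all&foo=bar');
const b = canonicalize('https://id.gs1.org/gtin/9506000134352?foo=bar&linkType=all');
console.log(a === b); // true
```

Note that the canonical string is only for comparison or signing; the URL actually sent can keep its original parameter order.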

OR13 commented 4 years ago

+1 for canonical URLs... I imagine that will involve sorting ;)

However, if you canonicalize a WordPress admin URL in order to sign it, it would probably not work any more... I'm all for deleting WordPress, but I suspect that's not going to happen :)

I think it's worth having a fallback for matrix params that works with traditional URL parsers, because there won't be any software that supports them out of the box.

peacekeeper commented 4 years ago

You can define a rule in the resolver that turns a conformant URI into whatever template your service endpoint needs

I don't think we want the DID URL dereferencing process to be dependent on custom resolver rules or templates.

All we want is take the service endpoint URL from the DID document as a "base URL", and apply the DID URL's path+query+fragment as a standard relative URI reference, as shown in slides 159 and 160 of the F2F meeting.

I haven't seen any proposals yet on how to achieve this without matrix parameters.

dlongley commented 4 years ago

All we want is take the service endpoint URL from the DID document as a "base URL", and apply the DID URL's path+query+fragment as a standard relative URI reference, as shown in slides 159 and 160 of the F2F meeting.

I understand that there has been interest in solving the problem that way, but I do think that's actually a potential solution to the problem vs. the actual problem. The problem, as I understand it, is that we want to be able to move the authority for some path+query+fragment for a relative-ref "at will" by changing the authority part of the URL via a service description in a DID Document -- whilst keeping a stable URL for consumers. This is, in fact, precisely what slides 159-160 show happening.

I haven't seen any proposals yet on how to achieve this without matrix parameters.

I think what others are saying is that we can address this use case by solving it in a different way. To get specific, looking at slide 159, instead of this:

did:ex:123;service=files/myresume/doc?version=latest#intro

It seems we could do this:

did:ex:123?service=files&relative-ref=%2Fmyresume%2Fdoc%3Fversion%3Dlatest%23intro

And reserve service and relative-ref for DID URLs. The same HTTPS URLs from the slide would result from the resolution process. The resolution process would involve using standard URL parsing on the DID URL, for example:

const u = new URL('did:ex:123?service=files&relative-ref=%2Fmyresume%2Fdoc%3Fversion%3Dlatest%23intro');

Which yields:

URL {
  href: 'did:ex:123?service=files&relative-ref=%2Fmyresume%2Fdoc%3Fversion%3Dlatest%23intro',
  origin: 'null',
  protocol: 'did:',
  username: '',
  password: '',
  host: '',
  hostname: '',
  port: '',
  pathname: 'ex:123',
  search: '?service=files&relative-ref=%2Fmyresume%2Fdoc%3Fversion%3Dlatest%23intro',
  searchParams: URLSearchParams { 'service' => 'files', 'relative-ref' => '/myresume/doc?version=latest#intro' },
  hash: ''
}

From here, the searchParams would be used to obtain the fragment name for the service (files). This would be appended to the full did path along with the hash character to produce the service ID: did:ex:123#files.

This would be used to obtain the service description from slide 159:

{
  "id": "did:ex:123#files",
  "serviceEndpoint": "https://filestore.org/user123/"
}

And the serviceEndpoint URL would be retrieved: https://filestore.org/user123/. Then the relative-ref query param value would be URL-appended to that value producing:

https://filestore.org/user123/myresume/doc?version=latest#intro

As you can see, the same HTTPS URL is output from this process as the slide. There's also no conflict with the DID URL query parameters and whatever the HTTPS server may use -- as processing must be done on a DID URL independently to produce the HTTPS url. Once the serviceEndpoint is changed to https://selfhosted.me:8080/ the process correctly outputs: https://selfhosted.me:8080/myresume/doc?version=latest#intro just like the slide.
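Put together, the steps above can be sketched as one function. The `didDocument` object and the `dereference` name are hypothetical; the service entry is the one quoted from slide 159, and "URL-appended" is interpreted here as plain concatenation onto the service endpoint, which is an assumption.

```javascript
// Hypothetical DID document for did:ex:123 with the service from slide 159.
const didDocument = {
  id: 'did:ex:123',
  service: [
    { id: 'did:ex:123#files', serviceEndpoint: 'https://filestore.org/user123/' },
  ],
};

function dereference(didUrl, doc) {
  const url = new URL(didUrl); // standard URL parsing
  const did = `${url.protocol}${url.pathname}`; // 'did:ex:123'
  const serviceName = url.searchParams.get('service');
  const relativeRef = url.searchParams.get('relative-ref') ?? '';
  const service = doc.service.find((s) => s.id === `${did}#${serviceName}`);
  if (!service) throw new Error(`no service "${serviceName}" for ${did}`);
  const endpoint = service.serviceEndpoint;
  if (relativeRef === '') return endpoint;
  // Avoid a doubled slash when both sides supply one.
  const base = endpoint.endsWith('/') && relativeRef.startsWith('/')
    ? endpoint.slice(0, -1)
    : endpoint;
  return base + relativeRef;
}

console.log(dereference(
  'did:ex:123?service=files&relative-ref=%2Fmyresume%2Fdoc%3Fversion%3Dlatest%23intro',
  didDocument,
));
// https://filestore.org/user123/myresume/doc?version=latest#intro
```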

The same works for slide 160:

did:ex:123;service=socialnetwork => did:ex:123?service=socialnetwork

Again, with the same HTTPS URLs resulting from the resolution process. Note that any query parameters/fragments that are part of the DID URL itself are only handled by DID resolvers, not mixed or combined in any way with query parameters/fragments intended for the server. It doesn't look as pretty, but we shouldn't have any "mixing" issues, because you just have to encapsulate service and relative-ref values using URI encoding.

The difference with this approach may only be in where the transformation from a non-DID URL to a DID URL would occur. You can't "edit" the DID URL in place in the same way you would edit the HTTPS URL; e.g., you can't just add/remove path components using regular URL tools. You have to understand that it's a DID URL and work within the "relative-ref" value. That would seem to be the main trade-off and perhaps that's where the point of contention is. If so, I think it would help to surface that better.

Is that right? You'd prefer to have consumers be able to edit an existing DID URL without knowing it's a DID URL -- to make changes to the path, query, fragment, etc.? This as opposed to actually just resolving it?

IMO, I think it's not too much of a burden to have to either parse or resolve the DID URL first before editing it (and then, subsequently translate it back to a DID URL as needed). I think that's a better trade-off vs. creating new URL parsers.

jandrieu commented 4 years ago

Using the relative URI reference algorithm will strip the last part of the service endpoint (everything after the last "/"). If the endpoint were itself a DID, e.g., did:example:joe, this could remove the entire DID URL (if it has no '/').
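The stripping behaviour can be seen with a standard URL parser. This sketch uses HTTPS base URLs only, reusing the filestore.org endpoint from earlier in the thread; the variable names are illustrative.

```javascript
// Base ends with '/': the whole base path is kept.
const withSlash = new URL('myresume/doc', 'https://filestore.org/user123/').href;
console.log(withSlash); // https://filestore.org/user123/myresume/doc

// Base does not end with '/': the last segment ('user123') is stripped.
const withoutSlash = new URL('myresume/doc', 'https://filestore.org/user123').href;
console.log(withoutSlash); // https://filestore.org/myresume/doc

// Reference starts with '/': the base path is replaced entirely.
const absolutePath = new URL('/myresume/doc', 'https://filestore.org/user123/').href;
console.log(absolutePath); // https://filestore.org/myresume/doc
```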

One thing I'm seeing--even in my own thinking & writing--is the desire to dereference a DID and, ultimately, return either a resource or a URL. So, despite my concern over privacy issues, this expectation may need to be supported.

However, here's a proposal to achieve what you want (redirection) without matrix parameters:

  1. Allow one and only one service endpoint, which we might as well call a "redirect"
  2. When resolving a DID, you return the DID Document
  3. When dereferencing a DID, you return the single redirect
  4. In the service endpoint (redirect) property, we add an aggregationMethod property that specifies how any "extra parts" of the DID URL are merged with the extra parts of the service endpoint. This supports both the portable hierarchy use case as well as situations where different rules might be appropriate for a particular service endpoint type.

At least four algorithms are immediately apparent as useful:

  1. replace (DID URL path/query parts replace service endpoint path/query parts)
  2. ignore (DID URL path/query parts are dropped, the service endpoint is dereferenced unmodified)
  3. aggregate (a modification of the relative reference URL that preserves the file part)
  4. relative (use the relative reference algorithm directly)
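A rough sketch of how the four algorithms might behave. None of this is specified anywhere: the `mergeWithEndpoint` function is hypothetical, and in particular the reading of "aggregate" (keep the endpoint's full path, file part included, and append the extra parts beneath it) is a guess at the intent.

```javascript
// endpoint: service endpoint URL; extra: path/query parts from the DID URL.
function mergeWithEndpoint(endpoint, extra, method) {
  const base = new URL(endpoint);
  switch (method) {
    case 'ignore':
      // Extra parts are dropped; the endpoint is used unmodified.
      return base.href;
    case 'replace':
      // Extra parts replace the endpoint's path and query.
      return new URL(extra, base.origin).href;
    case 'relative':
      // Standard relative reference resolution.
      return new URL(extra, base).href;
    case 'aggregate': {
      // Assumed meaning: treat the endpoint path as a directory so its
      // last segment survives, then resolve the extra parts beneath it.
      if (!base.pathname.endsWith('/')) base.pathname += '/';
      return new URL(extra.replace(/^\/+/, ''), base).href;
    }
    default:
      throw new Error(`unknown aggregationMethod: ${method}`);
  }
}

const endpoint = 'https://filestore.org/user123/index';
const extra = 'myresume/doc?version=latest';
for (const method of ['replace', 'ignore', 'aggregate', 'relative']) {
  console.log(method, mergeWithEndpoint(endpoint, extra, method));
}
```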

Another proposal would be to select the service using a reserved query term, like did:example:abc?_DID_service=myService

If you want to do it without a possible collision in the query name space, just have one and only one service endpoint that is always used when dereferencing.

Most privacy advocacy on this issue has suggested that the best way to deal with my concerns vis-a-vis consent, privacy, and GDPR is to put information behind a single service. Thus, a single service requirement would support BOTH service dereferencing and minimizing privacy risks without needing matrix parameters.

So, there are two proposals for you.

peacekeeper commented 4 years ago

@dlongley thanks for this great analysis and write-up. I agree this would work as an alternative to matrix parameters and that it would fulfill the use case.

Is that right? You'd prefer to have consumers be able to edit an existing DID URL without knowing it's a DID URL -- to make changes to the path, query, fragment, etc.?

Yes pretty much. I think the idea that the path+query+fragment "fully belong to" the consumer (the DID controller) is elegant and powerful, just like in the case of HTTP URLs the path+query+fragment "fully belong to" the domain owner. The path+query+fragment of the DID URL could be freely edited, and the relative URI dereferencing algorithm would just continue to work. Personally I prefer this to having to introspect the "relative-ref" value. But I can understand if others see it differently.

If we decide to do it that way, I would probably propose next to remove the "path" component from DID URLs, since I can't think of any use for it anymore, and the following spec text would not be accurate anymore:

A DID path SHOULD be used to address resources available via a DID service endpoint.

peacekeeper commented 4 years ago

@jandrieu

Allow one and only one service endpoint, which we might as well call a "redirect"

I agree this would work, but it would be a bit like having an HTTP URL that is dereferenced to a web page (or an RDF graph) which is only allowed to have one link to another HTTP URL. This is not how the web and relationships between resources (see Web Linking) should work.

At least four algorithms are immediately apparent as useful: replace, ignore, aggregate, relative.

Funny, we had a feature similar to this in XRI Resolution (an "append" attribute in your XRD document - see section 13.7.1. of XRI Resolution 2.0)

Most privacy advocacy on this issue has suggested that the best way to deal with my concerns vis-a-vis consent, privacy, and GDPR is to put information behind a single service.

I believe this idea has been brought up before by @dlongley (see https://github.com/w3c-ccg/did-spec/issues/90#issuecomment-439936749) and is being tracked as an issue in DID Resolution (see https://github.com/w3c-ccg/did-resolution/issues/35). But it would introduce a dependency on an intermediary service you'd have to trust, no?

csuwildcat commented 4 years ago

I just want to point out that @dlongley's example would need to be far uglier than simply URL encoding what amounts to a service path and sticking on a service URL parameter. You will need to:

  1. Come up with a leading prefix that namespaces all DID-specific parameters of this sort so we don't end up with a minefield of collisions. Something like this did:ex:123?__did__service=files&__did__relative-ref=%2Fmyresume%2Fdoc.... And if you want any DID Method-specific params, you'll probably need to add to that: __did__method:foo=, or something like it.
  2. You will probably want to remove any other DID-specific parameters from the resulting string (if any remain from processing for any reason).
  3. I think (could be wrong) that regular params intended to be passed along after a path transform should probably be URL encoded within the path string they are tied to (what Dave named relative-ref). This is because two DID-specific params that include paths may both need to include, for their activities, regular URL params, which may collide with different values. (yikes, just yikes)

All I can say is that the URL param-only approach is probably going to be a mess in the end, and if we do decide to go down that path (<-- URL pun) all these things are going to have to be addressed.

dlongley commented 4 years ago
  1. Come up with a leading prefix that namespaces all DID-specific parameters of this sort so we don't end up with a minefield of collisions.

This could be avoided with a registry, reserving a single character for that, or requiring that method-specific query parameters (which this group may eventually decide we want to avoid anyway) have some prefix, but not core ones.

  2. You will probably want to remove any other DID-specific parameters from the resulting string (if any remain from processing for any reason).

Not sure what you mean here. Could you give an example so we could analyze?

  3. I think (could be wrong) that regular params intended to be passed along after a path transform should probably be URL encoded within the path string they are tied to (what Dave named relative-ref). This is because two DID-specific params that include paths may both need to include, for their activities, regular URL params, which may collide with different values. (yikes, just yikes)

The relative-ref is from the URL spec: https://tools.ietf.org/html/rfc3986#section-4.2

From the link you can see it includes relative path, query, and fragment. I can't really follow the rest of your comment. Could you give an example here as well?

csuwildcat commented 4 years ago

@dlongley here's an example of 3:

Gross, but I guess this works:

__did__service=files&__did__relative-ref=%2Fresume...&normal=param&__did__service=query-cache

But what about if I add something else into the string used in resolution that requires another path and parameters? Let's say I have a different DID-specific param, like __did__someparam=foo, which needs its own path, and moreover, its own regular URL params? How would each type of path-reliant component include its relative-ref, given they are different? Is this prohibited? If it is allowed, or we want to allow it, two components of the URI that rely on normal URL params may need to use the same regular URL params. To illustrate:

Would it be possible for there to exist two components, for ex: __did__service=files and __did__someparam=foo, that both need to include paths, like relative-ref in Dave's examples? Beyond that, what if each of those components used regular URL params as well? What happens if they both want to use the same regular URL param name for different values?

I would really like to cleanly separate DID resolution values from any regular URL values that get carried over to resolved paths/outputs, if at all possible.

dlongley commented 4 years ago

__did__service=files&__did__relative-ref=%2Fresume...&normal=param&__did__service=query-cache

This specifies "__did__service" twice. Was that a mistake, or did you mean something else? If not, we can define our own behavior to handle that situation (e.g., last wins or "error"), but it's not at all clear to me what the URL creator would intend to be referencing if they were to include two service query parameters. If you were trying to offer up a choice of options, it would seem to me that you'd want to construct some entirely different URL where multiple DID URLs were themselves options as query parameters. A clearer use case here would help.

I'm also not sure what "normal=param" means. If by "normal" you mean a query param that would appear on the resolved HTTPS URL, I want to be clear that, with this proposal, you do not specify any query parameters directly that would "get passed through" to the resolved HTTPS URL. Any query parameters that you want to be part of the resolved HTTPS URL go, URI encoded, into the relative-ref query param. They are not query params themselves in the DID URL, rather, they are encapsulated. There is a clear separation of query params for the DID URL and query params (the entire relative-ref really) for the resolved HTTPS URL.

I would really like to cleanly separate DID resolution values from any regular URL values that get carried over to resolved paths/outputs, if at all possible.

I feel like the proposal above does that -- so we may be miscommunicating. All the DID URL query params appear in the DID URL directly, any query params for the resolved HTTPS URL are URI-encoded and encapsulated within the value of relative-ref.
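To make the encapsulation concrete, here is a minimal Python sketch. It assumes the parameter names `service` and `relative-ref` from the discussion above; the helper `build_did_url` and the sample relative-ref are hypothetical and not part of any spec text.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_did_url(did: str, service: str, relative_ref: str) -> str:
    """Encapsulate a whole relative-ref inside one DID query parameter."""
    # urlencode percent-encodes "/", "?", "=" and "#", so nothing from the
    # relative-ref leaks into the DID URL's own query or fragment.
    query = urlencode({"service": service, "relative-ref": relative_ref})
    return f"{did}?{query}"


# Hypothetical relative-ref with its own path, query, and fragment.
did_url = build_did_url("did:ex:123", "files", "/pics/me.png?version=14.6#frag1")

# A standard URI parser recovers the DID-level parameters unchanged.
parts = urlsplit(did_url)
did_params = parse_qs(parts.query)
```

The DID URL carries only DID-level query parameters, and the decoded `relative-ref` value comes back exactly as it went in.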

Would it be possible for there to exist two components, for ex: __did__service=files and __did__someparam=foo, that both need to include paths?

What's the use case for doing that at the same time? What does the URL creator intend to reference with such a URL?

If you specify service as a query param in the DID URL then it's interpreted by the DID URL resolver as an instruction to find a service endpoint in the DID Document and output another URL per the resolution process outlined above. Any other query parameters in the DID URL would be dealt with separately and, most likely, we would define it such that they are processed first. So, for example, if we have "version" or similar query parameters that are part of the resolution process, the resolver would get the appropriate version of the DID Document before looking for the service endpoint and completing the resolution process to produce the final HTTPS URL.
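The ordering described here (resolution parameters first, then service selection) could be sketched roughly as follows. Everything in this block is an assumption for illustration: the in-memory `DOCUMENTS` store, the example endpoints, the "last value wins" rule for repeated parameters, and plain concatenation of the endpoint and the relative-ref.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical store: DID -> version -> DID document (only "service" shown).
DOCUMENTS = {
    "did:ex:123": {
        "1": {"service": [{"id": "files", "serviceEndpoint": "https://old.example/u123"}]},
        "2": {"service": [{"id": "files", "serviceEndpoint": "https://myfiles.example/user123"}]},
    }
}
LATEST = {"did:ex:123": "2"}


def dereference(did_url: str) -> str:
    parts = urlsplit(did_url)
    did = f"{parts.scheme}:{parts.path}"
    # Assumption: if a parameter repeats, the last value wins.
    params = {name: values[-1] for name, values in parse_qs(parts.query).items()}

    # Step 1: resolution parameters such as "version" pick the DID document.
    document = DOCUMENTS[did][params.get("version", LATEST[did])]

    # Step 2: "service" selects an endpoint from that document.
    endpoint = next(
        entry["serviceEndpoint"]
        for entry in document["service"]
        if entry["id"] == params["service"]
    )

    # Step 3: the decoded relative-ref is appended to the endpoint.
    return endpoint + params.get("relative-ref", "")
```

With this sketch, the same `service` value can yield different HTTPS URLs depending on which version of the DID document was selected first.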

csuwildcat commented 4 years ago

Let me try to make these questions even more terse:

  1. Can or would there be a case where there exist two path-reliant components in a DID URI, wherein both require something like relative-ref?
  2. If 1 is Yes, how would each specify a different path?
  3. If components need, or are allowed, to use the same regular/user-space URL parameters (not DID-specific, not prefixed), and they desire/require different values for the same parameter, how would that work?

jandrieu commented 4 years ago

@peacekeeper wrote:

I agree this would work, but it would be a bit like having an HTTP URL that is dereferenced to a web page (or an RDF graph) which is only allowed to have one link to another HTTP URL. This is not how the web and relationships between resources (see Web Linking) should work.

Actually, I'd say that is exactly how the web works when you are redirecting rather than returning the requested resource. There are no provisions in the web for selecting which service you get redirected to when a server redirects during dereferencing. Redirect messages don't return multiple URLs which are then automatically further dereferenced based on a parameter in the original URL.

Please correct me if I'm wrong.

There is also no mechanism for defining in a URL which of any number of links in the returned resource should automatically be followed.

Trying to add that extra layer of selectivity AND automatically redirecting is the problem. Either redirect automatically to a single resource or don't redirect at all and return the DID Document.

Also

I believe this idea has been brought up before by @dlongley (see w3c-ccg/did-spec#90 (comment)) and is being tracked as an issue in DID Resolution (see w3c-ccg/did-resolution#35). But it would introduce a dependency on an intermediary service you'd have to trust, no?

That's a similar idea, but it wasn't raised in the context of avoiding the complexity of matrix parameters, just as a service type for one of many services. I'm saying it actually is a nice solution to both privacy and matrix parameters, AND it gives us an easy way to make a DID-based URL look and feel like a typical web URL that can point to a specific resource hierarchically. The DID can just look like a domain name and everything that follows, with appropriate aggregation rules depending on the service.

philarcher commented 4 years ago

Redirect messages don't return multiple URLs which are then automatically further dereferenced based on a parameter in the original URL.

No, the Web doesn't do that out of the box @jandrieu, but it is pretty much exactly what GS1 Digital Link does.

Try this: curl -I https://id.gs1.org/01/9506000134352

(No query param)

Note the Location header which would be the redirect to a default destination (in this case a consumer-facing product information page), but notice all the links in the Link header, all of which have a rel value.

If you now do this: curl -I https://id.gs1.org/01/9506000134352?linkType=gs1:hasRetailers

you'll be redirected to a different page about where you can buy that thing.

In DID terms, those are two of the available service endpoints and we use the linkType param to determine which one you go to (we also take account of language and can do more as well, but that's the basics here). No need to prefix linkType or to create a registry - it only means this in the context of a GS1 resolver. Elsewhere it means whatever you want it to mean.

And we simply pass the full query string on to the target resource. No need to separate and merge or whatever. But we can add in what amount to rewrite rules that take elements on the incoming URI and reformat it in the outgoing request as needed for that particular service.
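As a rough illustration of that behaviour (not GS1's actual implementation), a resolver could pick a redirect target from the `linkType` parameter and forward the incoming query string untouched. The `LINKS` table, the target URLs, and the fallback to a default target for unknown link types are all assumptions made for this sketch.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical link table for one identifier; a real resolver holds its own data.
LINKS = {
    "default": "https://brand.example/product",
    "gs1:hasRetailers": "https://brand.example/where-to-buy",
}


def redirect_target(request_url: str) -> str:
    parts = urlsplit(request_url)
    link_type = parse_qs(parts.query).get("linkType", ["default"])[0]
    # Assumption: an unknown linkType falls back to the default target.
    target = LINKS.get(link_type, LINKS["default"])
    # The full incoming query string is passed on unchanged.
    return f"{target}?{parts.query}" if parts.query else target
```

Because `linkType` is interpreted only by this resolver, no prefix or registry is involved; the target service is free to ignore or reuse the parameter.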

Also just to emphasise, any redirect/service endpoint found at one resolver may or may not be replicated at another. I might resolve the same GS1 identifier at a different resolver and be redirected somewhere else entirely. It's the end user (or their user agent) that chooses the resolver they most trust.

iherman commented 4 years ago

I went through the whole issue again and, the way I see it, the main argument/use case that I really see is the PURL one, i.e., to reproduce the PURL functionalities.

However... although the contrary was never stated, we have to emphasize one point. The PURL behavior is not part of the fundamental (HTTP) URL layer. It is a service that was set up by some clever people on top of a well defined and, comparatively, simpler layer. On the other hand, what we are trying to do here is to define both the fundamental layer of handling a DID and a DID URL and build into (and not onto) it a PURL-like behavior. And that seems to clash and get complicated. (This may be the reason why TBL's original idea never "made it" into the HTTP URL architecture after all.)

If we look at it this way, then, if my understanding of it is correct, the separate 'redirection' service of @jandrieu plays the role of PURL: it is a separate service on top of DID + DID URL. Details put aside, that might be the right layering of abstraction. It is not the job of the DID spec to define a redirection service (just as PURL is not part of the HTTP URL spec); such a service should be defined by another spec, social contract, or whatever.

I guess what I am saying is that I'm beginning to wonder whether the PURL use case (which, as far as I can see, is the "main" use case) is a valid one.

philarcher commented 4 years ago

+1 @iherman. From my POV, DID-Core should not be talking about service endpoints. They should however, be talked about in the context of resolution which, again, I believe the WG should take on as a Rec Track doc.

dlongley commented 4 years ago

@csuwildcat,

Side discussion about not having service descriptions at all notwithstanding...

Can or would there be a case where there exist two path-reliant components in a DID URI, wherein both require something like relative-ref?

No, that's confusing and there's no use case.

If 1 is Yes, how would each specify a different path?

It was a "No", so no-op here. :)

If components need, or are allowed, to use the same regular/user-space URL parameters (not DID-specific, not prefixed), and they desire/require different values for the same parameter, how would that work?

You don't do that. A URL parameter in a DID URL is specific to DIDs -- URL params for the resolved HTTPS URL are never put directly in a DID URL with the proposal we're talking about. Also, there is no use case for desiring/requiring different values for the same parameter and it would be a confusing mess to do that (bad design). So, I think we're good here.

@iherman, @philarcher,

Now, back to whether or not we eliminate service descriptions entirely ... I'm still undecided on that. I understand the arguments for removing them and am quite sympathetic to simplification and to layering architecture. But I can't fully support that position until it's clear the self-sovereign redirection service use case (which has been around for quite a long time) can be solved in a reasonably elegant way via layering. I don't want to be dismissive of the use case by simply saying it can be done without knowing enough about the details and trade offs.

iherman commented 4 years ago

To nuance the point a bit: my comment in https://github.com/w3c/did-core/issues/159#issuecomment-598651975 was only on the inclusion or not of matrix parameters in the DID URL syntax, not on whether the DID document would/should include service parameters. I believe these two issues are separate, and I do not have a strong opinion on the latter.

peacekeeper commented 4 years ago

I don't think that we should remove service endpoints from the DID document, or that we should allow only a single service endpoint in a DID document.

I also don't think we should change the current spec text about DID Paths and DID Query:

A DID path SHOULD be used to address resources available through a service endpoint.

A DID query SHOULD be used to address resources available through a service endpoint.

We have service endpoints in the DID document. And we have DID path and DID query to address resources at those endpoints. Anyone disagree with that?

peacekeeper commented 4 years ago

@iherman

The PURL behavior is not part of the fundamental (HTTP) URL layer.

Agreed, but isn't one of the main differences between HTTP URLs and DID URLs that DIDs are persistent by design? Doesn't this mean that DID URLs are automatically PURLs? I'd argue that this is a fundamental built-in feature of DID URLs, rather than a use case.

msporny commented 4 years ago

We have service endpoints in the DID document. And we have DID path and DID query to address resources at those endpoints. Anyone disagree with that?

At this point, based on everything that's being discussed above, yes, disagree with the definition of DID Path and DID Query referenced above. The definitions should be more generic than that.

Based on the discussion to date, there are a few compelling arguments for doing the following things:

peacekeeper commented 4 years ago

I agree that @msporny 's suggestion would be a way to remove matrix parameters and still achieve most of the use cases that have so far been associated with them (illustrated in further detail in @dlongley 's analysis above - https://github.com/w3c/did-core/issues/159#issuecomment-598286814).

But I also believe we would be throwing away a major opportunity, since the path+query+fragment information space of a DID URL would then be "owned" by the DID specification authors and registry operators, and not anymore by the DID controller. This would abandon the analogy we have with HTTP URLs, where path+query+fragment are also left to the user, and the DID URL path+query+fragment would not apply directly to the service endpoints from the DID document anymore.

dlongley commented 4 years ago

But I also believe we would be throwing away a major opportunity, since the path+query+fragment information space of a DID URL would then be "owned" by the DID specification authors and registry operators, and not anymore by the DID controller. This would abandon the analogy we have with HTTP URLs, where path+query+fragment are also left to the user, and the DID URL path+query+fragment would not apply directly to the service endpoints from the DID document anymore.

But, to be clear, this "ownership" would still be available in another place (within the values of service, relative-ref, and whatever service endpoint you put in your DID Document). So I don't feel like anything has been lost, just moved to ensure we don't have conflicts between the different constituencies whilst still accommodating existing tools -- which, in turn, should remove barriers to adoption.

csuwildcat commented 4 years ago

I was thinking about ways to use just URL params and came up with a strawman - let me know what you all think:

What if we assumed URL params/fragments at the 'top-level' of DID URIs (I'll explain exactly what I mean by that) were only allowed to be DID-specific params/fragments that apply to resolution and are not included in any URL or transformed output, while all other userland-destined paths and their params/fragments were URL encoded within those parameter path values? Here's an example of what a DID URI would look like when passed to a resolver (note: I made up the param service-path to keep this unopinionated):

DID URI - only DID-params, like service, version, etc. live at the top level: did:ex:123?service=files&service-path=%2Fpics%2Fme.png%3Fversion%3D14.6%23frag1&version=2#key-1

Resolved URI: myfiles.com/user123/pics/me.png?version=14.6#frag1

You'll notice that there are two version parameters and two fragments, the ones at the top level of the DID URI, and ones that are encoded within the path of the service parameter value itself. This is intentional, to illustrate the fact that the version=2 and #key-1 at the top level of the DID URI are only for use in DID resolvers, and are not transferred to the URL output, while the service path is free to include its own parameters, which may even overlap with the names of the DID-specific parameters at the top level.
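The two tiers can be checked with a standard URI parser. The sketch below parses the example DID URI above; the `SERVICE_ENDPOINTS` lookup is a hypothetical stand-in for reading the `files` endpoint from the DID Document, and joining by plain concatenation is an assumption.

```python
from urllib.parse import parse_qs, urlsplit

did_uri = (
    "did:ex:123?service=files"
    "&service-path=%2Fpics%2Fme.png%3Fversion%3D14.6%23frag1"
    "&version=2#key-1"
)

# Tier 1: the top level, seen only by the DID resolver.
top = urlsplit(did_uri)
top_params = {name: values[0] for name, values in parse_qs(top.query).items()}

# Tier 2: the decoded service-path, parsed as its own relative reference.
inner = urlsplit(top_params["service-path"])
inner_params = {name: values[0] for name, values in parse_qs(inner.query).items()}

# Hypothetical endpoint for the "files" service in the DID Document.
SERVICE_ENDPOINTS = {"files": "myfiles.com/user123"}
resolved = SERVICE_ENDPOINTS[top_params["service"]] + top_params["service-path"]
```

Here `top_params["version"]` and `top.fragment` belong to the resolver, while `inner_params["version"]` and `inner.fragment` travel with the resolved URI, so the overlapping names never meet.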

I believe this would eliminate a ton of issues:

  1. We wouldn’t need Matrix parameters (as far as I can tell)
  2. We wouldn't have to deal with DID-specific params being commingled with the userland/output-destined params that are intended to be included in resolved URLs/output.
  3. Because of 2, we would no longer need to prefix DID-specific params in DID URIs, because only DID-specific params would ever be allowed in the top level of DID URIs.
  4. Similarly, fragments would not collide if you wanted to use one value for the DID URI itself, and a different value for the resolved URL.

OR13 commented 4 years ago

With matrix params:

did:ex:123;version=2;service=files;/user123/pics/me.png?version=14.6#frag1 =>

myfiles.com/user123/pics/me.png?version=14.6#frag1

Without matrix params:

did:ex:123?service=files&service-path=%2Fpics%2Fme.png%3Fversion%3D14.6%23frag1&version=2#key-1 =>

myfiles.com/user123/pics/me.png?version=14.6#frag1
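For comparison, this is roughly what a generic parser makes of the matrix form. The sketch uses Python's `urlsplit`; the helper `split_matrix` is hypothetical and only illustrates the extra hand-written step a DID-aware parser would need. It does not reproduce the grammar in the DID specification.

```python
from urllib.parse import urlsplit

matrix_url = "did:ex:123;version=2;service=files;/user123/pics/me.png?version=14.6#frag1"

# The generic parser leaves the matrix parameters embedded in the path.
parts = urlsplit(matrix_url)


def split_matrix(path: str):
    """Separate ';name=value' pairs from the rest of the path (illustrative only)."""
    head, slash, tail = path.partition("/")
    did_suffix, *pairs = head.split(";")
    params = dict(pair.split("=", 1) for pair in pairs if pair)
    return did_suffix, params, slash + tail


did_suffix, matrix_params, service_path = split_matrix(parts.path)
```

The query-only form needs no equivalent of `split_matrix`, since `parse_qs` already handles `service`, `service-path`, and `version`.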

csuwildcat commented 4 years ago

I should note that under the proposed URL param scheme I mentioned, you could still support the same concept of method specific parameters, you'd just need to agree on a leading character that signified the parameter was method specific, for example:

did:ex:123?-method-param-name=value

The leading -method- string is a lot like CSS vendor prefixes, which denote it is targeting a specific browser/engine.
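A small sketch of how a resolver might tell the two kinds of parameters apart under this convention. It assumes every parameter starting with `-` is method-specific and that the method name ends at the next `-` (DID method names contain only lowercase letters and digits, so that split is unambiguous). The function name and the sample values are hypothetical.

```python
from urllib.parse import parse_qsl, urlsplit


def partition_params(did_url: str):
    """Split query parameters into core and method-specific groups."""
    core, method_specific = {}, {}
    for name, value in parse_qsl(urlsplit(did_url).query):
        if name.startswith("-"):
            # "-ion-initial-state" -> method "ion", parameter "initial-state"
            method, _, param = name[1:].partition("-")
            method_specific[(method, param)] = value
        else:
            core[name] = value
    return core, method_specific


# Hypothetical DID URL with one core and one method-specific parameter.
core, method_specific = partition_params("did:ion:abc?service=hub&-ion-initial-state=xyz")
```

A resolver that does not implement the named method could then ignore or reject the `method_specific` group without touching the core parameters.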

csuwildcat commented 4 years ago

Another example of the proposed scheme:

did:ex:123?service=datastore&-cooldidmethod-initial-state=456...&service-path=%2Fcredentials%2Fpublic%3Fissuer%3DUSG%23frag1

Second variant, that assumes the service parameter parsing during resolution is specified to be a little smarter:

did:ex:123?-cooldidmethod-initial-state=456...&service=datastore|%2Fcredentials%2Fpublic%3Fissuer%3DUSG%23frag1

OR13 commented 4 years ago

did:ion:Ei2a5d...;service=hub;ion:initial-state=ad235.../public/credentials?issuer=USG#second

=>

did:ion:Ei2a5d?service=hub&-ion-initial-state=ad235f4w5w&service-path=%2Fcredentials%2Fpublic%3Fissuer%3DUSG%23second

msporny commented 4 years ago

I was thinking about ways to use just URL params and came up with a strawman

Isn't this the exact same thing that was proposed since before the DID WG F2F (and again at the DID WG F2F)? What am I missing? Because if it is, this is what the query-only parameter people have been arguing for... and if so, hooray, we have alignment! :)

csuwildcat commented 4 years ago

I was thinking about ways to use just URL params and came up with a strawman

Isn't this the exact same thing that was proposed since before the DID WG F2F (and again at the DID WG F2F)? What am I missing? Because if it is, this is what the query-only parameter people have been arguing for... and if so, hooray, we have alignment! :)

Hell if I know, I wasn't at the F2F. I think there are a couple little additions (to preserve method-specific stuff, but maybe not?). If so, and I basically just regurgitated exactly what you all were alluding to, then sure, I think this could work. The tradeoff is the URI encoding of the values in a 'two-tier' parameter scheme, but I can live with it.

kdenhartog commented 4 years ago

@csuwildcat wrote:

Be confined to the resolution phase of DID URI parsing to determine the correct DID Document.

If you're going to confine it to the resolution phase (which I do agree, is a good idea), then it should be an argument to the resolution process, possibly in the resolution request (instead of in the DID URL itself). It feels like this is an argument /against/ matrix parameters instead of for it.

At this point, I do think that we have consensus that DID parameters are used during the resolution phase or URL rewriting phase... do we have any use cases for using a DID parameter outside of those two phases?

I could see version being used to reference two different authorities with two different did subjects, at which point you want a DID parameter to be a part of the authority. Whether we want to handle this case in this way is a different story that we should consider after we decide if this is valid.

Imagine we've got did:example:123 which is the did of the president. The president is an office, and not only one person, therefore many did subjects and did controllers exist throughout the lifetime of the did.

In state 2 (did:example:123;version=2) this did refers to Alice who's the controller of the keys and the subject of the did.

However, Alice is now handing the reins of the presidency over to the new president, Bob.

This means that in state 3 (did:example:123;version=3) this did refers to Bob who's now the new controller of the keys and the subject of the did.

With this case, the version needs to be a part of the authority rather than the path, query, or fragment.

So my question is should we have the ability to handle transitions of did subjects in such a way that it's still a part of the authority section of the URI and if so is the use of a matrix parameter the correct way to do this? If so, then I would be in opposition to removing them.