w3c-ccg / did-spec

Please see README.md for latest version being developed by W3C DID WG.
https://w3c.github.io/did-core/

Add "content-type" and "content-id" DID URL matrix parameters. #195

Closed: peacekeeper closed this 4 years ago

peacekeeper commented 5 years ago

This adds two concrete DID URL matrix parameters. See https://github.com/w3c-ccg/did-spec/pull/189.

Description: At Rebooting-the-Web-of-Trust 8 in Barcelona, a use case was described by @talltree and @kenebert to use DID URL syntax for referencing objects in a DID target system that are not DID Documents. See https://github.com/WebOfTrustInfo/rwot8-barcelona/blob/master/topics-and-advance-readings/DID-Content-References.md.

Example: did:example:1234;content-type=schema;content-id=z9y8x7w6
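
As a rough illustration of how a client might read such a DID URL, here is a minimal Python sketch. The function name `parse_matrix_params` is hypothetical, and the splitting rules (parameters separated by `;`, name and value separated by `=`, everything before any path, query or fragment) are an assumption based on the example above, not the normative grammar.

```python
from typing import Dict, Tuple


def parse_matrix_params(did_url: str) -> Tuple[str, Dict[str, str]]:
    """Split a DID URL into the bare DID and its matrix parameters."""
    # Assumption: matrix parameters sit before any path, query, or fragment.
    head = did_url
    for separator in ("/", "?", "#"):
        head = head.split(separator, 1)[0]

    # The first ";"-separated piece is the DID; the rest are name=value pairs.
    did, *raw_params = head.split(";")
    params: Dict[str, str] = {}
    for raw in raw_params:
        name, _, value = raw.partition("=")
        params[name] = value
    return did, params


did, params = parse_matrix_params(
    "did:example:1234;content-type=schema;content-id=z9y8x7w6"
)
print(did)     # did:example:1234
print(params)  # {'content-type': 'schema', 'content-id': 'z9y8x7w6'}
```

What a DID method then does with `content-type` and `content-id` is left open here; the sketch only shows the syntactic split.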


peacekeeper commented 5 years ago

Note comments in the initial matrix parameter PR (now closed) by @rhiaro and @yancyribbens on scope and method-specific use cases: https://github.com/w3c-ccg/did-spec/pull/187#discussion_r277140014

msporny commented 5 years ago

I don't understand why Hashlinks don't work for this use case? Why can't we do something like:

did:example:123456789?hl=z389vh4k2jhviuh3hlkjfhui3h:zbjk4j890723huirh807f

You can encode both content hash and content type using a hashlink and doing so wouldn't require adding either of these parameters to the spec.

See content-type in hashlink spec: https://tools.ietf.org/html/draft-sporny-hashlink-03#section-2.2.2

peacekeeper commented 5 years ago

> did:example:123456789?hl=z389vh4k2jhviuh3hlkjfhui3h:zbjk4j890723huirh807f

While reading the Hashlink spec, I'm confused why on one hand the above string looks like a "Hashlink as a Parameterized URL" (section 3.2.), but on the other hand it has a colon and therefore seems to include both a "resource hash" AND the "optional metadata"?

My understanding is that the Hashlink concept adds integrity protection to a link, but before you can do that you first need to figure out what the link (the URL) is that you're trying to protect. In this case, the proponents of "content references" want to come up with URLs that point to resources in the DID target system that are not the DID Document or a part of it. E.g. schemas, credential definitions in the Sovrin case, but I think it could really be any kind of resource.

Perhaps matrix parameters are not the right tool for this. Perhaps the use case of content references can simply be expressed with paths, e.g. instead of

did:example:1234;content-type=schema;content-id=z9y8x7w6

one could just use

did:example:1234/schema/z9y8x7w6

mitfik commented 5 years ago

Hi guys, I would like to add a few use cases to the discussion and explain what we expect from this specific parameter (I am not sure whether that is doable).

Within the Semantic Working Group in HL we are working on ODCA (Overlay Data Capture Architecture), which in short gives you a unified data language that you can use for Agent 2 Agent communication (sov), data transportation, data storage, digital wallet requests etc.

To achieve that, we need a way to define a unit of language (content which never changes and is not controlled by anyone). For that purpose you could imagine using any type of MultiHash or IPFS CID, but the problem in our case is that we need a trust framework around the schema to prove who issued it.

An example:

Imagine you have a schema base object representing a driving license. If you are building an application that needs a driving license (and assuming you are not an expert in data structures of this kind), you would like to rely on a schema issued by some trusted entity. If you do not require trust, you could basically "look up" the schema by IPFS CID, but then you have no idea who created it or whether you can trust it to be correct. If you were able to combine the CID and DID layers, you could do different checks and verify the issuer.

We went through this concept under the name DRI (Decentralize Resource Identifier)[1], where we tried to solve the problem conceptually.

We were thinking about using a DID as the unique identifier for the schema and all its elements, but the problem is that whoever controls the DID controls the schema, which is not acceptable.

Another simple solution would be to include in the schema an issuer attribute that refers to a specific DID, which can be verified and never changes.

And of course the third option is the proposed attributes.

From our perspective important features would be:

I would love to hear your opinion about the above and figure out which use cases could be covered by content-type and content-id, and whether it makes sense to put them at that level.

[1] https://github.com/THCLab/DRI/blob/master/deck.pdf

talltree commented 5 years ago

@mitfik Can you clarify one thing in your post: you say "we need to have trust framework around the schema to proof who issued it", but then you go on to say, "We were thinking about using DID as a unique identifier to identifier schema and all the elements but the problem is that whoever controls DID controls the schema which is not acceptable."

I don't understand how you can ask for a "trust framework around the schema to prove who issued it" and then not want a DID for the issuer. Of course the issuer would control the schema just like the issuer would control a DID document.

What am I missing?

talltree commented 5 years ago

> I don't understand why Hashlinks don't work for this use case? Why can't we do something like:
>
> did:example:123456789?hl=z389vh4k2jhviuh3hlkjfhui3h:zbjk4j890723huirh807f
>
> You can encode both content hash and content type using a hashlink and doing so wouldn't require adding either of these parameters to the spec.

Manu, I think that a hashlink could work in place of content-id when you want to provide a content-based address for the target object. However with an immutable ledger, it's not always needed to have a content-based address—the content is already immutable on the ledger. So there's still a case for content-id.

The same applies to content-type if you don't want to have to deserialize a hashlink to know the content type being identified.

rhiaro commented 5 years ago

In the DID spec call today @peacekeeper compared DID URLs (including matrix params) with HTTP URLs, and arguments passed to the DID resolver with HTTP Headers. With that in mind, content-type strikes me as an HTTP header kind of thing (though the equivalent would be Accept, since we're talking request not response), and not something that should be a matrix param, i.e. part of the DID URL.

mitfik commented 5 years ago

> @mitfik Can you clarify one thing in your post: you say "we need to have trust framework around the schema to proof who issued it", but then you go on to say, "We were thinking about using DID as a unique identifier to identifier schema and all the elements but the problem is that whoever controls DID controls the schema which is not acceptable."
>
> I don't understand how you can ask for a "trust framework around the schema to prove who issued it" and then not want a DID for the issuer. Of course the issuer would control the schema just like the issuer would control a DID document.
>
> What am I missing?

The idea is to have a piece of content (e.g. a schema) which is not controlled by anyone (nobody can remove or modify it) but is verifiable, which means that anyone can verify who the author/issuer of that content is.

The first property can easily be fulfilled by systems like IPFS (a content-based network), but there is no trust layer built into it. That means I have no idea who created that content, and if it says driving license I cannot check whether it came from a trusted entity, e.g. the Ministry of Transportation, or from a random Joe.

On the other hand, DID provides a trust layer where I can verify who is who. But whoever controls the DID controls the content (if the DID were the identifier of the content). This is why having a CID (content ID) as a parameter stored somewhere on the ledger etc. could solve that problem.

jandrieu commented 5 years ago

@mitfik wrote "DID provides trust layer where I can verify who is who."

This is incorrect. DIDs provide a way to find the cryptographic material for interacting securely with the identified subject. The subject so identified is entirely up to the controller of the DID, who may or may not be the subject. So I think you correctly described the trust boundary of DIDs; you are just applying the framework to a problem it doesn't solve. It helps, for sure, especially relative to usernames and passwords, but they do not on their own provide identity assurance.

mitfik commented 5 years ago

> @mitfik wrote "DID provides trust layer where I can verify who is who."
>
> This is incorrect. DIDs provide a way to find the cryptographic material for interacting securely with the identified subject. The subject so identified is entirely up to the controller of the DID, who may or may not be the subject. So I think you correctly described the trust boundary of DIDs; you are just applying the framework to a problem it doesn't solve. It helps, for sure, especially relative to usernames and passwords, but they do not on their own provide identity assurance.

Please forgive my mental shortcut, but I believe the following example shows what I meant:

did:sov:123 publishes content with a CID. Nobody controls the given CID; nobody can take it down or modify it. User A downloads the given CID and wants to verify the origin of that content: who issued it (uploaded/created it). Let's take as an example a driving license schema which User A wants to use in his application. User A resolves the given link of did + content_id so he can verify that this particular DID registered that content at a given time. After that, User A can check the verifiable credentials of did:sov:123 to verify that whoever is behind the DID is trustworthy, e.g. check that the DID is controlled by a state agency.

I hope this clarifies it a bit.

peacekeeper commented 5 years ago

I think this PR and https://github.com/w3c-ccg/did-spec/pull/196 are related and should perhaps be the main topic on next Thursday's DID Spec and DID Resolution Spec Call?

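Personally, the more I think about it, the less I am convinced that we actually need content-type and content-id. I'd like to explore what would be pros and cons of simply using paths or queries instead of matrix parameters, e.g.:

  • did:example:1234/schema/z9y8x7w6
  • did:example:1234/schema?cid=z9y8x7w6
  • did:example:1234/content?type=schema&cid=z9y8x7w6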

jandrieu commented 5 years ago

@mitfik there are no standard mechanisms for proving a "particular did registered that content in given time." Much less standard mechanisms to make "publish content with CID. Nobody controls given CID, nobody can take it down nor modify". All you can prove is that the controller of the proofs in a DID Document signed a piece of content at some point in time. It is also possible to prove that a given piece of content (optimally including signature) existed at least as early as a given block height. Pushing such a proof of existence to a chain is called timestamping, which is valid but not standard. However, there is, as yet, no mechanism to verify that a given proof mechanism, e.g., a private key, is controlled by any particular institution, much less whether it was under control at the time of signing and/or timestamping. This is one of the missing links: institutional level trust in who actually controls which keys. You still have to convince yourself that a given DID is, in fact, under control of the party you think. Also, people like IPFS for content storage, but there is no guarantee the actual content will be there when you ask for it. It is also nonstandard. So, yes you could post a CID to a ledger and it would be hard to remove, but the actual content may no longer be available.

My guess is you are conflating what DIDs do with things people do with DIDs. All DIDs do is provide a way to find the cryptographic material provably associated with a given identifier, without a central authority involved. What you then do with that cryptographic material is wide open, like signing content or sending cryptocurrency or making statements about the DID subject. What you do after you have the correct DID Document is outside of the scope of DIDs and DID resolution. (Although we do capture these types of use in the use cases documents).

To my knowledge, current schema expectations are that who created the schema isn't relevant. What matters is that the schema meets your requirements and that schema expectations are matched between those writing in a particular schema and those reading it. Authorship of schema is essentially irrelevant.

mitfik commented 5 years ago

> I think this PR and #196 are related and should perhaps be the main topic on next Thursday's DID Spec and DID Resolution Spec Call?
>
> Personally, the more I think about it, the less I am convinced that we actually need content-type and content-id. I'd like to explore what would be pros and cons of simply using paths or queries instead of matrix parameters, e.g.:
>
>   • did:example:1234/schema/z9y8x7w6
>   • did:example:1234/schema?cid=z9y8x7w6
>   • did:example:1234/content?type=schema&cid=z9y8x7w6
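
To make the comparison concrete, the following Python sketch reads the same (type, id) pair out of each of the three path/query forms listed above. The function name `content_reference` and the rules for telling the forms apart are assumptions made for illustration only; none of this is specified anywhere.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qs


def content_reference(did_url: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Return (did, content_type, content_id) for the three candidate forms."""
    rest, _, query = did_url.partition("?")
    did, _, path = rest.partition("/")
    segments = [segment for segment in path.split("/") if segment]
    # Keep only the first value of each query parameter.
    params = {name: values[0] for name, values in parse_qs(query).items()}

    if len(segments) == 2:
        # did:example:1234/schema/z9y8x7w6
        return did, segments[0], segments[1]
    if len(segments) == 1 and "type" in params:
        # did:example:1234/content?type=schema&cid=z9y8x7w6
        return did, params["type"], params.get("cid")
    if len(segments) == 1:
        # did:example:1234/schema?cid=z9y8x7w6
        return did, segments[0], params.get("cid")
    return did, None, None


for url in (
    "did:example:1234/schema/z9y8x7w6",
    "did:example:1234/schema?cid=z9y8x7w6",
    "did:example:1234/content?type=schema&cid=z9y8x7w6",
):
    print(content_reference(url))  # ('did:example:1234', 'schema', 'z9y8x7w6')
```

All three forms carry the same information; the open question in this thread is which part of the DID URL should carry it.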

What about a scenario where I don't want to work with a schema but, for example, with art? Is that a use case which content-id could cover?

My understanding is that content-id would play a simple role: whoever is in control of the DID can register/claim specific content which he uploaded. This is a little bit similar to the idea behind https://verisart.com/

So I, with did:sov:123, publish a CID, and anybody can check that it came from me (was timestamped by me).

mitfik commented 5 years ago

> So, yes you could post a CID to a ledger and it would be hard to remove, but the actual content may no longer be available.

That is correct, and that is completely fine in my use cases. My question would then be whether there is an expectation that content-id is always the id of content which is stored by the given ledger. E.g. did:sov:123 -> content-id = x12d12: does that mean that this content is always stored on the Sovrin ledger? Or do we allow any CID, not caring whether it is resolvable on other networks, and on the Sovrin ledger we just do timestamping? I think understanding that would be essential.

mitfik commented 5 years ago

> To my knowledge, current schema expectations are that who created the schema isn't relevant. What matters is that the schema meets your requirements and that schema expectations are matched between those writing in a particular schema and those reading it. Authorship of schema is essentially irrelevant.

This is actually very relevant. Just a small example: if I am a developer building an ePassport application, I would need a valid schema which is approved by one of the EU bodies. As a developer I have no expertise to verify whether the schema I find is the right one or not. So I would assume that if I find a schema which was published (timestamping is fine, as this is like approving) by an EU body, I am super fine with it.

If you think about the future of digital wallets, where they would talk to each other in more and less official ways, you want to make sure that they use an "approved" schema for official content like passports, driving licenses etc.

jandrieu commented 5 years ago

@mitfik it doesn't matter who wrote the schema, just that the schema are the same. Just like we can both talk about the moon and the sun without needing to know who created them. It's not your moon or your sun. We just need to understand if we're talking about the same sun. Or the same moon. Authorship is nothing next to agreement about content. 127 different nations could all claim authorship of "official" schema, the math doesn't care who actually wrote any of them. It only cares that the ones I use are the same as the ones my recipient uses.

mitfik commented 5 years ago

> @mitfik it doesn't matter who wrote the schema, just that the schema are the same. Just like we can both talk about the moon and the sun without needing to know who created them. It's not your moon or your sun. We just need to understand if we're talking about the same sun. Or the same moon. Authorship is nothing next to agreement about content. 127 different nations could all claim authorship of "official" schema, the math doesn't care who actually wrote any of them. It only cares that the ones I use are the same as the ones my recipient uses.

I agree that if you exchange data it does not matter who the author/issuer is; as long as we both use the same schema, all will work. My point was that if I as a developer start writing an app and have to decide on a specific schema to use, I would like to use the one which is approved by a trustworthy entity. In this case the content does not matter, as I have nothing to compare it to.

I could imagine a scenario like this: I go to public-schema-registry.org, search for a schema in a specific category, maybe with GICS or just by typing driving license, and get suggestions of schemas published by multiple entities. One of them would be Jeo Kowalsky and another BC Gov. I pick BC Gov for obvious reasons.

This way you can let the community decide on the most popular and most trustworthy schema, creating a de facto standard for a specific data structure through dominant design.

jandrieu commented 5 years ago

@mitfik So your issue is about how do you discover the schema that is acceptable to the recipient. Well... presumably they will tell you. At some point you must get a piece of information that you believe authentically represents the recipient's requirements. There are five ways I can think of to do that:

  1. Get the schema itself
  2. Get the content hash of the schema and compare that to the content hash of the schema you get through some other means
  3. Get the public key of the recipient and use a signed schema (which you get through some other means) which you verify computationally
  4. Get the public key of the recipient and use a signed hash which you verify (both the signature and the hash) for a schema you get through some other means.
  5. Use some form of reputation or collective evidence to establish through superiority of information that any of the above forms of information did, in fact, come from the recipient. This is the essence of the web of trust approach.

In all cases, you must receive some information (the schema, the hash, the public key) that you must convince yourself actually comes from the recipient. These different approaches simply push the provenance of the information to another system. If you can't get the schema directly, then you could get the hash directly. If you can't get the schema or hash directly, you can get a public key directly and use a signed schema or signed hash. If you can't get the schema, the hash, or a public key directly then you have to use superiority of information to convince yourself that whatever information (schema, hash, key) that you have is in fact from the recipient.
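
Approach 2 in the list above (compare a trusted content hash against the hash of a schema obtained elsewhere) can be sketched in a few lines of Python. This is only an illustration: the function name `schema_matches_hash`, the sample schema bytes, and the use of a plain hex-encoded SHA-256 digest are all assumptions; a real deployment might use a multihash or hashlink encoding instead.

```python
import hashlib
import hmac


def schema_matches_hash(schema_bytes: bytes, expected_hex: str) -> bool:
    """Check a schema fetched from anywhere against a hash you already trust."""
    actual_hex = hashlib.sha256(schema_bytes).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(actual_hex, expected_hex.lower())


# Hypothetical schema; in practice the trusted hash comes from the recipient.
schema = b'{"name": "driving-license", "attributes": ["holder", "class"]}'
trusted_hash = hashlib.sha256(schema).hexdigest()

print(schema_matches_hash(schema, trusted_hash))               # True
print(schema_matches_hash(schema + b" tampered", trusted_hash))  # False
```

The check only moves the trust question: it shows that the bytes match the hash, not that the hash itself came from the recipient.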

That superiority of information can emerge from evidence from any number of sources. From websites. From friends. From trusted fiduciaries. From blockchain. However, any particular conclusion might be instantly overturned if you receive credible information that undermines the previous conclusions. For instance, you might learn that the key you had has been compromised. Now the question becomes whether or not you believe the report of its compromise or if, in fact, the claim of compromise is itself a hack to get you to switch to a new fraudulent key.

My point is that DIDs do not resolve this problem. All they do is provide a way to look up cryptographic information associated with a key. That shifts a certain amount of burden from authenticity, in general, of the source and channels with which you receive the information you rely on to relying on the key management processes, including discovery and revocation.

In your example, you have no computational way to verify that the schema published by "BC Gov" was in fact published by the government of British Columbia.

Even the forms of "directly getting the information" are subject to corruption. For example, going to the provincial seat of government and scanning the official QR code in the lobby. You still have to trust that the poster itself wasn't compromised by a hacker or disgruntled employee. And perhaps there are holograms or engravings or public ceremonies and 24/7 video surveillance; all of these are simply additional evidence to support that the source of the public key is legitimate.

What I'm trying to say is that while the guarantees for DID-based crypto are more robust than just trusting an "official" website and more flexible than direct distribution of public keys, there are still limits.

jandrieu commented 5 years ago

Separating the thread. @mitfik

However, in none of this do I see what this has to do with content-ids. The question isn't whether or not it is useful to specify the hash of the expected resource at the end of a URL. That's understood to be useful. The question is whether it should be a matrix parameter or a query or path as @peacekeeper recently suggested.

The problem with using the path & query terms is that if the DID-URL uses a service endpoint, we still have the aggregation issue already discussed ad nauseam. Drummond's current position is that the default behavior would be to pass the path and query term by appending them to the service endpoint URL. So as long as there is some notion of aggregating path & query, we can't use those elements for these types of parameters.

Of course, if we shift the default so that aggregation is the exception--and triggered only by a specific parameter, matrix or otherwise--then we could use path & query parameters for CID. This, IMO, is the sane choice anyway because service endpoints are expected to change so any path or parameters passed to them should, as a rule, be ignored, with the exception being when you want portable file structures and the like.
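
A small Python sketch of the two behaviors being contrasted, purely as illustration. The function `dereference_endpoint` and its `aggregate` flag are hypothetical; the flag stands in for whatever "specific parameter" would trigger aggregation, and the way path and query are joined is an assumption, not resolver behavior defined by any spec.

```python
from urllib.parse import urlsplit, urlunsplit


def dereference_endpoint(
    service_endpoint: str, did_url_path: str, did_url_query: str, aggregate: bool
) -> str:
    """Build the URL to fetch from a service endpoint and DID URL parts."""
    if not aggregate:
        # Aggregation is the exception: path and query are not passed on.
        return service_endpoint

    parts = urlsplit(service_endpoint)
    # Append the DID URL path to the endpoint path and merge the queries.
    path = parts.path.rstrip("/") + did_url_path
    query = "&".join(q for q in (parts.query, did_url_query) if q)
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


endpoint = "https://example.com/files/"
print(dereference_endpoint(endpoint, "/schema/z9y8x7w6", "", aggregate=True))
# https://example.com/files/schema/z9y8x7w6
print(dereference_endpoint(endpoint, "/schema/z9y8x7w6", "", aggregate=False))
# https://example.com/files/
```

With aggregation as the default, a path such as `/schema/z9y8x7w6` always ends up at the endpoint, which is why it could not double as a content reference.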

mitfik commented 5 years ago

> @mitfik So your issue is about how do you discover the schema that is acceptable to the recipient. Well... presumably they will tell you. At some point you must get a piece of information that you believe authentically represents the recipient's requirements. There are five ways I can think of to do that:

I think you got it wrong. My issue would be defined like this: how can a developer pick the proper schema for a specific feature without needing expertise in the field? E.g. building a health app collecting biometrics without having expertise in the standards within health care. Which translates to: let companies from the field suggest to me a schema which is the standard within the industry for handling biometric data.

jandrieu commented 5 years ago

> How as a developer pick up proper schema for specific feature without need to have expertise in a field.

I think this is the disconnect. I would suggest that such a developer first get expertise in the field. Perhaps by talking to customers.

mitfik commented 5 years ago

> > How as a developer pick up proper schema for specific feature without need to have expertise in a field.
>
> I think this is the disconnect. I would suggest that such a developer first get expertise in the field. Perhaps by talking to customers.

I am afraid that this is not realistic; even the customer does not know how to build a proper one. You need to take into consideration different regulations as well as privacy, PII and so on. This is the effort of many people. The assumption is that trusted companies will build that and publish it so everyone can reuse it. You could even think of an institution like the Linux Foundation publishing a schema for a public social profile that properly defines privacy to protect consumers. The same schema published by FB would probably look a bit different.

This is why it is important to check who published the schema, and a DID together with VCs gives that option.

brentzundel commented 4 years ago

This repo is scheduled to be archived. The work has moved to the DID WG. The artifacts in the DID WG repository share commit history with this repo, so it should be possible to raise this PR against that repo.

peacekeeper commented 4 years ago

Closing. This PR has been re-created in the DIDWG repo: https://github.com/w3c/did-spec/pull/61