w3c-ccg / did-spec

Please see README.md for latest version being developed by W3C DID WG.
https://w3c.github.io/did-core/

(Partially) Encrypting DID Documents #172

Closed RieksJ closed 4 years ago

RieksJ commented 5 years ago

While many DID Documents contain public information, there are also use-cases where this is not the case. One is where the relation between two parties, which is specified by a DID-pair, is to be considered private. This is the case between customers and banks, insurers, government agencies, healthcare professionals or institutions, etc.

Encryption can be used, e.g., in the P2P protocol that sets up a secured connection between agents in a relation, where the owner of one DID can encrypt the associated DID Document with the public key of the other party in the relation. That would allow such DID Documents to be stored anywhere without revealing any information about their contents.

There may be other use-cases - to be discovered: would there be GDPR-related benefits (in case the DID document somehow gets to contain PII)? Anything else?

This issue calls for a discussion of the pros and cons of encrypting a DID document, apart from its first item (the DID itself), with any key that the owner of the DID document sees fit.
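
To make the proposal concrete, here is a minimal sketch of "encrypt everything except the DID itself". It is only an illustration, not part of any DID spec: the `encrypted` field name and the function names are hypothetical, and the actual cipher is left as a caller-supplied `encrypt`/`decrypt` pair (for instance a public-key scheme keyed to the other party in the relation).

```python
import base64
import json
from typing import Any, Callable, Dict, Tuple


def split_did_document(doc: Dict[str, Any]) -> Tuple[str, bytes]:
    """Separate the DID (kept in the clear) from all other fields."""
    rest = {key: value for key, value in doc.items() if key != "id"}
    # Sorted keys give a stable serialization of the part to be encrypted.
    payload = json.dumps(rest, sort_keys=True, separators=(",", ":"))
    return doc["id"], payload.encode("utf-8")


def wrap_did_document(
    doc: Dict[str, Any], encrypt: Callable[[bytes], bytes]
) -> Dict[str, str]:
    """Return an envelope holding the plaintext DID and an opaque blob.

    `encrypt` is supplied by the caller; this sketch does not pick a cipher.
    The "encrypted" field name is an assumption made for illustration.
    """
    did, plaintext = split_did_document(doc)
    blob = base64.b64encode(encrypt(plaintext)).decode("ascii")
    return {"id": did, "encrypted": blob}


def unwrap_did_document(
    envelope: Dict[str, str], decrypt: Callable[[bytes], bytes]
) -> Dict[str, Any]:
    """Rebuild the full DID document from an envelope."""
    plaintext = decrypt(base64.b64decode(envelope["encrypted"]))
    return {"id": envelope["id"], **json.loads(plaintext)}
```

Anyone can still see which DID the envelope belongs to, but only a holder of the matching decryption key can recover the keys and service endpoints inside it.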

dhh1128 commented 5 years ago

Encrypting data that is published (e.g., on a blockchain) is dangerous, as advances in crypto may make it easy to crack the data at some point in the future. However, I agree that it is better to encrypt such data than to leave it plaintext.

An alternative is to not place a DID Doc in a public repository at all. This is the approach advocated by the did:peer method spec (https://dhh1128.github.io/peer-did-method-spec/index.html).

peacekeeper commented 5 years ago

I agree with @dhh1128, encrypted data on a public ledger is problematic; and if we talk about peer DIDs that are only visible to parties of a relationship, then the assumption is that communication between those parties (and transfer of the DID Document) is encrypted anyway on the transport layer of the protocol used by this particular DID method. In other words, those DIDs are only "resolvable" by the parties of the relationship anyway.

@RieksJ in light of this, do you still see a need for talking about encryption in the DID spec?

RieksJ commented 5 years ago

I disagree with the generic statement that encryption is problematic on a public ledger, and as @dhh1128 says: it is better to encrypt such data than to leave it plaintext.

There is of course the risk that at some point in time crypto may be cracked, but there are also measures that can reduce such risks to an acceptable level. For this particular risk, one might think of renewing the keys in the DID document and a re-encryption with an algorithm that at that point has not yet been cracked, and I think it is likely that there are better measures. I do not want to reject the idea just because we see a problem that might occur in the future and for which we can come up with ways to address it.

I would much rather first discuss whether or not the benefits/use-cases that I described in my first post are relevant, and if we have a consensus there, we may proceed to find ways in which to bring them about (of which my suggestion is just the first idea). How's that?

tplooker commented 5 years ago

@RieksJ in regards to re-encrypting data on a public ledger, this would require universal support (across different public ledgers) for effective modification of history, wouldn't it? If that's the case, then doesn't it undermine one of the key value propositions of a DLT (immutability)?

peacekeeper commented 5 years ago

I'm a bit worried that once we start talking about (partially) encrypting DID Documents, people may start to think it's okay to put (encrypted) personal data into a DID Document. I'm not saying that there are no use cases at all for this. But so far the general design has been that DID Documents contain only metadata needed for trusted interaction with the DID subject, and that exchange of personal data happens elsewhere (via agents, etc.).

RieksJ commented 5 years ago

I would much rather first discuss whether or not the benefits/use-cases that I described in my first post are relevant, and if we have a consensus there, we may proceed to find ways in which to bring them about (of which my suggestion is just the first idea). How's that?

@peacekeeper: Having said that: if people want to put encrypted personal data in a DID document, they can do so already. Also, the actual use that people make of a capability to encrypt (part of) a DID Document is not a responsibility of the DID spec, in the same way that the use people make of knives is not the responsibility of knife-manufacturers, and yes, people get killed by knives.

kdenhartog commented 5 years ago

In my mind, the length of time becomes the important factor to consider here. If we're looking to achieve privacy for a short period of time (days to months) I think encryption is a sufficient method of privacy. However, if we're talking years or decades where we'd like to keep the data private, I don't believe only encryption will be sufficient.

I'd like to better understand the use case you're trying to achieve by encrypting and publishing this data to the ledger? Are you willing to share here?

RieksJ commented 5 years ago

Here's the use-case. I want an agent of mine and an agent of yours to be able to contact each other privately at arbitrary points in time, and then set up a mutually authenticated, secure/encrypted communications channel. This is also what I understand the P2P DIDcomm protocols to support.

In the onboarding session, both our agents would generate and exchange DIDs, create corresponding DID documents, and store them in such a way that one can read the other's document and the other can read and update its own document.

If we require that each DID has a single document (i.e. we don't send updates of our DID document to each other), these documents have to be published in the public space, which means that anyone that gets hold of the corresponding DID (document) can read it.

It depends on what has been stored in the DID document whether or not that might be a (privacy or other) problem. Specifying the endpoint URLs (which include domain names) may reveal the identity of the owner of the document. Also, one may expect that (future) did methods will allow other kinds of data to be stored in DID documents that reveal such identities.

I am looking for ways to prevent such problems while not constraining possible contents of DID documents, nor imposing restrictions on storage locations. Does that make sense?
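
As an illustration of the endpoint concern above, the following sketch lists the hostnames that anyone reading a plaintext DID document could collect. It assumes the document carries a `service` list whose entries have a string `serviceEndpoint`; the function name is hypothetical, and endpoints given in other forms are simply skipped.

```python
from typing import Any, Dict, Set
from urllib.parse import urlparse


def exposed_hostnames(did_document: Dict[str, Any]) -> Set[str]:
    """Collect the hostnames visible in a document's service endpoints."""
    hosts: Set[str] = set()
    for service in did_document.get("service", []):
        endpoint = service.get("serviceEndpoint")
        # Only plain URL strings are handled in this sketch.
        if isinstance(endpoint, str):
            host = urlparse(endpoint).hostname
            if host:
                hosts.add(host)
    return hosts
```

A domain name that belongs to a single person or family identifies the owner just as directly as a name written in the document would.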

kdenhartog commented 5 years ago

From the sounds of it, you're looking for a long term solution because you're looking for the capability to persistently update the DID Doc over time. In this case, I'd defer to my previous comment: "I don't believe only encryption will be sufficient".

I understand the desire to prevent constraining the information of the DID Docs, or their locations, but I'm not convinced that encryption is the only security mechanism needed to sustain privacy. In my mind, we're looking to use a defense in depth approach to handle this because we recognize there is no one solution that will solve every problem. This is why the implementations that I've seen using the peer method spec rely on not only the data model, but also on encryption of the data. Given the nature of information entropy, I think data model restrictions are one of the most viable approaches to remaining in control of the information entropy as long as possible.

Let's take a look at this through an analogy to see if it makes sense. Let's say that we lived in a world where this same approach was used in the physical world. And in this world there's a family of 4, who've chosen to encrypt all 4 members' phone numbers, email addresses, and website home pages and then paint that data on the side of the house for their neighbor. Now, any person can drive by and try to "crack" this data. For the most part the adversary won't have any luck. But, let's say neighbor 1 got upset with this family and decided to share the key with neighbor 2. Now, every time the family decides to update their email and paint the newly encrypted data on the house, both neighbors get this information right away. However, if instead one of the family members decided to walk over and share the information directly with neighbor 1, they would have to actively relay this information each time, forcing another decision point for the information to reach the same level of entropy. It sounds like this extra decision point is your concern, but for me it's the advantage.

The main reason I bring up this ludicrous analogy is because I believe it forces us to consider if we'd do this with information in meatspace. Given this is not something that I believe most people would do in the physical world, I don't think we should aspire to do this in the digital world with DIDs where data is even less restricted to move freely and potentially harm the privacy of the subject.

RieksJ commented 5 years ago

You are right that for long term, defense-in-depth solutions, this is not sufficient. But as @dhh1128 already commented: "it is better to encrypt such data than to leave it plaintext". In a meatspace (I like the term) analogy: it is better to lock your airline luggage with a simple lock than to have no lock at all.

In your analogy, you seem to acknowledge that for drive-by situations, even if they try, encryption makes sense. And that is why I think it should be considered.

Then there are other situations, such as when the relation between the family and neighbour1 goes sour. Or even easier ones, such as where neighbour1 does not need to share his private key (which would also compromise himself), but simply decides to share the plaintext. While I'm sure this might/would happen, it is not the kind of situation I have in mind to protect against.

kdenhartog commented 5 years ago

Given your point and the fact that I'm trying to remain open minded to solutions that are viable for others, I'm no longer opposed to this. I hope this won't get used improperly, but I think that's up to us to clearly articulate when this is acceptable and when it's not.

Particularly these are the 3 don'ts I want to make sure people understand:

  1. Don't use this method to protect data that you aren't prepared to see publicly revealed within a short period of time.
  2. Don't use this as a mechanism to include PII in DID Documents.
  3. Don't encrypt with the public keys of entities that do not wish to be correlated.

jandrieu commented 4 years ago

Closing as this has moved to the DIDWG did-spec repo.