ipfs / notes

IPFS Collaborative Notebook for Research

Convergent Encryption #63

Open prusnak opened 9 years ago

prusnak commented 9 years ago

A node operator can currently read the contents of the blocks stored on their node, because the blocks are not encrypted. They cannot recover the file unless they have its IPFS hash, but individual blocks might still reveal sensitive data.

One approach to deal with this situation in a way that is a perfect match for IPFS is Convergent Encryption.

It would work like this:

1) compute the IPFS hash of the plaintext file - let's call it H_p (plaintext hash)
2) encrypt the plaintext file with AES, using H_p (or KDF(H_p), with a KDF such as scrypt) as the encryption key
3) add the resulting ciphertext file into IPFS; this will produce hash H_c (ciphertext hash)

To read the file contents one would need to have both H_c (to retrieve the ciphertext file from IPFS) and H_p (to decrypt the ciphertext file and to confirm that the resulting file is indeed the wanted one).
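The steps above, plus the read path, can be sketched roughly as follows. This is a minimal illustration, not the PoC's code: `content_hash` uses plain SHA-256 as a stand-in for the real IPFS hash, and `toy_cipher` is a SHAKE-256 keystream XOR standing in for AES so that the sketch runs with the Python standard library only. All function names are hypothetical.

```python
import hashlib
from typing import Tuple

def content_hash(data: bytes) -> bytes:
    # Assumption: plain SHA-256 stands in for the IPFS hash of the data.
    return hashlib.sha256(data).digest()

def toy_cipher(key: bytes, data: bytes) -> bytes:
    # Stand-in for AES: XOR the data with a SHAKE-256 keystream derived from
    # the key. Deterministic and self-inverse. Illustration only, not secure.
    stream = hashlib.shake_256(key).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))

def ce_encrypt(plaintext: bytes) -> Tuple[bytes, bytes, bytes]:
    h_p = content_hash(plaintext)           # 1) hash of the plaintext
    ciphertext = toy_cipher(h_p, plaintext) # 2) encrypt with H_p as the key
    h_c = content_hash(ciphertext)          # 3) hash the ciphertext would be stored under
    return h_p, h_c, ciphertext

def ce_decrypt(h_p: bytes, ciphertext: bytes) -> bytes:
    plaintext = toy_cipher(h_p, ciphertext)
    # H_p doubles as an integrity check on the decrypted result.
    if content_hash(plaintext) != h_p:
        raise ValueError("decrypted data does not match H_p")
    return plaintext
```

A reader holding H_c would fetch the ciphertext by that hash and then call `ce_decrypt(h_p, ciphertext)`; encrypting the same plaintext twice yields the same H_c, because nothing random enters the process.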

Because the encryption key depends on the plaintext file and is generated in a deterministic manner, this solution has the nice property that it allows block-level deduplication of encrypted files as well.

I am posting this idea here because I am not sure whether it would be interesting to implement in the IPFS library directly, or whether it belongs at the application level built on top of IPFS.

eminence commented 9 years ago

So let's say Alice has data she wants to encrypt. She has plaintext P1, and produces H_p, H_c, and the ciphertext C1 as per the approach you describe. She wants to make P1 available to Bob, but wants to ensure that all nodes (including her own node, Bob's node, and any other intermediate nodes) never see P1; they only see H_c and C1.

Without this encryption scheme, Alice would normally communicate H_p to Bob. But in this scenario, she communicates H_c (which is used to get the raw data out of IPFS), and also H_p (which must be treated as a secret). Bob uses H_p to recover P1 from C1, but since H_p is a secret, no one else can do this.

Do I have this right?

prusnak commented 9 years ago

@eminence Yes, you do have this right.

prusnak commented 9 years ago

Created a simple PoC here: https://github.com/prusnak/ipfs-ce

Plaintext hash is used directly. Real implementation should probably use a key derivation function.
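A key derivation step could look something like the sketch below, using `hashlib.scrypt` from the Python standard library. The salt label and the cost parameters are assumptions chosen for illustration, not values from the PoC. The salt has to be fixed and public here: a random salt would make equal plaintexts derive different keys and defeat the convergent property.

```python
import hashlib

# Assumption: a fixed, public salt label. Hypothetical value for illustration.
FIXED_SALT = b"ipfs-ce-example"

def derive_key(h_p: bytes) -> bytes:
    # Stretch the plaintext hash H_p into a 32-byte key (AES-256 sized).
    # n, r, p are illustrative cost parameters, not a recommendation.
    return hashlib.scrypt(
        h_p,
        salt=FIXED_SALT,
        n=2**14,
        r=8,
        p=1,
        maxmem=64 * 1024 * 1024,
        dklen=32,
    )
```

The derived key would replace H_p as the encryption key, while H_p itself is still what gets shared with the reader.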

jbenet commented 9 years ago

"Object Level Crypto" (encryption and authentication/signing) has been a long time coming. we haven't gotten around to this because it depends on "keystore" (which depends on "repo@dev0.4.0" which depends on "merging dev0.4.0").

Convergent encryption is definitely a mode that should be supported, but not the default for everything, because it leaks what data you have to nodes who also have the data (if i have plaintext X i can know who else has plaintext X).
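To make the leak concrete, here is a rough sketch of how anyone who already holds plaintext X could test whether a peer has it. SHA-256 stands in for the IPFS hash, a SHAKE-256 keystream XOR stands in for AES, and the set of announced hashes is a hypothetical view of what a peer provides; none of this is real IPFS API.

```python
import hashlib

def ciphertext_hash(plaintext: bytes) -> str:
    # Deterministic pipeline: key from plaintext, encrypt, hash the ciphertext.
    key = hashlib.sha256(plaintext).digest()
    stream = hashlib.shake_256(key).digest(len(plaintext))  # toy stand-in for AES
    ciphertext = bytes(a ^ b for a, b in zip(plaintext, stream))
    return hashlib.sha256(ciphertext).hexdigest()

def peer_has(plaintext: bytes, announced_hashes: set) -> bool:
    # An observer who holds the plaintext can recompute H_c and look it up.
    return ciphertext_hash(plaintext) in announced_hashes

# Hypothetical: the ciphertext hashes some peer is known to provide.
announced_by_peer = {ciphertext_hash(b"contents of plaintext X")}
```

`peer_has(b"contents of plaintext X", announced_by_peer)` is true for anyone who has X, with no key exchange needed, which is exactly why this mode should not be the default.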

In general, doing encryption right requires doing key management right.

I think we should be collaborating with the Tahoe-LAFS folks on all this as they got all of this right many years ago.

frankbraun commented 9 years ago

It's good to hear that "object level crypto" is planned for IPFS. I think object level encryption should be the default for a distributed file system like IPFS.

I agree with @jbenet's comment about the Tahoe-LAFS folks, they figured this out a long time ago.

As a reference: https://tahoe-lafs.org/hacktahoelafs/drew_perttula.html describes the two problems with the simple approach to convergent encryption. But the proposed solution of using an added_secret is good and very easy to implement.
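One way to read the added_secret idea is to mix a secret into the key derivation, so that only parties sharing that secret converge on the same key. The sketch below is an assumption about how that might look, using HMAC-SHA256 over the plaintext hash; it is not taken from Tahoe-LAFS code, and `convergent_key` is a hypothetical name.

```python
import hashlib
import hmac

def convergent_key(plaintext: bytes, added_secret: bytes) -> bytes:
    # Assumption: SHA-256 stands in for the plaintext hash H_p.
    h_p = hashlib.sha256(plaintext).digest()
    # Key depends on both the plaintext and the secret: same (file, secret)
    # pair gives the same key; a different secret gives an unrelated key.
    return hmac.new(added_secret, h_p, hashlib.sha256).digest()
```

Users who share an added_secret still deduplicate against each other, while an outsider who holds the same plaintext but not the secret derives a different key and so cannot confirm who stores the file.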