Use AES-GCM everywhere - Githubissues

expede commented 3 years ago

Unless we have a specific reason to, we should be using AES-GCM across the board. Safari is giving me this warning:

There's some open questions about if we GCM is overkill when we have integrity hashes from IPFS, but it can't hurt in the meantime. We've discussed this internally previously, and I thought we had already switched over to GCM. We should double check.

Possible that it's just picking up on this with a naive scan of the source? https://github.com/fission-suite/webnative/blob/26dcae9e4a8f0145e62648984ae29dccb374a02b/src/lobby/blocklist.ts#L73-L77

matheus23 commented 3 years ago

Possible that it's just picking up on this with a naive scan of the source?

I've been confused by the encryption code before, because my info was that we're using AES-GCM, but we're instantiating keystore-idb in a way such that I think we're actually using AES-CTR:

https://github.com/fission-suite/webnative/blob/16af87446f68bf8d3c367a04abeb3f67e57e40a5/src/crypto/browser.ts#L11-L16

Notice how on line 14 we're not providing a specific "algorithm" option here, so the default symmetric algorithm in keystore-idb is used. Which is AES-CTR (used here).

matheus23 commented 3 years ago

Do we want to switch that algorithm out in WNFS V1? Because that'd be a breaking change, unless we'd add some compatibility layer.

icidasset commented 3 years ago

We definitely haven't switched yet, issue on keystore-idb is still open: https://github.com/fission-suite/keystore-idb/issues/39

Reasons we haven't switched yet:

Breaking change, need upgrade path for filesystem.
I think Daniel said something about streaming encryption/decryption and picking CTR for that reason.

icidasset commented 3 years ago

He also mentioned that here: https://github.com/fission-suite/keystore-idb/issues/16

matheus23 commented 3 years ago

Okay. So TLDR, we used AES-CTR to have a plan for streaming.

There's two more ways to support streaming that came out of the exploratory work with wnfs-go. It's always about how we chop up files into chunks that we can stream. These two other options came up:

Do the chunking at the IPFS chunker level (at the configured ipfs chunk size, which is like 1MB max I think). I.e. wnfs-go's Cypherchunker.
Chunk files into blocks which all get their own namefilter. We would do this to hide how big files in the user's filesystem are and thus make it harder to fingerprint users.

Minor note: As far as I can tell all 3 approaches enable random access.

In terms of performance, using AES-CTR or doing it at the IPFS Chunker layer will most likely be more performant (no need to calculate namefilters).

In terms of privacy, chunking files into private filesystem blocks wins.

Also worth noting that the third approach is what we put in the spec so far (see the Content data type on this page and the 'Content Access' section below).

expede commented 3 years ago

Okay. So TLDR, we used AES-CTR to have a plan for streaming.

[Context: AES-GCM is AES-CTR plus a MAC]

@matheus23 we can actually use GCM for streaming, since we'd be breaking up the file into chunks that can be separately MACed. Given the size of a seekable chunk, the performance overhead won't be noticeable at all (it's an extra hash on a ~1MB chunk). The bigger question is if we get that from the MAC since we have CIDs. I think it should be fine, but we should talk to an expert. Until then AES-GCM is the common wisdom as being strictly better than AES-CTR.

expede commented 3 years ago

Probably worth noting that we'd need to do an IV per chunk as well, so IV + MAC per chunk seem very reasonable to me. Thoughts?

matheus23 commented 3 years ago

Oh yeah right. Option 2 and 3 totally don't need AES-CTR. We can use AES-GCM with very little overhead.

matheus23 commented 3 years ago

So the way I understand this now is: The current plan is to go for Option 3 (file chunks under different namefilters) and because that's breaking & we're not adding a compatibility layer, we'll schedule that for WNFSv2.

expede commented 3 years ago

The current plan is to go for Option 3 (file chunks under different namefilters) and because that's breaking & we're not adding a compatibility layer, we'll schedule that for WNFSv2.

For file streaming? Yes 💯 We should still move to AES-GCM as soon as possible for non-streaming files as well.

matheus23 commented 3 years ago

We should still move to AES-GCM as soon as possible for non-streaming files as well

Okay, so I interpret "as soon as possible" to be within wnfs v1.

So that means we need a compatibility layer. We need

A way to annotate private filesystem blocks with the algorithm used so we know whether we have to decrypt using AES-CTR or AES-GCM
To encrypt new filesystem blocks with AES-GCM.

This is a breaking WNFS update, which means all apps that stay on webnative version 28 or below will stop working once a user has used e.g. drive with webnative version 29 (or whatever will include the AES-GCM update).

I think this might be a bit drastic. Maybe I misunderstood and the intention was to go fully AES-GCM starting with WNFSv2?

expede commented 3 years ago

@matheus23

Okay, so I interpret "as soon as possible" to be within wnfs v1.

Yup, unless v2 is very close, which I don't believe is the case

A way to annotate private filesystem blocks with the algorithm used so we know whether we have to decrypt using AES-CTR or AES-GCM

The algorithm is tracked in decryption pointers, not on the file itself. If there's no algorithm listed, treat it as AES-CTR. Otherwise use whatever is in the algorithm field (in this case AES-GCM, possibly XChaCha20-Poly1305 and others later).

This is a breaking WNFS update

This is a fully backwards-compatible change (it is not forwards compatible). Yes, apps will need to update to the latest webnative if their specific data is shared with apps that have a newer version (infrequent at best today). We do this kind of thing frequently. The semver would be a minor version increment.

matheus23 commented 3 years ago

@expede I assume that when the auth lobby grants private file/directory permissions, it should now both share the symmetric key to decrypt the node, as well as the algorithm, right? Otherwise webnative wouldn't know whether the file/directory is AES_CTR or AES_GCM encrypted.

icidasset commented 3 years ago

@expede I assume that when the auth lobby grants private file/directory permissions, it should now both share the symmetric key to decrypt the node, as well as the algorithm, right? Otherwise webnative wouldn't know whether the file/directory is AES_CTR or AES_GCM encrypted.

I think so. We need to pass it to https://github.com/fission-suite/keystore-idb/blob/main/src/aes/keys.ts#L17 via https://github.com/fission-suite/webnative/blob/main/src/fs/v1/PrivateFile.ts#L86

expede commented 3 years ago

@expede I assume that when the auth lobby grants private file/directory permissions, it should now both share the symmetric key to decrypt the node, as well as the algorithm, right? Otherwise webnative wouldn't know whether the file/directory is AES_CTR or AES_GCM encrypted.

Yes, exactly. Alternatively if we put the mode on the file itself (I think we have that in the Talk thread), that also solves the problem

matheus23 commented 3 years ago

Since putting the AES-GCM/AES-CTR distinction onto the block itself idea came up later, I've already started implementing adding an algorithm key next to the key in unlock pointers.

There's a problem with this approach though: UnlockPointers are not pointers to immutable data. They don't point to one private block, but possibly many blocks at different versions. Thus, if we refer to one private filesystem block that is AES-CTR, then we need to encrypt all future revisions of this block with AES-CTR. There's no way to figure out whether a block is AES-CTR or AES-GCM during seeking if we store that info only on links to these blocks.

This whole issue then bubbles up to the highest level: The filesystem entry pointer. In this model it, too, becomes an UnlockPointer with a key and algorithm parameter. When we load existing filesystems and algorithm doesn't exist yet, we need to assume it's AES-CTR. And unless we throw away this pointer and construct an entirely new one, we can't change that to AES-GCM due to the reason I listed above.

We could just enable AES-GCM only for newly created directories, files and entirely new accounts.

I just wanted to write up what I've learned above. We had another idea we wanted to discuss anyways:

I'm in favor of switching to attaching this info to the blocks itself. It really seems like this is where this piece of information belongs.

sidenote

We should probably consider attaching the `CryptoAlgorithm` to the filesystem blocks itself as well. This makes it easier to switch to a better/more powerful algorithm in the future. Maybe there are more problems with this. I think we should spend some cycles again thinking about this.

There's some bike-shedding to do in terms of which format we use to attach this information to the blocks. Here are some options:

We prepend one byte to the encrypted blocks. If this byte is 0x00 it's AES-GCM. Other byte values are reserved for other ways to encrypt.
Each block is encoded as CBOR. This cbor has at least one field, its name is alg. There it annotates the block with the algorithm used, encoded as a string. So either AES-GCM or AES-CTR or something else. We could move the initialisation vector (iv) into a field in this CBOR, instead of prepending the encrypted block with it. (At least when we're using AES-GCM. It could not exist for other algorithms or be called something else e.g. counter or ctr for AES-CTR)

I like using CBOR instead of something like JSON, because it allows encoding binary data into it with O(1) size overhead instead of O(n) for JSON. (More specifically, if binary data is encoded as base64, it's a ~1.33x size overhead).

I'm not a fan of the first option because it seems like inventing a new encoding format in a hard to maintain way. Instead, encoding as CBOR means the data is less order-dependent and more future-proof. (Maybe it makes sense to encode it as [{ alg: AES-GCM | AES-CTR, [iv: bytes], [ctr: bytes] }, <bytes>] to ensure that the header comes before the cipher.)

As far as I understand, this can't be done backwards-compatibly without trying to just interpret private filesystem blocks as AES-CTR encrypted and once failing that, retrying with the new encoding.

All this feels like reinventing a spec. I would like to re-use dag-jose and JWEs, although I'm not sure the spec is meant to be used for use cases such as ours. Most examples encrypt the encryption key using asymmetric encryption, thus only allow for 1-to-1 encryption. The most relevant example seems to be A.3. Example JWE Using AES Key Wrap and AES_128_CBC_HMAC_SHA_256 in the JWE spec.

expede commented 3 years ago

I like using CBOR instead of something like JSON, because it allows encoding binary data into it with O(1) size overhead instead of O(n) for JSON.

Yeah, base64ing videos would be less than ideal

RE dag-jose: I was also thinking about this recently — I don't see why it wouldn't work with a cryptree once the algorithm is tagged & public. dag-jose is more general than our use case, especially if we're tracking the algorithm on the data itself, so we'd be adding more structure on top of this. IIRC it also some a bunch of data that we don't care about like recipients.

I guess one way to think about it is as if we have a bunch of singleton DAGs that we then placing into a namespace. Whether the key is used with one or more files won't matter too much, I think?

We could just enable AES-GCM only for newly created directories, files and entirely new accounts.

Yes, this ☝️ We don't need the encryption to be identical throughout the entire filesystem, and will likely have use cases for different methods for different data, new encryption algorithms, and so on. You're right that the security of the parent is the ceiling for the security of child nodes.

They don't point to one private block, but possibly many blocks at different versions. Thus, if we refer to one private filesystem block that is AES-CTR, then we need to encrypt all future revisions of this block with AES-CTR.

Yes, until we have key rotation. (Presumably rotation would also include the ability to change the encryption algorithm.) We want to avoid adding that to scope right now, if possible. We can continue those files with CTR for now, and key rotate them later if needed. This will only effect existing file systems, of which there are very few.

Not sure if this is helpful, but my understanding is that you can also decrypt an AES-GCM message with AES-CTR, though I've never actually tried. This ignores the MAC, which is certainly not ideal, but isn't any worse than what we're doing right now 😛

[{ alg: AES-GCM | AES-CTR, [iv: bytes], [ctr: bytes] }, ] to ensure that the header comes before the cipher.)

Per JWE, I guess you'd also break out the MAC?

matheus23 commented 3 years ago

Actually, JWE has one critical difference: It splits the encryption into two keys:

a symmetric "Content Encryption Key" (CEK).
an asymmetric or symmetric encryption key which is used to encrypt the CEK.

The encrypted CEK is attached to the JWE's payload, which is encrypted with the CEK.

I think the idea would be that you'd use an asymmetric key to encrypt your CEK. In our case it complicates the whole situation a little: We'd have an addition symmetric key indirection in here.

I think for now it would make sense to use our own CBOR-based format that just lists the algorithm (and possibly the IV).

expede commented 3 years ago

I think the idea would be that you'd use an asymmetric key to encrypt your CEK. In our case it complicates the whole situation a little: We'd have an addition symmetric key indirection in here.

Yeah, that doesn't make sense for our use case. We do key management elsewhere.

I think for now it would make sense to use our own CBOR-based format that just lists the algorithm (and possibly the IV).

Agreed on algorithm, IV, and payload 👍

expede commented 3 years ago

I suppose we'd also break out the MAC (AKA "authentication tag"), so:

Actual encrypted bits
Algorithm & mode
IV
MAC

...but possibly different based on the algorithm & mode (e.g. CTR doesn't use a MAC)

expede commented 3 years ago

Also also: I suppose CBOR because it's easy for us to standardize on and use everywhere, ya? This wrapper is quite small, but there's also Protobufs, Avro, Thrift, and more that have reasonable levels of support. Any opinions?

matheus23 commented 3 years ago

I like CBOR, because

We already use it (for the private filesystem)
It has good support in the ipfs ecosystem in general (e.g. things like encoding CIDs in CBOR out of the box & a library maintained by PL)

Everything else would need us adding another dependency.

I suppose we'd also break out

For now I went with only two keys in the CBOR record: alg for a string indicating the algorithm and cip for the ciphertext, which then includes IV and MAC. (for AES-GCM).

Getting the iv out of there is easy (and something I planned to do, it's just a little more involved, because I'd have to crack open keystore-idb), but I'm not sure how to get the MAC out of there, since it's put there by WebCrypto. Possibly a standard and appended to the beginning/end?

oddsdk / ts-odd

Use AES-GCM everywhere #289