Closed expede closed 2 years ago
Possible that it's just picking up on this with a naive scan of the source?
I've been confused by the encryption code before, because my info was that we're using AES-GCM, but we're instantiating keystore-idb in a way such that I think we're actually using AES-CTR:
Notice how on line 14 we're not providing a specific "algorithm" option here, so the default symmetric algorithm in keystore-idb is used. Which is AES-CTR
(used here).
Do we want to switch that algorithm out in WNFS V1? Because that'd be a breaking change, unless we'd add some compatibility layer.
We definitely haven't switched yet, issue on keystore-idb is still open: https://github.com/fission-suite/keystore-idb/issues/39
Reasons we haven't switched yet:
He also mentioned that here: https://github.com/fission-suite/keystore-idb/issues/16
Okay. So TLDR, we used AES-CTR to have a plan for streaming.
There's two more ways to support streaming that came out of the exploratory work with wnfs-go. It's always about how we chop up files into chunks that we can stream. These two other options came up:
Minor note: As far as I can tell all 3 approaches enable random access.
In terms of performance, using AES-CTR or doing it at the IPFS Chunker layer will most likely be more performant (no need to calculate namefilters).
In terms of privacy, chunking files into private filesystem blocks wins.
Also worth noting that the third approach is what we put in the spec so far (see the Content
data type on this page and the 'Content Access' section below).
Okay. So TLDR, we used AES-CTR to have a plan for streaming.
[Context: AES-GCM is AES-CTR plus a MAC]
@matheus23 we can actually use GCM for streaming, since we'd be breaking up the file into chunks that can be separately MACed. Given the size of a seekable chunk, the performance overhead won't be noticeable at all (it's an extra hash on a ~1MB chunk). The bigger question is if we get that from the MAC since we have CIDs. I think it should be fine, but we should talk to an expert. Until then AES-GCM is the common wisdom as being strictly better than AES-CTR.
Probably worth noting that we'd need to do an IV per chunk as well, so IV + MAC per chunk seem very reasonable to me. Thoughts?
Oh yeah right. Option 2 and 3 totally don't need AES-CTR. We can use AES-GCM with very little overhead.
So the way I understand this now is: The current plan is to go for Option 3 (file chunks under different namefilters) and because that's breaking & we're not adding a compatibility layer, we'll schedule that for WNFSv2.
The current plan is to go for Option 3 (file chunks under different namefilters) and because that's breaking & we're not adding a compatibility layer, we'll schedule that for WNFSv2.
For file streaming? Yes 💯 We should still move to AES-GCM as soon as possible for non-streaming files as well.
We should still move to AES-GCM as soon as possible for non-streaming files as well
Okay, so I interpret "as soon as possible" to be within wnfs v1.
So that means we need a compatibility layer. We need
This is a breaking WNFS update, which means all apps that stay on webnative version 28 or below will stop working once a user has used e.g. drive with webnative version 29 (or whatever will include the AES-GCM update).
I think this might be a bit drastic. Maybe I misunderstood and the intention was to go fully AES-GCM starting with WNFSv2?
@matheus23
Okay, so I interpret "as soon as possible" to be within wnfs v1.
Yup, unless v2 is very close, which I don't believe is the case
A way to annotate private filesystem blocks with the algorithm used so we know whether we have to decrypt using AES-CTR or AES-GCM
The algorithm is tracked in decryption pointers, not on the file itself. If there's no algorithm listed, treat it as AES-CTR. Otherwise use whatever is in the algorithm field (in this case AES-GCM, possibly XChaCha20-Poly1305 and others later).
This is a breaking WNFS update
This is a fully backwards-compatible change (it is not forwards compatible). Yes, apps will need to update to the latest webnative if their specific data is shared with apps that have a newer version (infrequent at best today). We do this kind of thing frequently. The semver would be a minor version increment.
@expede I assume that when the auth lobby grants private file/directory permissions, it should now both share the symmetric key
to decrypt the node, as well as the algorithm
, right? Otherwise webnative wouldn't know whether the file/directory is AES_CTR or AES_GCM encrypted.
@expede I assume that when the auth lobby grants private file/directory permissions, it should now both share the symmetric
key
to decrypt the node, as well as thealgorithm
, right? Otherwise webnative wouldn't know whether the file/directory is AES_CTR or AES_GCM encrypted.
I think so. We need to pass it to https://github.com/fission-suite/keystore-idb/blob/main/src/aes/keys.ts#L17 via https://github.com/fission-suite/webnative/blob/main/src/fs/v1/PrivateFile.ts#L86
@expede I assume that when the auth lobby grants private file/directory permissions, it should now both share the symmetric
key
to decrypt the node, as well as thealgorithm
, right? Otherwise webnative wouldn't know whether the file/directory is AES_CTR or AES_GCM encrypted.
Yes, exactly. Alternatively if we put the mode on the file itself (I think we have that in the Talk thread), that also solves the problem
Since putting the AES-GCM/AES-CTR distinction onto the block itself idea came up later, I've already started implementing adding an algorithm
key next to the key
in unlock pointers.
There's a problem with this approach though: UnlockPointer
s are not pointers to immutable data. They don't point to one private block, but possibly many blocks at different versions. Thus, if we refer to one private filesystem block that is AES-CTR
, then we need to encrypt all future revisions of this block with AES-CTR
. There's no way to figure out whether a block is AES-CTR
or AES-GCM
during seeking if we store that info only on links to these blocks.
This whole issue then bubbles up to the highest level: The filesystem entry pointer. In this model it, too, becomes an UnlockPointer
with a key
and algorithm
parameter. When we load existing filesystems and algorithm
doesn't exist yet, we need to assume it's AES-CTR
. And unless we throw away this pointer and construct an entirely new one, we can't change that to AES-GCM
due to the reason I listed above.
We could just enable AES-GCM
only for newly created directories, files and entirely new accounts.
I just wanted to write up what I've learned above. We had another idea we wanted to discuss anyways:
I'm in favor of switching to attaching this info to the blocks itself. It really seems like this is where this piece of information belongs.
There's some bike-shedding to do in terms of which format we use to attach this information to the blocks. Here are some options:
alg
. There it annotates the block with the algorithm used, encoded as a string. So either AES-GCM
or AES-CTR
or something else. We could move the initialisation vector (iv
) into a field in this CBOR, instead of prepending the encrypted block with it. (At least when we're using AES-GCM. It could not exist for other algorithms or be called something else e.g. counter
or ctr
for AES-CTR
)I like using CBOR
instead of something like JSON
, because it allows encoding binary data into it with O(1) size overhead instead of O(n) for JSON. (More specifically, if binary data is encoded as base64, it's a ~1.33x size overhead).
I'm not a fan of the first option because it seems like inventing a new encoding format in a hard to maintain way.
Instead, encoding as CBOR means the data is less order-dependent and more future-proof. (Maybe it makes sense to encode it as [{ alg: AES-GCM | AES-CTR, [iv: bytes], [ctr: bytes] }, <bytes>]
to ensure that the header comes before the cipher.)
As far as I understand, this can't be done backwards-compatibly without trying to just interpret private filesystem blocks as AES-CTR encrypted and once failing that, retrying with the new encoding.
All this feels like reinventing a spec. I would like to re-use dag-jose
and JWEs, although I'm not sure the spec is meant to be used for use cases such as ours. Most examples encrypt the encryption key using asymmetric encryption, thus only allow for 1-to-1 encryption. The most relevant example seems to be A.3. Example JWE Using AES Key Wrap and AES_128_CBC_HMAC_SHA_256
in the JWE spec.
I like using CBOR instead of something like JSON, because it allows encoding binary data into it with O(1) size overhead instead of O(n) for JSON.
Yeah, base64ing videos would be less than ideal
RE dag-jose: I was also thinking about this recently — I don't see why it wouldn't work with a cryptree once the algorithm is tagged & public. dag-jose
is more general than our use case, especially if we're tracking the algorithm on the data itself, so we'd be adding more structure on top of this. IIRC it also some a bunch of data that we don't care about like recipients
.
I guess one way to think about it is as if we have a bunch of singleton DAGs that we then placing into a namespace. Whether the key is used with one or more files won't matter too much, I think?
We could just enable AES-GCM only for newly created directories, files and entirely new accounts.
Yes, this ☝️ We don't need the encryption to be identical throughout the entire filesystem, and will likely have use cases for different methods for different data, new encryption algorithms, and so on. You're right that the security of the parent is the ceiling for the security of child nodes.
They don't point to one private block, but possibly many blocks at different versions. Thus, if we refer to one private filesystem block that is AES-CTR, then we need to encrypt all future revisions of this block with AES-CTR.
Yes, until we have key rotation. (Presumably rotation would also include the ability to change the encryption algorithm.) We want to avoid adding that to scope right now, if possible. We can continue those files with CTR for now, and key rotate them later if needed. This will only effect existing file systems, of which there are very few.
Not sure if this is helpful, but my understanding is that you can also decrypt an AES-GCM message with AES-CTR, though I've never actually tried. This ignores the MAC, which is certainly not ideal, but isn't any worse than what we're doing right now 😛
[{ alg: AES-GCM | AES-CTR, [iv: bytes], [ctr: bytes] },
] to ensure that the header comes before the cipher.)
Per JWE, I guess you'd also break out the MAC?
Actually, JWE has one critical difference: It splits the encryption into two keys:
The encrypted CEK is attached to the JWE's payload, which is encrypted with the CEK.
I think the idea would be that you'd use an asymmetric key to encrypt your CEK. In our case it complicates the whole situation a little: We'd have an addition symmetric key indirection in here.
I think for now it would make sense to use our own CBOR-based format that just lists the algorithm (and possibly the IV).
I think the idea would be that you'd use an asymmetric key to encrypt your CEK. In our case it complicates the whole situation a little: We'd have an addition symmetric key indirection in here.
Yeah, that doesn't make sense for our use case. We do key management elsewhere.
I think for now it would make sense to use our own CBOR-based format that just lists the algorithm (and possibly the IV).
Agreed on algorithm, IV, and payload 👍
I suppose we'd also break out the MAC (AKA "authentication tag"), so:
...but possibly different based on the algorithm & mode (e.g. CTR doesn't use a MAC)
Also also: I suppose CBOR because it's easy for us to standardize on and use everywhere, ya? This wrapper is quite small, but there's also Protobufs, Avro, Thrift, and more that have reasonable levels of support. Any opinions?
I like CBOR, because
Everything else would need us adding another dependency.
I suppose we'd also break out
For now I went with only two keys in the CBOR record: alg
for a string indicating the algorithm and cip
for the ciphertext, which then includes IV and MAC. (for AES-GCM).
Getting the iv out of there is easy (and something I planned to do, it's just a little more involved, because I'd have to crack open keystore-idb), but I'm not sure how to get the MAC out of there, since it's put there by WebCrypto. Possibly a standard and appended to the beginning/end?
Unless we have a specific reason to, we should be using AES-GCM across the board. Safari is giving me this warning:
There's some open questions about if we GCM is overkill when we have integrity hashes from IPFS, but it can't hurt in the meantime. We've discussed this internally previously, and I thought we had already switched over to GCM. We should double check.
Possible that it's just picking up on this with a naive scan of the source? https://github.com/fission-suite/webnative/blob/26dcae9e4a8f0145e62648984ae29dccb374a02b/src/lobby/blocklist.ts#L73-L77