Closed alex closed 7 years ago
It's impossible to remove the existing API due to interop requirements. The fix for this is at the recipe layer, not the hazmat layer (unfortunately). Incidentally, Fernet has this same problem. We had some very casual discussions about an AEAD recipe that used records, but they trailed off without any further action.
Err, Fernet isn't a streaming API so the single MAC doesn't matter and it isn't vulnerable (although it precludes encrypting large files with it...the very thing that a hypothetical record-based recipe would be good for).
As AGL says in his post, sadly he's unaware of any construction for taking an AEAD and turning it into a streaming record format. That means such a recipe needs to either be something end-to-end (TLS), or we have to invent something?
Probably so. If we choose to go down the route of inventing something then we'll need to write up a spec proposal and spend a lot of time getting as much feedback from cryptographers as we possibly can... Interested in @lvh's thoughts here.
I'm not sure I agree. A single-shot API doesn't make anything impossible; it just makes it inefficient (it would require the user to buffer everything), but that's not necessarily a bad thing.
Is the invention we're talking about here an API, or a construction? This ticket originally appeared to be about the GCM API being bad, but then it became about a streaming record format?
Yeah, that last sentence sums it up pretty well. The GCM API is bad because it has to be bad to support streaming encryption. GCM is, of course, not a great thing to use for large files, since you can't trust any of the data until the last of it is decrypted and the tag is checked, but it's unacceptable to buffer it inside cryptography. Since we can't change that, the question becomes how to make a recipe that obviates the need for GCM streaming.
Actual streaming of encrypted data over a network is handled well by TLS, but it seems like there's also a need for a streaming authenticated encryption format for storing data.
I think that returning unauthenticated plaintext is absolutely fine in hazmat provided that finalize() blows up when it detects the message doesn't pass the MAC.
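For anyone following along, here is a minimal sketch of the behaviour being described, using the hazmat GCM interface. It assumes a version of cryptography recent enough that the backend argument is optional; the key, IV, and message are made up for illustration.

```python
import os

from cryptography.exceptions import InvalidTag
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key = os.urandom(32)
iv = os.urandom(12)

# Encrypt: the tag only becomes available after finalize().
encryptor = Cipher(algorithms.AES(key), modes.GCM(iv)).encryptor()
ciphertext = encryptor.update(b"attack at dawn") + encryptor.finalize()
tag = encryptor.tag

# Simulate tampering by flipping one bit of the first ciphertext byte.
tampered = bytes([ciphertext[0] ^ 0x01]) + ciphertext[1:]

decryptor = Cipher(algorithms.AES(key), modes.GCM(iv, tag)).decryptor()
# update() hands back plaintext bytes before anything has been authenticated.
leaked = decryptor.update(tampered)
try:
    decryptor.finalize()  # the MAC check happens here
    verified = True
except InvalidTag:
    verified = False
```

The caller already holds `leaked` by the time `finalize()` raises, which is exactly the sharp edge under discussion.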
Why do we think there's a need for a GCM based alternative to Fernet or some fancy record based multipart thing?
There's a "need" because the current situation is not ideal. Specifically, if I want to decrypt an 80GB file I technically don't know if I can use a single byte of it until the very last byte decrypts. Fernet handles this case by being a one shot API that buffers everything; not an option for huge files. I don't believe a solution needs to use GCM, it's just coincidental that we started talking about this in an issue filed regarding GCM.
This isn't a critical issue, but that doesn't mean it's not worth talking about how it could be improved.
I also think that getting access to the plaintext layer in hazmat prior to the MAC being verified is perfectly fine. The point of the hazmat distinction is that these APIs may have sharp edges that you can cut yourself on because otherwise we couldn't enable certain use cases, such as encrypting something that can't fit into memory.
I agree that the proper place to solve this is at the recipe layer. How exactly we achieve that, I don't know. It could be a framed thing that yields frames instead of partially encrypted/decrypted sections, or it could be something that presents two APIs: one for in-memory data, which just yields the entire thing, and one for files, which writes to a temporary file as a buffer and then moves it to the final location once processing has finished and the MAC has been verified.
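A rough sketch of the temporary-file variant, using only the standard library. Everything here is an assumption made for illustration: the function name `release_if_authentic` is hypothetical, HMAC-SHA256 stands in for whatever MAC a real recipe would use, and `transform` is a placeholder for the actual decryption step (identity by default).

```python
import hashlib
import hmac
import os
import tempfile


def release_if_authentic(src, dest, key, expected_tag,
                         transform=lambda b: b, chunk_size=64 * 1024):
    """Stream src through `transform` into a temp file, and publish it at
    dest only if the HMAC-SHA256 over the source bytes matches."""
    mac = hmac.new(key, digestmod=hashlib.sha256)
    dest_dir = os.path.dirname(os.path.abspath(dest))
    # Keep the temp file next to dest so the final rename stays on one
    # filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir)
    try:
        with os.fdopen(fd, "wb") as tmp, open(src, "rb") as infile:
            while True:
                chunk = infile.read(chunk_size)
                if not chunk:
                    break
                mac.update(chunk)
                tmp.write(transform(chunk))
        if not hmac.compare_digest(mac.digest(), expected_tag):
            raise ValueError("MAC check failed; output discarded")
        os.replace(tmp_path, dest)  # nothing appears at dest before this
    except BaseException:
        # On any failure, make sure the unverified output does not linger.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

The point of the design is that memory use stays bounded by `chunk_size` while the caller still never sees unverified output at the final location.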
I'm curious: why not simply use a chunked, Fernet-like construction where the MAC/digest of the previous chunk is used as input to the next chunk (to chain all the parts together)? AEAD algorithms can handle this case very well. The chunk size can be defined in a (signed) header at the top of the file/stream; the header is the only special chunk, and it is chained into the first chunk by its digest as well. The key may or may not be re-derived for each chunk.
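To make the chaining idea concrete, here is a toy sketch that covers authentication only, with HMAC-SHA256 from the standard library. The record layout, the function names, and the explicit index and final-record flag are all assumptions added for illustration; they are not part of the proposal above, and none of this has been reviewed as a real format. With an AEAD, the previous tag, index, and flag would presumably go in as associated data instead.

```python
import hashlib
import hmac
import struct

TAG_LEN = 32


def _tag(key, prev_tag, index, is_last, chunk):
    # Each tag covers the previous tag, the position, and a "final" flag.
    header = prev_tag + struct.pack(">Q?", index, is_last)
    return hmac.new(key, header + chunk, hashlib.sha256).digest()


def seal_chunks(key, chunks):
    """Yield (is_last, chunk, tag) records, chaining each tag to the last."""
    prev_tag = b"\x00" * TAG_LEN
    it = iter(chunks)
    current = next(it, None)
    index = 0
    while current is not None:
        following = next(it, None)  # look ahead to spot the final chunk
        is_last = following is None
        prev_tag = _tag(key, prev_tag, index, is_last, current)
        yield is_last, current, prev_tag
        current, index = following, index + 1


def open_chunks(key, records):
    """Yield each chunk once its own tag verifies; raise on tampering,
    reordering, trailing data, or truncation."""
    prev_tag = b"\x00" * TAG_LEN
    finished = False
    for index, (is_last, chunk, tag) in enumerate(records):
        if finished:
            raise ValueError("data after final record")
        expected = _tag(key, prev_tag, index, is_last, chunk)
        if not hmac.compare_digest(expected, tag):
            raise ValueError(f"record {index} failed authentication")
        prev_tag, finished = tag, is_last
        yield chunk
    if not finished:
        raise ValueError("stream truncated before final record")
```

Each chunk can be used as soon as its own record verifies, and a truncated stream is still caught because no record carrying the final flag ever arrives.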
I have a nebulous plan to extract the authenticated encryption construction in Tahoe-LAFS and adapt it into a recipe (CTR + a merkle tree to be able to rapidly auth arbitrary bytes and make it seekable), but I haven't had a chance to write that up.
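For context on why a Merkle tree gives seekable authentication, here is a generic sketch with the standard library. It is not Tahoe-LAFS's actual format: the function names, the domain-separation bytes, and binding the chunk index into each leaf are assumptions made for this example, and it only covers the hashing side, not the CTR encryption. It also assumes the root itself gets authenticated some other way (a MAC or signature).

```python
import hashlib
import hmac
import struct


def _leaf(index, data):
    # Bind the chunk position into the leaf so chunks cannot be swapped.
    return hashlib.sha256(b"\x00" + struct.pack(">Q", index) + data).digest()


def _node(left, right):
    return hashlib.sha256(b"\x01" + left + right).digest()


def build_levels(chunks):
    """Return the tree as a list of levels, leaves first, root level last.
    Assumes at least one chunk."""
    level = [_leaf(i, c) for i, c in enumerate(chunks)]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]  # pair an odd last node with itself
        level = [_node(level[i], level[i + 1])
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def proof_for(levels, index):
    """Sibling hashes needed to recompute the root from leaf `index`."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        proof.append(level[sibling] if sibling < len(level) else level[index])
        index //= 2
    return proof


def verify_chunk(root, index, chunk, proof):
    """Check one chunk against the root without touching any other chunk."""
    digest = _leaf(index, chunk)
    for sibling in proof:
        if index % 2 == 0:
            digest = _node(digest, sibling)
        else:
            digest = _node(sibling, digest)
        index //= 2
    return hmac.compare_digest(digest, root)
```

Verifying one chunk needs only a logarithmic number of hashes, so a reader can seek to an arbitrary offset and authenticate just the chunk it lands in.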
Fernet isn't a great candidate for a chunked/streaming API since it isn't seekable due to the use of CBC.
Do you want seekability in order to allow random access to stored files, rather than just having a streaming format?
I'm going to be comfortable closing this issue once we have an AESGCM AEAD class since we can link in the docs to have users use the one shot APIs by preference and only fall back to the nasty streaming API if absolutely necessary.
We now have an AES-GCM AEAD, so declaring this closed. Maybe some day we can think about deprecating the streaming API.
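For reference, a minimal sketch of the one-shot usage with the AESGCM class. The nonce handling and the associated data value are made up for illustration; in real use a nonce must never repeat under the same key.

```python
import os

from cryptography.exceptions import InvalidTag
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)
aesgcm = AESGCM(key)
nonce = os.urandom(12)  # must be unique per message for a given key
aad = b"header-v1"      # hypothetical associated data

# encrypt() returns the ciphertext with the 16-byte tag appended.
ciphertext = aesgcm.encrypt(nonce, b"attack at dawn", aad)
plaintext = aesgcm.decrypt(nonce, ciphertext, aad)

# Flip one bit of the tag: decrypt() raises and returns no plaintext at all.
tampered = ciphertext[:-1] + bytes([ciphertext[-1] ^ 0x01])
try:
    aesgcm.decrypt(nonce, tampered, aad)
    rejected = False
except InvalidTag:
    rejected = True
```

Unlike the streaming interface, there is no point at which the caller holds unauthenticated plaintext; the cost is that the whole message has to fit in memory.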
That's like... bad. We should probably come up with an AEAD API. This also ties into #1141, since arguably streaming makes no sense for authenticated stuff.
https://www.imperialviolet.org/2014/06/27/streamingencryption.html
money quote: """I will even claim that the existence of an API that can operate in a streaming fashion over large records (i.e. will encrypt and defer the authenticator and will decrypt and return unauthenticated plaintext) is a mistake."""
We'd need to design a new API, and deprecate the existing one.