pyca / cryptography

cryptography is a package designed to expose cryptographic primitives and recipes to Python developers.
https://cryptography.io

Our GCM API exposes un-authenticated plaintext #1199

Closed alex closed 7 years ago

alex commented 10 years ago

That's like... bad. We should probably come up with an AEAD API. This also ties into #1141, since arguably streaming makes no sense for authenticated stuff.

https://www.imperialviolet.org/2014/06/27/streamingencryption.html

money quote: """I will even claim that the existence of an API that can operate in a streaming fashion over large records (i.e. will encrypt and defer the authenticator and will decrypt and return unauthenticated plaintext) is a mistake."""

We'd need to design a new API, and deprecate the existing one.

reaperhulk commented 10 years ago

It's impossible to remove the existing API due to interop requirements. The fix for this is at the recipe layer, not the hazmat layer (unfortunately). Incidentally, Fernet has this same problem. We had some very casual discussions about an AEAD recipe that used records, but they trailed off without any further action.

reaperhulk commented 10 years ago

Err, Fernet isn't a streaming API so the single MAC doesn't matter and it isn't vulnerable (although it precludes encrypting large files with it...the very thing that a hypothetical record-based recipe would be good for).

alex commented 10 years ago

As AGL says in his post, sadly he's unaware of any construction for taking an AEAD and turning it into a streaming record format. That means such a recipe needs to either be something end-to-end (TLS), or we have to invent something?


reaperhulk commented 10 years ago

Probably so. If we choose to go down the route of inventing something then we'll need to write up a spec proposal and spend a lot of time getting as much feedback from cryptographers as we possibly can... Interested in @lvh's thoughts here.

alex commented 10 years ago

I'm not sure I agree. A single-shot API doesn't make anything impossible; it just makes it inefficient (it would require the user to buffer everything), but that's not necessarily a bad thing.

lvh commented 10 years ago

Is the invention we're talking about here an API, or a construction? This ticket originally appeared to be about the GCM API being bad, but then it became about a streaming record format?

reaperhulk commented 10 years ago

Yeah, that last sentence sums it up pretty well. The GCM API is bad because it has to be bad to support streaming encryption. GCM is, of course, not a great thing to use for large files since you can't trust any of the data until the last of the data is decrypted and the tag is checked, but it's unacceptable to buffer it inside cryptography. Since we can't change that, the question becomes: how do you make a recipe that obviates the need for GCM streaming?

Actual streaming of encrypted data over a network is handled well by TLS, but it seems like there's also a need for a streaming authenticated encryption format for storing data.

public commented 10 years ago

I think that returning unauthenticated plaintext is absolutely fine in hazmat provided that finalize() blows up when it detects the message doesn't pass the MAC.

Why do we think there's a need for a GCM based alternative to Fernet or some fancy record based multipart thing?
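A minimal sketch of the behavior being debated here, assuming a recent release of cryptography in which `Cipher` no longer needs an explicit backend argument. The key, nonce, and message are made up for illustration. The decryptor's `update()` hands back bytes before any authentication has happened, and only `finalize()` raises `InvalidTag`.

```python
import os

from cryptography.exceptions import InvalidTag
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key = os.urandom(32)    # illustrative throwaway key
nonce = os.urandom(12)  # 96-bit nonce, never reused with the same key

# Encrypt with the streaming hazmat API; the tag exists only after finalize().
encryptor = Cipher(algorithms.AES(key), modes.GCM(nonce)).encryptor()
ciphertext = encryptor.update(b"attack at dawn") + encryptor.finalize()
tag = encryptor.tag

# Flip one bit of the ciphertext to simulate tampering.
tampered = bytes([ciphertext[0] ^ 0x01]) + ciphertext[1:]

decryptor = Cipher(algorithms.AES(key), modes.GCM(nonce, tag)).decryptor()
leaked = decryptor.update(tampered)  # bytes are returned before the tag is checked
try:
    decryptor.finalize()  # the tag comparison happens only here
    authenticated = True
except InvalidTag:
    authenticated = False
```

Anything a caller does with `leaked` before `finalize()` returns is acting on unauthenticated data, which is the sharp edge this issue is about.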

reaperhulk commented 10 years ago

There's a "need" because the current situation is not ideal. Specifically, if I want to decrypt an 80GB file I technically don't know if I can use a single byte of it until the very last byte decrypts. Fernet handles this case by being a one shot API that buffers everything; not an option for huge files. I don't believe a solution needs to use GCM, it's just coincidental that we started talking about this in an issue filed regarding GCM.

This isn't a critical issue, but that doesn't mean it's not worth talking about how it could be improved.

dstufft commented 10 years ago

I also think that getting access to the plaintext layer in hazmat prior to the MAC being verified is perfectly fine. The point of the hazmat distinction is that these APIs may have sharp edges that you can cut yourself on because otherwise we couldn't enable certain use cases, such as encrypting something that can't fit into memory.

I agree that the proper place to solve this is at the recipe layer. How exactly we achieve that I don't know. It could be a framed thing that yields frames instead of partially encrypted/decrypted sections. Or it could be something that presents two APIs: one for in-memory use, which just yields the entire thing, and one for files, which writes to a temporary file as a buffer and then moves it to the final location after processing has finished and the MAC has been verified.
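The file-based variant could be sketched roughly as follows. This is an illustration under assumptions, not an API that cryptography ships: `decrypt_to_file` and its parameters are hypothetical names, and the ciphertext is assumed to have been produced with the streaming AES-GCM API, with the nonce and tag stored separately.

```python
import os
import tempfile

from cryptography.exceptions import InvalidTag
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes


def decrypt_to_file(key, nonce, tag, src_path, dest_path, block_size=64 * 1024):
    """Hypothetical helper: dest_path only appears if the GCM tag verifies."""
    decryptor = Cipher(algorithms.AES(key), modes.GCM(nonce, tag)).decryptor()
    # Create the temp file next to the destination so os.replace stays on
    # one filesystem and can be atomic.
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir)
    try:
        with os.fdopen(fd, "wb") as tmp, open(src_path, "rb") as src:
            while True:
                block = src.read(block_size)
                if not block:
                    break
                tmp.write(decryptor.update(block))
            tmp.write(decryptor.finalize())  # raises InvalidTag on a bad tag
        os.replace(tmp_path, dest_path)  # publish only after verification
    except BaseException:
        os.unlink(tmp_path)  # never leave unauthenticated plaintext behind
        raise
```

The unauthenticated plaintext still touches disk in the temporary file, so this trades memory for a short-lived file rather than removing the exposure entirely.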

simo5 commented 9 years ago

I'm curious, why not simply use a chunked Fernet-like construction where the MAC/digest of the previous chunk is used as input with the next chunk (to chain together all the parts)? AEAD algorithms can handle this case very well. The chunk size can be defined in a (signed) header at the top of the file/stream; the header is the only special chunk, and it is chained into the first chunk by its digest as well. The key may be re-derived for each chunk or not.
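One way to read the chaining idea above, as an unreviewed sketch rather than a vetted construction: each chunk is sealed with AES-GCM, and the associated data is the previous chunk's tag, or the SHA-256 digest of the header for the first chunk. The function names, the 4-byte header layout, and the random per-chunk nonces are all assumptions made for illustration. The header is not signed separately here; it is only bound to the first chunk through its digest.

```python
import hashlib
import os
import struct

from cryptography.exceptions import InvalidTag
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

TAG_LEN = 16  # AESGCM.encrypt appends a 16-byte tag to the ciphertext


def encrypt_chunks(key, data, chunk_size):
    """Return (header, records); each record is (nonce, ciphertext_with_tag)."""
    aead = AESGCM(key)
    header = struct.pack(">I", chunk_size)  # assumed header: chunk size only
    link = hashlib.sha256(header).digest()  # chains the header into chunk 0
    records = []
    for offset in range(0, len(data), chunk_size):
        nonce = os.urandom(12)
        sealed = aead.encrypt(nonce, data[offset:offset + chunk_size], link)
        records.append((nonce, sealed))
        link = sealed[-TAG_LEN:]  # the next chunk authenticates this tag
    return header, records


def decrypt_chunks(key, header, records):
    """Yield each plaintext chunk only after its own tag has verified."""
    aead = AESGCM(key)
    link = hashlib.sha256(header).digest()
    for nonce, sealed in records:
        yield aead.decrypt(nonce, sealed, link)  # raises InvalidTag on mismatch
        link = sealed[-TAG_LEN:]
```

Reordered or substituted chunks fail verification because the associated data no longer matches. The sketch does not detect truncation: dropping trailing records leaves a valid prefix, so a real format would also need an end-of-stream marker or an authenticated chunk count.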

reaperhulk commented 9 years ago

I have a nebulous plan to extract the authenticated encryption construction in Tahoe-LAFS and adapt it into a recipe (CTR + a merkle tree to be able to rapidly auth arbitrary bytes and make it seekable), but I haven't had a chance to write that up.

Fernet isn't a great candidate for a chunked/streaming API since it isn't seekable due to the use of CBC.

simo5 commented 9 years ago

Do you want seekability in order to allow random access to stored files, rather than just a streaming format?

reaperhulk commented 7 years ago

I'm going to be comfortable closing this issue once we have an AESGCM AEAD class, since we can then say in the docs that users should prefer the one-shot APIs and only fall back to the nasty streaming API if absolutely necessary.

alex commented 7 years ago

We now have an AES-GCM AEAD, so declaring this closed. Maybe some day we can think about deprecating the streaming API.
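For reference, a short sketch of the one-shot class mentioned above, `AESGCM` from `cryptography.hazmat.primitives.ciphers.aead`. The message and associated data are illustrative. Unlike the streaming decryptor, `decrypt()` returns plaintext only if the tag verifies; otherwise it raises `InvalidTag` and returns nothing.

```python
import os

from cryptography.exceptions import InvalidTag
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)
aesgcm = AESGCM(key)
nonce = os.urandom(12)  # must be unique per message under a given key

# encrypt() returns the ciphertext with the 16-byte tag appended.
sealed = aesgcm.encrypt(nonce, b"attack at dawn", b"header")
plaintext = aesgcm.decrypt(nonce, sealed, b"header")

# Tampering with any byte makes decrypt() raise instead of returning data.
tampered = sealed[:-1] + bytes([sealed[-1] ^ 0x01])
try:
    aesgcm.decrypt(nonce, tampered, b"header")
    rejected = False
except InvalidTag:
    rejected = True
```

The cost is the one raised earlier in the thread: the whole message has to fit in memory.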