pyca / cryptography

cryptography is a package designed to expose cryptographic primitives and recipes to Python developers.
https://cryptography.io
Other
6.58k stars 1.51k forks source link

`ChaCha20` API does not allow updating the `nonce` #10193

Closed jlaine closed 2 months ago

jlaine commented 8 months ago

Hello cryptographistas!

Currently aioquic's QUIC header protection and encryption features use C code linked against OpenSSL. I am considering porting this to pure Python using only cryptography. On the whole things work, and a work in progress is here:

https://github.com/aiortc/aioquic/pull/457

One thing I noticed is that the ChaCha20 class takes the nonce in its constructor. Unfortunately QUIC uses a nonce which is derived from each packet's header. This means that when using the CHACHA20_POLY1305_SHA256 cipher suite, for every packet we need to tear down / recreate the ChaCha20 instance which is not cheap.

Would you consider an API which allows updating the nonce? This can be implemented by calling:

EVP_CipherInit_ex(ctx, NULL, NULL, NULL, nonce, operation)

reaperhulk commented 8 months ago

What sort of performance do you see on current main? We actually had a context reuse system (fixedpool) for ChaCha20Poly1305 but we got rid of it in the latest transition because the rust conversion gave us 20% better performance even without reuse.

alex commented 8 months ago

Slight correction: we still reuse the ctx on openssl3.2.

On Wed, Jan 17, 2024, 10:32 AM Paul Kehrer @.***> wrote:

What sort of performance do you see on current main? We actually had a context reuse system (fixedpool) for ChaCha20Poly1305 but we got rid of it in the latest transition because the rust conversion gave us 20% better performance even without reuse.On Jan 16, 2024, at 5:10 PM, Jeremy Lainé @.***> wrote: Hello cryptographistas! Currently aioquic's QUIC header protection and encryption features use C code linked against OpenSSL. I am considering porting this to pure Python using only cryptography. On the whole things work, and a work in progress is here: aiortc/aioquic#457 One thing I noticed is that the ChaCha20 class takes the nonce in its constructor. Unfortunately QUIC uses a nonce which is derived from each packet's header. This means that when using the CHACHA20_POLY1305_SHA256 cipher suite, for every packet we need to tear down / recreate the ChaCha20 instance which is not cheap. Would you consider an API which allows updating the nonce? This can be implemented by calling: EVP_CipherInit_ex(ctx, NULL, NULL, NULL, nonce, operation)

—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you are subscribed to this thread.Message ID: @.***>

— Reply to this email directly, view it on GitHub https://github.com/pyca/cryptography/issues/10193#issuecomment-1896056543, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAAGBHJ4VFJBMV4UQNLOGTYO7VJDAVCNFSM6AAAAABB5KGZZ6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQOJWGA2TMNJUGM . You are receiving this because you are subscribed to this thread.Message ID: @.***>

jlaine commented 8 months ago

I tried putting together a very crude benchmark to get feel for how much it cost to create a new Cipher instance vs how much time actual encryption takes. Note this benchmark is somewhat artificial since it does not address the initial problem which is "how to cheaply update the nonce".

import time

from cryptography.hazmat.primitives.ciphers import Cipher, algorithms

ITERATIONS = 1000000
KEY = bytes(32)
NONCE = bytes(16)
DATA = bytes(64)

start = time.time()
for i in range(ITERATIONS):
    r = Cipher(
        algorithm=algorithms.ChaCha20(KEY, NONCE),
        mode=None,
    ).encryptor().update(DATA)
elapsed = time.time() - start
print(f"{ITERATIONS} iterations with teardown in {elapsed:.1f} s")

start = time.time()
encryptor = Cipher(
    algorithm=algorithms.ChaCha20(KEY, NONCE),
    mode=None,
).encryptor()
for i in range(ITERATIONS):
    encryptor.update(DATA)
elapsed = time.time() - start
print(f"{ITERATIONS} iterations without teardown in {elapsed:.1f} s")

I then ran these tests against both main and @alex's crypto-rust branch.

main - OpenSSL 3.0

1000000 iterations with teardown in 11.1 s 1000000 iterations without teardown in 1.5 s

crypto-rust - OpenSSL 3.0

1000000 iterations with teardown in 7.5 s 1000000 iterations without teardown in 1.5 s

AFAICT it tells us the crypto-rust does bring down the setup cost, but the cost is definitely still significant. I also tried against OpenSSL 3.2 but saw no meaningful changes.

reaperhulk commented 8 months ago

In your test example you're using the Cipher/ChaCha20 construction but in your draft PR you're using ChaCha20Poly1305 so I'm a little confused 😄

In cryptography 42.0.x we shifted ChaCha20Poly1305 to rust. As you pointed out, there's a separate branch moving Cipher to rust, but that's not shipped yet.

If we use this test script

import random
import time

from cryptography.hazmat.primitives.ciphers.aead import ChaCha20Poly1305

ITERATIONS = 1000000
KEY = bytes(32)
DATA = bytes(64)

start = time.time()
c = ChaCha20Poly1305(KEY)
for i in range(ITERATIONS):
    nonce = random.randbytes(12) # non-CSPRNG but just used to ensure there's always a new nonce
    c.encrypt(nonce, DATA, None)
elapsed = time.time() - start
print(f"{ITERATIONS} iterations with teardown in {elapsed:.1f} s")
And some performance: Version Time
42.0.1 2.2s
ChaCha20Reuseable 2.1s
#10266 1.6s

So with the latest changes we are now >20% faster than the old reuseable context hack (which is broken for 42.0.0+ anyway).

Is there an easy way for me to do aioquic perf testing?

reaperhulk commented 8 months ago

Looking a bit more closely I realize you're also using the traditional Cipher API later in the file. Well, I made ChaCha20Poly1305 much faster at least 😄. We would definitely need to come up with a new API for allowing nonce update on an instantiated CipherContext.

jlaine commented 8 months ago

Looking a bit more closely I realize you're also using the traditional Cipher API later in the file. Well, I made ChaCha20Poly1305 much faster at least 😄. We would definitely need to come up with a new API for allowing nonce update on an instantiated CipherContext.

QUIC indeed uses both AEAD for payload encryption (AES-GCM or ChaCha20Poly1305) and AES-ECB or ChaCha20 to mask the packet headers. I currently have a huge performance gap (about 10x) between my C implementation and the Python one, but its not all due to encryption, I suspect my bytearray manipulations are sub-optimal. I'd really like to get rid of the C code, both because of the inherent risks of a foot-gun and because it forces me to vendor another copy of OpenSSL,

I know there are performance gains coming thanks to Alex's PRs, but what struck me was the assumption in the Cryptography API that a nonce is set only once for ChaCha20, which does not line up with how QUIC uses it. Hence my ticket :)

reaperhulk commented 7 months ago

Some draft work on this in #10437

alex commented 2 months ago

resolved