Closed by michwill 8 years ago
Does this method (init, then update the nonce) require modifying pycryptodome? In the init of pycryptodome's GCMMode, _j0 depends on the nonce, and _cipher and _tag_cipher depend on _j0. If the nonce changes, these attributes need to be updated.
True.
The question is where the bottleneck is: is it the overhead of calling the C implementation of the block cipher, or is it plain Python overhead? Chances are it's the latter, because PyPy calling the same C implementation over cffi is much faster.
Options pretty much are:
A way to test the performance: install PyCryptodome in a virtualenv (or use zerodb's virtualenv), then run the following in an IPython terminal:
from Crypto.Random import get_random_bytes
from Crypto.Cipher import AES

key = get_random_bytes(32)

def encrypt(txt):
    nonce = get_random_bytes(16)
    cipher = AES.new(key, AES.MODE_GCM, nonce=nonce)
    ciphertext, tag = cipher.encrypt_and_digest(txt)
    return nonce, ciphertext, tag

def decrypt(nonce, ciphertext, tag):
    cipher = AES.new(key, AES.MODE_GCM, nonce=nonce)
    return cipher.decrypt_and_verify(ciphertext, tag)

e = encrypt(b'x' * 1000) # This is 1 KB at once
>> %timeit decrypt(*e)
1000 loops, best of 3: 236 µs per loop
# this turns out to be 1/236e-6 / 1000 = 4.2 MB/s, that is *very* slow!

e = encrypt(b'x' * 10000000) # Now 10 MB at once
>> %timeit decrypt(*e)
100 loops, best of 3: 173 ms per loop
# this is 1/0.173 * 10 = 57.8 MB/s, much faster, wow!
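The two timings above can be turned into a rough estimate of the fixed per-call cost. This is a minimal sketch, assuming decryption time is linear in payload size (time = overhead + size / bandwidth). The function names are made up for illustration, and the inputs are the measurements quoted above.

```python
def throughput_mb_s(size_bytes, seconds):
    """Payload throughput in MB/s (1 MB = 10**6 bytes)."""
    return size_bytes / seconds / 1e6


def fit_linear_cost(small, large):
    """Fit time = overhead + size / bandwidth through two (size, seconds) points.

    Returns (overhead_seconds, bandwidth_bytes_per_second).
    """
    (n1, t1), (n2, t2) = small, large
    bandwidth = (n2 - n1) / (t2 - t1)
    overhead = t1 - n1 / bandwidth
    return overhead, bandwidth


small = (1000, 236e-6)        # 1 KB in 236 µs
large = (10000000, 173e-3)    # 10 MB in 173 ms

print(round(throughput_mb_s(*small), 1))   # 4.2
print(round(throughput_mb_s(*large), 1))   # 57.8

overhead, bandwidth = fit_linear_cost(small, large)
print(round(overhead * 1e6))               # fixed cost per call, in µs
```

Under that linear assumption, roughly 219 µs of the 236 µs spent on the 1 KB message is per-call overhead rather than actual decryption work.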
Ah, I should also mention that the nonce is handled much more efficiently if it is 12 bytes long (less Python code to execute in that case!). But that is not the main bottleneck here.
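To illustrate why a 12-byte nonce needs less work: in GCM as specified in NIST SP 800-38D, a 96-bit nonce becomes the initial counter block J0 by appending a 32-bit counter value of 1, while any other length goes through GHASH, which costs one GF(2^128) multiplication per 16-byte block. The sketch below is pure Python 3 and is not PyCryptodome's code. `gf128_mul` and `derive_j0` are names made up here, and `h` stands for the hash subkey (the AES encryption of the all-zero block under the key), which is passed in rather than computed.

```python
R = 0xE1 << 120  # reduction constant from SP 800-38D


def gf128_mul(x, y):
    """Multiply two 128-bit blocks (given as ints) in GCM's GF(2^128)."""
    z, v = 0, x
    for i in range(128):
        if (y >> (127 - i)) & 1:   # walk the bits of y, most significant first
            z ^= v
        v = (v >> 1) ^ R if v & 1 else v >> 1
    return z


def derive_j0(h, nonce):
    """Return the 16-byte initial counter block J0 for a given nonce."""
    if len(nonce) == 12:
        # Fast path: no field arithmetic at all.
        return nonce + b"\x00\x00\x00\x01"
    # Slow path: GHASH over the zero-padded nonce plus a length block.
    padded = nonce + b"\x00" * (-len(nonce) % 16)
    data = padded + b"\x00" * 8 + (8 * len(nonce)).to_bytes(8, "big")
    h_int = int.from_bytes(h, "big")
    y = 0
    for i in range(0, len(data), 16):
        y = gf128_mul(y ^ int.from_bytes(data[i:i + 16], "big"), h_int)
    return y.to_bytes(16, "big")
```

With a 16-byte nonce the slow path runs the 128-iteration multiplication loop twice per cipher object, which is the extra Python work the 12-byte case avoids.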
Btw, NaCl seems to have good performance properties (initialization takes 8 µs even with Python overhead, and the same performance with big objects), and Daniel Bernstein cares about which authenticated mode to choose over there :-) There are also enough people who like NaCl (that's something very trustworthy), so it makes sense to try...
The current pynacl seems to use libsodium 1.0.1, which does not support aes256gcm. The libsodium docs say aes256gcm is supported as of 1.0.4, and it should not be hard to upgrade pynacl to libsodium 1.0.4. I am interested in testing pynacl.
I wrote a Python C wrapper for Crypto++'s AES-GCM in my fork of pycryptopp (the aesgcm C module). I tested it against the current pycryptodome, and the test shows that it is much faster than pycryptodome's AES-GCM.
from Crypto.Random import get_random_bytes
from Crypto.Cipher import AES
from pycryptopp.cipher.aesgcm import AESGCM

key = get_random_bytes(32)

def encrypt(txt):
    nonce = get_random_bytes(16)
    cipher = AES.new(key, AES.MODE_GCM, nonce=nonce)
    ciphertext, tag = cipher.encrypt_and_digest(txt)
    return nonce, ciphertext, tag

def decrypt(nonce, ciphertext, tag):
    cipher = AES.new(key, AES.MODE_GCM, nonce=nonce)
    return cipher.decrypt_and_verify(ciphertext, tag)

def gcmdecrypt(nonce, ciphertext, tag):
    cipher = AESGCM(key=key, iv=nonce)
    return cipher.decrypt_and_verify(ciphertext, tag)

e = encrypt(b'x' * 1000) # This is 1 KB at once
In [9]: %timeit decrypt(*e)
1000 loops, best of 3: 264 µs per loop
In [10]: %timeit gcmdecrypt(*e)
100000 loops, best of 3: 13.8 µs per loop

In [12]: e2 = encrypt(b'x' * 10000000) # Now 10 MB at once
In [13]: %timeit gcmdecrypt(*e2)
10 loops, best of 3: 89.7 ms per loop
In [14]: %timeit decrypt(*e2)
10 loops, best of 3: 179 ms per loop
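For a quick comparison, the speedup of the Crypto++ wrapper over PyCryptodome can be computed from the timings above. This only restates the measured numbers, and the variable names are made up here.

```python
# (pycryptodome seconds, pycryptopp wrapper seconds) from the runs above
timings = {
    "1 KB": (264e-6, 13.8e-6),
    "10 MB": (179e-3, 89.7e-3),
}

speedups = {size: slow / fast for size, (slow, fast) in timings.items()}

for size, factor in speedups.items():
    print("%s: %.1fx faster" % (size, factor))
```

The gap is roughly 19x for the small message but only about 2x for the large one, which is consistent with per-call setup dominating small decryptions.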
Some problems that I found for this C wrapper module are:
Huh, that's very interesting!
If you want PyPy compatibility, you can try cffi as the C interface, though I don't know how fast it is.
As libsodium's doc says:
The current implementation of this construction is hardware-accelerated and requires the Intel SSSE3 extensions, as well as the aesni and pclmul instructions.
Intel Westmere processors (introduced in 2010) and newer meet the requirements.
There are no plans to support non hardware-accelerated implementations of AES-GCM. If portability is a concern, use ChaCha20-Poly1305 instead.
If zerodb needs to be compatible with CPUs that lack AES-NI and hardware acceleration, pynacl by itself may not cover all situations.
I added aes256gcm decryption in my fork of pynacl and compared it with pycryptopp and pycryptodome. The comparison is as follows (pynacl seems to support only 12-byte nonces):
from Crypto.Random import get_random_bytes
from Crypto.Cipher import AES
from pycryptopp.cipher.aesgcm import AESGCM as pycryptopp_AES256GCM
from nacl.aead import AES256GCM as nacl_AES256GCM

key = get_random_bytes(32)

def encrypt(txt):
    nonce = get_random_bytes(12) # 12 bytes nonce
    cipher = AES.new(key, AES.MODE_GCM, nonce=nonce)
    ciphertext, tag = cipher.encrypt_and_digest(txt)
    return nonce, ciphertext, tag

def decrypt(nonce, ciphertext, tag):
    cipher = AES.new(key, AES.MODE_GCM, nonce=nonce)
    return cipher.decrypt_and_verify(ciphertext, tag)

def pycryptopp_gcmdecrypt(nonce, ciphertext, tag):
    cipher = pycryptopp_AES256GCM(key=key, iv=nonce)
    return cipher.decrypt_and_verify(ciphertext, tag)

def nacl_gcmdecrypt(nonce, ciphertext, tag):
    cipher = nacl_AES256GCM(key=key)
    return str(cipher.decrypt_and_verify(ciphertext, tag, nonce))

e = encrypt(b'x' * 1000) # This is 1 KB at once
In [13]: %timeit decrypt(*e)
The slowest run took 4.60 times longer than the fastest. This could mean that an intermediate result is being cached
1000 loops, best of 3: 223 µs per loop
In [12]: %timeit pycryptopp_gcmdecrypt(*e)
100000 loops, best of 3: 12.9 µs per loop
In [11]: %timeit nacl_gcmdecrypt(*e)
The slowest run took 12.86 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 5.28 µs per loop
pynacl seems to be the fastest, but the limitations are: 1) it only supports CPUs with AES-NI hardware acceleration; 2) it only supports 12-byte nonces.
Wow, seems as fast as it can get! The time is probably dominated by the actual decryption!
Actually I don't mind a 12-byte nonce, because pycryptodome converts everything to this number of bits anyway (and a larger nonce doesn't add security). I was thinking about switching to this nonce size anyway.
Perhaps we can leave pycryptodome as a fallback.
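One way such a fallback could be wired up is to probe for the faster backend at import time and fall back to PyCryptodome when it is missing. A minimal sketch: `pick_backend` is a hypothetical helper, and `nacl.aead` and `pycryptopp.cipher.aesgcm` are the module names from the forks benchmarked above, not something assumed to exist in the upstream packages. It only checks that a module can be imported. It does not check at runtime whether the CPU has AES-NI, which a real fallback would also need.

```python
import importlib


def pick_backend(candidates):
    """Return (name, module) for the first importable module in candidates."""
    for name in candidates:
        try:
            return name, importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError("no AES-GCM backend available: %r" % (candidates,))


# Preference order: fastest first, PyCryptodome last as the fallback.
PREFERRED = ("nacl.aead", "pycryptopp.cipher.aesgcm", "Crypto.Cipher.AES")
```

Calling `pick_backend(PREFERRED)` once at startup would then select the backend. Since all three libraries decrypted the same ciphertext in the test above, the stored data format would not need to depend on which one is picked.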
I like that you've tested decryption of the same ciphertext with all libraries; it means they're compatible :+1:
Do you think your changes could be accepted into pynacl? Otherwise, they seem portable enough to keep separately, since you didn't patch any existing functions.
P.S. On a side note: how can I contact you outside GitHub? My email is michael [at] zerodb.io. It would be nice to have a call.
In pycryptodome, AES256 is accelerated with AES-NI.
However, cipher initialization happens in Python and is not very fast: 250 microseconds per object. The simple way is to switch to PyPy, which we support (that brings it down to 50 microseconds per object).
A more correct way, perhaps, would be to initialize whatever is possible without the nonce and be able to update the nonce afterwards (since many encrypted objects share the same key but have different nonces).
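The idea can be sketched as a two-phase object: key-dependent state is built once, and only nonce-dependent state is rebuilt per message. This is a structural illustration only, not PyCryptodome's API. `TwoPhaseCipher`, `key_setup` and `nonce_setup` are hypothetical names, and the setup callables are placeholders for the real work (for GCM, roughly the key expansion and hash subkey on one side, and J0 plus the state derived from it on the other).

```python
class TwoPhaseCipher(object):
    """Build key-dependent state once; rebuild nonce-dependent state per message."""

    def __init__(self, key, key_setup, nonce_setup):
        self._key_state = key_setup(key)      # expensive, done once per key
        self._nonce_setup = nonce_setup
        self._nonce_state = None

    def set_nonce(self, nonce):
        # Everything derived from the nonce must be recomputed here.
        self._nonce_state = self._nonce_setup(self._key_state, nonce)
        return self

    @property
    def nonce_state(self):
        if self._nonce_state is None:
            raise ValueError("set_nonce() must be called first")
        return self._nonce_state


# Placeholder setup functions that only count how often they run.
calls = {"key": 0, "nonce": 0}


def key_setup(key):
    calls["key"] += 1
    return ("key-state", key)


def nonce_setup(key_state, nonce):
    calls["nonce"] += 1
    return (key_state, nonce)


cipher = TwoPhaseCipher(b"k" * 32, key_setup, nonce_setup)
for nonce in (b"a" * 12, b"b" * 12, b"c" * 12):
    cipher.set_nonce(nonce)

print(calls)  # {'key': 1, 'nonce': 3}
```

In a real implementation, `set_nonce` is where attributes like `_j0`, `_cipher` and `_tag_cipher` would have to be refreshed, and every message would still need its own fresh nonce.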