jborg / attic

Deduplicating backup program

aes-gcm or aes-ocb? #211

Open ThomasWaldmann opened 9 years ago

ThomasWaldmann commented 9 years ago

just to keep an idea and get feedback on it:

these are single-pass authenticated encryption ciphers. aes-gcm is even supported by special cpu instructions.

by using them, we could maybe reduce cpu load and improve performance compared to the 2-pass method (aes-ctr + hmac-sha256) that is used now.

dnnr commented 9 years ago

In contrast to the proposed selectable compression algorithms (#207), users currently can choose to not use encryption at all. So for situations where CPU cycles are so scarce that encryption isn't feasible, there's already at least some kind of workaround available.

That means, since the "only" advantage of switching to another cipher is a potential performance increase over aes-ctr, we might want to look at some benchmarks first to see whether this would actually improve overall performance by any significant measure. If it's just a small improvement, it probably won't help anyone who is currently forced to disable encryption altogether, so I don't think that the added implementation complexity (backwards compatibility!) will be worth it in the end.

ThomasWaldmann commented 9 years ago

Just some numbers I got from a measurement of just the crypto code (not attic as a whole):

1GB input data (plaintext), core i5-4200 cpu with aes support in hw:

aes256-gcm (via openssl wrapper): 2.0s

aes256-ctr (openssl) + hmac-sha256 (py stdlib): 6.6s

I did a full backup on the test machine and it took 270s for 6GB (from SSD to SSD). So the speedup of attic could (in theory) be about 10% on my machine (just for encrypt(), more if id_hash() is also accelerated using gmac). Of course the overall speed of an attic backup operation depends on a lot of factors.
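The 10% estimate can be reproduced from the numbers above. This is only a back-of-the-envelope sketch: it assumes crypto time scales linearly with data size and that nothing else in the backup changes. The function name is made up for illustration.

```python
# Figures quoted in the thread: seconds per 1 GB of plaintext, crypto code only.
CTR_HMAC_S_PER_GB = 6.6   # aes256-ctr + hmac-sha256
GCM_S_PER_GB = 2.0        # aes256-gcm


def estimated_saving(total_s, data_gb):
    """Return (seconds saved, fraction of total runtime saved).

    Assumes crypto cost is linear in the data size and that the rest of
    the backup (chunking, compression, I/O) is unaffected.
    """
    saved_s = (CTR_HMAC_S_PER_GB - GCM_S_PER_GB) * data_gb
    return saved_s, saved_s / total_s


saved_s, fraction = estimated_saving(total_s=270.0, data_gb=6.0)
print(f"saved ~{saved_s:.1f}s of 270s, about {fraction:.0%}")
```

So roughly 27.6 of the 270 seconds would go away, which is where "about 10%" comes from.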

About "added implementation complexity":

I added flexibility for that kind of stuff already in another PR (there it is for compression and maccing, but that can be adapted), so I am assuming that this is given. Then, the only difference might be whether the mac gets computed by HMAC from python stdlib (in a 2nd pass) or by aes-gcm from openssl wrapper (in 1 pass, together with encryption).
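To make the "2nd pass" concrete, here is a minimal sketch of the MAC step using only the python stdlib. The aes-ctr pass itself is left out (the stdlib has no AES), so random bytes stand in for the ciphertext. The function names and the tag-then-ciphertext layout are assumptions for illustration, not attic's actual format.

```python
import hashlib
import hmac
import os

TAG_LEN = 32  # length of an HMAC-SHA256 tag in bytes


def seal(mac_key, ciphertext):
    """Second pass: compute HMAC-SHA256 over the ciphertext and prepend it."""
    tag = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    return tag + ciphertext


def open_sealed(mac_key, blob):
    """Check the tag in constant time, then return the ciphertext."""
    tag, ciphertext = blob[:TAG_LEN], blob[TAG_LEN:]
    expected = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("MAC mismatch: data corrupted or tampered with")
    return ciphertext


mac_key = os.urandom(32)
ciphertext = os.urandom(64)  # stand-in for the output of the aes-ctr pass
blob = seal(mac_key, ciphertext)
assert open_sealed(mac_key, blob) == ciphertext
```

With aes-gcm, the tag would come out of the cipher together with the ciphertext, so this separate pass over the data would not be needed.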

ThomasWaldmann commented 9 years ago

See PR #219.

ThomasWaldmann commented 9 years ago

real-world performance (I backed up a directory from SSD to SSD, both with hot fs cache):

aes256-ctr + hmac-sha256

Duration: 3 minutes 53.82 seconds

Number of files: 1461

4.97 GB 4.41 GB 3.79 GB

aes256-gcm + gmac

Duration: 3 minutes 18.69 seconds

Number of files: 1461

4.97 GB 4.41 GB 3.79 GB

18% faster \o/
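For reference, the percentage depends on how it is computed from the two durations above. A small sketch (variable names are just for illustration):

```python
ctr_hmac_s = 3 * 60 + 53.82   # aes256-ctr + hmac-sha256
gcm_s = 3 * 60 + 18.69        # aes256-gcm + gmac

# Same amount of data in less time: relative gain in throughput.
throughput_gain = ctr_hmac_s / gcm_s - 1

# Share of the original wall-clock time that was saved.
time_reduction = 1 - gcm_s / ctr_hmac_s

print(f"throughput: +{throughput_gain:.1%}, duration: -{time_reduction:.1%}")
```

Measured as throughput, the gain rounds to 18%; measured as time saved, it is about 15% of the original duration.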

ThomasWaldmann commented 9 years ago

note: ocb mode is to be added in openssl 1.1.0, which is not released yet.

ThomasWaldmann commented 9 years ago

from wikipedia:

http://en.wikipedia.org/wiki/OCB_mode#Attacks -> 64GB limitation per key :(

azet commented 8 years ago

@ThomasWaldmann

Does attic actually encrypt data of this size under a single key? This is also an issue with other block cipher modes.

ThomasWaldmann commented 8 years ago

there is only one static key per repo, which is used for all backups. there is no "backup session" key, in case you mean that.

attic uses aes-ctr mode (and stores the counter, so it does not repeat counter values under normal operations), so do you think this is a problem (please give some reference in case you do)?
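A rough sketch of what "storing the counter" can look like: before encrypting, reserve as many counter values as the data needs and persist the new high-water mark, so a later run continues where the previous one stopped. The class, file layout and names here are hypothetical, not attic's actual code.

```python
import os
import tempfile

AES_BLOCK = 16  # bytes per AES block; CTR mode uses one counter value per block


class CounterStore:
    """Hypothetical persisted CTR counter (illustration only)."""

    def __init__(self, path):
        self.path = path

    def _load(self):
        try:
            with open(self.path) as f:
                return int(f.read())
        except FileNotFoundError:
            return 0  # fresh store: start counting at zero

    def reserve(self, nbytes):
        """Reserve counter values for nbytes of data; return the first one."""
        start = self._load()
        blocks = (nbytes + AES_BLOCK - 1) // AES_BLOCK  # round up to whole blocks
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            f.write(str(start + blocks))
            f.flush()
            os.fsync(f.fileno())
        # Atomic replace: the file holds either the old or the new value.
        os.replace(tmp, self.path)
        return start


path = os.path.join(tempfile.mkdtemp(), "counter")
first = CounterStore(path).reserve(100)   # 100 bytes need 7 blocks
second = CounterStore(path).reserve(16)   # a later run continues after them
print(first, second)
```

The "under normal operations" caveat still applies to a scheme like this: if the stored counter is lost or rolled back (e.g. restored from an older copy), counter values would repeat under the same key.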

azet commented 8 years ago

No, but similarly, I don't think it would be a problem with OCB mode :)