randombit / botan

Cryptography Toolkit
https://botan.randombit.net
BSD 2-Clause "Simplified" License
2.59k stars 567 forks source link

AES-CTR (and probably more stream ciphers) are slow when used via the FFI's Stream_Cipher mode. #3925

Closed aahajj closed 5 months ago

aahajj commented 8 months ago

Hello everyone,

I have created a benchmarking tool to compare performance between libraries and have noticed a significant drop in performance with Botan's AES CTR. In my benchmark, I generated plaintexts with a length of 4096 Bytes for each round and measured the time it takes for 1 million iterations. For my performance test I have used the Botan C interface ffi.h.

Results:

AES-NI and SSSE3 enabled:

Library Algorithm Time (seconds)
Botan 3.2.0 AES-128 0.554971
Botan 3.2.0 AES-128/CTR 77.982720
Botan 3.2.0 AES-128/CBC 5.963908
Botan 3.2.0 AES-128/CFB 7.617746

AES-NI disabled:

Library Algorithm Time (seconds)
Botan 3.2.0 AES-128 9.187737
Botan 3.2.0 AES-128/CTR 81.234424
Botan 3.2.0 AES-128/CBC 19.137964
Botan 3.2.0 AES-128/CFB 20.943194

With AES-NI and SSSE3 disabled:

Library Algorithm Time (seconds)
Botan 3.2.0 AES-128 66.311669
Botan 3.2.0 AES-128/CTR 135.916029
Botan 3.2.0 AES-128/CBC 153.400266
Botan 3.2.0 AES-128/CFB 151.787276

Environment:

Thank you and best regards, Ahmed

Edit 1: add Byte Order Edit 2: fix a comma typo

reneme commented 8 months ago

Thanks for sharing your findings! With a quick and dirty run on my machine (*) using botan's integrated benchmarking tool, I see the opposite. Basically running the following CLI command(s):

./botan speed --buf-size=4096000 AES-128
./botan speed --buf-size=4096000 AES-128/CTR-BE
./botan speed --buf-size=4096000 AES-128/CBC
./botan speed --buf-size=4096000 AES-128/CFB
Algorithm Throughput
AES-128 11398.724 MiB/sec
AES-128/CTR-BE 6014.384 MiB/sec
AES-128/CBC 1372.127 MiB/sec
AES-128/CFB 715.270 MiB/sec

This tool doesn't use the FFI. One (technical) difference: CTR-mode is a StreamCipher while all the modes of operation are subclasses of Cipher_Mode. Perhaps this is reasonable starting point for further investigation.

(*) "my machine"

11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz 2.50 GHz Ubuntu 22.04 (running in WSL)

reneme commented 8 months ago

Mhh, current working hypothesis: CTR-mode being a stream cipher, it features an update_granularity() of 1 byte (!). Other cipher modes typically use the underlying block size of the bare symmetric algorithm. Also, CTR-mode doesn't provide any authentication tags or paddings, hence, its minimum_final_size() is 0 bytes.

The FFI adapter layer uses this information to determine the update granularity it will use internally:

https://github.com/randombit/botan/blob/b84acd926d1ccc78524c69aee5d53279b9a5e361/src/lib/ffi/ffi_cipher.cpp#L48-L51

... and then slices the input data accordingly:

https://github.com/randombit/botan/blob/b84acd926d1ccc78524c69aee5d53279b9a5e361/src/lib/ffi/ffi_cipher.cpp#L187-L205

Therefore, we might actually pass the payload data byte-by-byte to the encryption/decryption routine. Obviously, that's awfully slow. 😨

Thanks again, for reporting this!

aahajj commented 8 months ago

Ah, I stumbled upon that function today too, and I was a bit puzzled by the update granularity being just 1 byte. But now that you mention it, it actually makes perfect sense! Thank you for the clarification!

Oh, and by the way, AES-128/CCM also has an update granularity of 1 Byte. CCM, among the three AEAD algorithms I've tested, seems to be the slowest, and I think it might be due to the same issue.

aahajj commented 8 months ago

I couldn't help but notice a naming inconsistency between the function botan_cipher_get_ideal_granularity() in the website's documentation and its declaration in https://github.com/randombit/botan/blob/b84acd926d1ccc78524c69aee5d53279b9a5e361/src/lib/ffi/ffi.h#L561

reneme commented 7 months ago

For the record: This pull request addresses the performance issues described in this ticket. While working on the performance fix, another (functional) issue was found in the same code location (https://github.com/randombit/botan/pull/3971). As a result, the performance fix didn't make it to the just-released 3.4.0 release. The mentioned bugfix landed and got released, however.

@aahajj if you have time, please repeat your measurements after applying the patch suggested in #3951. It should speed up the stream-cipher-esque modes quite significantly.

aahajj commented 6 months ago

Sorry for the late response. I've changed jobs and am officially no longer working on that project. I was about to redo the measurements again but found that the patch is not merged yet, pending review from @randombit. By the way, I obtained approval to publish the tool on my GitHub page if you want to take a look at it: CBOS. Once the pull request is merged, I will happily do the measurements again.

reneme commented 5 months ago

I did a quick-and-dirty run with CBOS, first against the latest Botan master and then against #3951.

The effect is especially visible for CTR and CCM modes which go from 0.0x bytes/cycle to almost 2 bytes/cycle. Also other modes seem to benefit from a rough 2x speedup. LGTM!

Thanks again for reporting.

master

Algorithm Bytes/Cycle
AES-128/CTR 0.045291
AES-128/CBC 0.202803
AES-128/CFB 0.318757
AES-128/GCM 0.344758
AES-128/OCB 0.301191
AES-128/CCM 0.095022

fix/performance_of_ffi_stream_ciphers

Algorithm Bytes/Cycle
AES-128/CTR 1.825574
AES-128/CBC 0.682218
AES-128/CFB 0.545389
AES-128/GCM 0.863399
AES-128/OCB 0.888618
AES-128/CCM 1.436926
aahajj commented 5 months ago

looks much better now :)