Closed aahajj closed 5 months ago
Thanks for sharing your findings! With a quick and dirty run on my machine (*) using botan's integrated benchmarking tool, I see the opposite. Basically running the following CLI command(s):
./botan speed --buf-size=4096000 AES-128
./botan speed --buf-size=4096000 AES-128/CTR-BE
./botan speed --buf-size=4096000 AES-128/CBC
./botan speed --buf-size=4096000 AES-128/CFB
Algorithm | Throughput |
---|---|
AES-128 | 11398.724 MiB/sec |
AES-128/CTR-BE | 6014.384 MiB/sec |
AES-128/CBC | 1372.127 MiB/sec |
AES-128/CFB | 715.270 MiB/sec |
This tool doesn't use the FFI. One (technical) difference: CTR-mode is a StreamCipher
while all the modes of operation are subclasses of Cipher_Mode
. Perhaps this is reasonable starting point for further investigation.
(*) "my machine"
11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz 2.50 GHz Ubuntu 22.04 (running in WSL)
Mhh, current working hypothesis: CTR-mode being a stream cipher, it features an update_granularity()
of 1 byte (!). Other cipher modes typically use the underlying block size of the bare symmetric algorithm. Also, CTR-mode doesn't provide any authentication tags or paddings, hence, its minimum_final_size()
is 0 bytes.
The FFI adapter layer uses this information to determine the update granularity it will use internally:
... and then slices the input data accordingly:
Therefore, we might actually pass the payload data byte-by-byte to the encryption/decryption routine. Obviously, that's awfully slow. 😨
Thanks again, for reporting this!
Ah, I stumbled upon that function today too, and I was a bit puzzled by the update granularity being just 1 byte. But now that you mention it, it actually makes perfect sense! Thank you for the clarification!
Oh, and by the way, AES-128/CCM also has an update granularity of 1 Byte. CCM, among the three AEAD algorithms I've tested, seems to be the slowest, and I think it might be due to the same issue.
I couldn't help but notice a naming inconsistency between the function botan_cipher_get_ideal_granularity() in the website's documentation and its declaration in https://github.com/randombit/botan/blob/b84acd926d1ccc78524c69aee5d53279b9a5e361/src/lib/ffi/ffi.h#L561
For the record: This pull request addresses the performance issues described in this ticket. While working on the performance fix, another (functional) issue was found in the same code location (https://github.com/randombit/botan/pull/3971). As a result, the performance fix didn't make it to the just-released 3.4.0 release. The mentioned bugfix landed and got released, however.
@aahajj if you have time, please repeat your measurements after applying the patch suggested in #3951. It should speed up the stream-cipher-esque modes quite significantly.
Sorry for the late response. I've changed jobs and am officially no longer working on that project. I was about to redo the measurements again but found that the patch is not merged yet, pending review from @randombit. By the way, I obtained approval to publish the tool on my GitHub page if you want to take a look at it: CBOS. Once the pull request is merged, I will happily do the measurements again.
I did a quick-and-dirty run with CBOS, first against the latest Botan master and then against #3951.
The effect is especially visible for CTR and CCM modes which go from 0.0x bytes/cycle to almost 2 bytes/cycle. Also other modes seem to benefit from a rough 2x speedup. LGTM!
Thanks again for reporting.
Algorithm | Bytes/Cycle |
---|---|
AES-128/CTR | 0.045291 |
AES-128/CBC | 0.202803 |
AES-128/CFB | 0.318757 |
AES-128/GCM | 0.344758 |
AES-128/OCB | 0.301191 |
AES-128/CCM | 0.095022 |
Algorithm | Bytes/Cycle |
---|---|
AES-128/CTR | 1.825574 |
AES-128/CBC | 0.682218 |
AES-128/CFB | 0.545389 |
AES-128/GCM | 0.863399 |
AES-128/OCB | 0.888618 |
AES-128/CCM | 1.436926 |
looks much better now :)
Hello everyone,
I have created a benchmarking tool to compare performance between libraries and have noticed a significant drop in performance with Botan's AES CTR. In my benchmark, I generated plaintexts with a length of 4096 Bytes for each round and measured the time it takes for 1 million iterations. For my performance test I have used the Botan C interface ffi.h.
Results:
AES-NI and SSSE3 enabled:
AES-NI disabled:
With AES-NI and SSSE3 disabled:
Environment:
I suspect that the AES CTR encryption process is not being parallelized, which could be causing the significant drop in performance. However, I'm uncertain about the exact reason for this issue and would appreciate your assistance in investigating further.
Thank you and best regards, Ahmed
Edit 1: add Byte Order Edit 2: fix a comma typo