jedisct1 / aegis-X

The AEGIS-128X and AEGIS-256X high performance ciphers.

Downclocking? #2

Closed: victorstewart closed 4 months ago

victorstewart commented 1 year ago

i figure this mental walk-through could be useful for anyone else who needs to make the same decision

i'm going to use aegis-128 for intra-datacenter container-to-container network encryption, so essentially every core on every machine would be running these encryptions and decryptions intermittently but incredibly frequently. i'm working through a mental model of whether i should be using aegis-128l, aegis-128x2, or aegis-128x4.

per this blog post: https://lemire.me/blog/2018/09/07/avx-512-when-and-how-to-use-these-new-instructions/

none of the aegis-128x operations involve multiplication or floating point (aka "heavy" instructions), so the "light" case applies:

"The bar for light AVX-512 is lower. Even if the work is spread on all cores, you may only get a 15% frequency [reduction] on some chips like a Xeon Gold. So you only have to check that AVX-512 gives you a greater than 15% gain for your overall application on a per-cycle basis."

RE the benchmark data, it seems like it never makes sense to use aegis-128x4? i guess this could vary if there's less downclocking or quicker upclocking on newer chips.

then for aegis-128x2, my takeaway from the article was that light AVX2 operations will not downclock the core, thus always use aegis-128x2 over aegis-128l. so i'll probably go in that direction then measure later to confirm.
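The break-even reasoning above can be made concrete. This is a minimal sketch in Python, assuming throughput scales as (work per cycle) × (clock frequency), which is a simplification. The function names are hypothetical, and the 15% figure is the one quoted from the blog post, not a measurement.

```python
def break_even_speedup(freq_drop: float) -> float:
    """Per-cycle speedup needed to cancel out a fractional clock drop.

    Assumes throughput = work_per_cycle * frequency (a simplification).
    """
    if not 0.0 <= freq_drop < 1.0:
        raise ValueError("freq_drop must be in [0, 1)")
    return 1.0 / (1.0 - freq_drop)


def net_gain(per_cycle_speedup: float, freq_drop: float) -> float:
    """Overall throughput ratio once the lower clock is accounted for."""
    return per_cycle_speedup * (1.0 - freq_drop)


if __name__ == "__main__":
    drop = 0.15  # the 15% quoted above for light AVX-512 on some chips
    print(f"break-even per-cycle speedup: {break_even_speedup(drop):.3f}")
    print(f"net gain at 1.15x per cycle:  {net_gain(1.15, drop):.4f}")
```

Under this model a 15% clock drop needs slightly more than a 15% per-cycle gain to break even (1 / 0.85 ≈ 1.176), so the quoted rule of thumb is a little optimistic.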

P.S.

extremely funny the hallucinations these things have LOL. CPUID_EAX_AVX512F_DOWNCLOCK should exist though.

[Image: IMG_5110]

P.P.S.

@travisdowns any thoughts on this after seeing the benchmark data on the README?

victorstewart commented 1 year ago

but then this https://travisdowns.github.io/blog/2020/08/19/icl-avx512-freq.html seems to throw all that out the window as of Ice Lake (Sunny Cove) cores. still, given that the aegis-128x4 performance increase over aegis-128x2 is SO minor, i guess the (surely?) increased power cost of aegis-128x4 makes aegis-128x2 always the winning choice?

jedisct1 commented 1 year ago

X2 should never downclock, and you're guaranteed to double the throughput.

I think recent CPUs can make heavy use of all AVX-512 instructions without downclocking. I was only able to test on a Zen 4, but I indeed didn't see any performance drop, even after an extended amount of time.

Because it's so fast, X4 tends to hit the CPU cache and memory quite hard. In a shared environment or simply if the CPU is doing other memory intensive operations at the same time, you're actually likely to see an overall performance decrease.

So, yes, X2 would be a safer bet, especially with AEGIS-128L.

The next revision of the AEGIS document is going to include AEGIS-X, but I think we will only register the TLS and AEAD identifiers for X2.

jedisct1 commented 1 year ago

aegis-bench.tar.gz

Here's a simple benchmark in C you can run with a command like:

./aegis128x4 1000000 10000

And to see if downclocking happens, increase the second number, and check that the throughput doesn't decrease.
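The check described above can be scripted. This is a rough sketch, assuming the benchmark prints an `average throughput: ... Mb/s` line to stdout (as in the sample output in this thread) and that the binary sits at `./aegis128x4`. The helper names and the 5% tolerance are arbitrary choices for illustration, not part of the benchmark.

```python
import os
import re
import subprocess

MBPS_RE = re.compile(r"average throughput:\s*([0-9.]+)\s*Mb/s")


def parse_mbps(output: str) -> float:
    """Extract the Mb/s figure from the benchmark's output."""
    match = MBPS_RE.search(output)
    if match is None:
        raise ValueError("no 'Mb/s' line found in benchmark output")
    return float(match.group(1))


def run_once(binary: str, first_arg: int, iterations: int) -> float:
    """Run the benchmark once and return its reported Mb/s."""
    result = subprocess.run(
        [binary, str(first_arg), str(iterations)],
        capture_output=True, text=True, check=True,
    )
    return parse_mbps(result.stdout)


def throughput_drops(samples: list[float], tolerance: float = 0.05) -> bool:
    """True if the last (longest) run is more than `tolerance` below the first."""
    return samples[-1] < samples[0] * (1.0 - tolerance)


if __name__ == "__main__":
    binary = "./aegis128x4"  # assumed path; adjust to where the bench was built
    if os.path.exists(binary):
        counts = [10_000, 100_000, 1_000_000]  # increasing second argument
        samples = [run_once(binary, 1_000_000, n) for n in counts]
        for n, mbps in zip(counts, samples):
            print(f"{n:>9} iterations: {mbps:.0f} Mb/s")
        print("throughput dropped" if throughput_drops(samples) else "throughput stable")
    else:
        print(f"{binary} not found; build the benchmark first")
```

A drop flagged this way is only a hint: thermal throttling or noise from other processes can produce the same pattern, so it is worth repeating the runs.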

jedisct1 commented 1 year ago

By the way, the compiler makes a difference.

$ env CC="gcc -march=native" make

$ ./aegis128x2 1000000 100000
average throughput: 50550 msg/s
average throughput: 404400 Mb/s

$ ./aegis128x4 1000000 100000
average throughput: 62847 msg/s
average throughput: 502776 Mb/s

$ env CC="clang -march=native" make

$ ./aegis128x2 1000000 100000
average throughput: 54022 msg/s
average throughput: 432176 Mb/s

$ ./aegis128x4 1000000 100000
average throughput: 68177 msg/s
average throughput: 545416 Mb/s

$ env CC="zig cc" make

$ ./aegis128x2 1000000 100000
average throughput: 54680 msg/s
average throughput: 437440 Mb/s

$ ./aegis128x4 1000000 100000
average throughput: 72776 msg/s
average throughput: 582208 Mb/s

jedisct1 commented 1 year ago

This is with gcc 12.2.0 and clang 15.0.7 from Ubuntu Lunar, so the figures may be different with the latest versions of these compilers.
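The figures from the runs above can be cross-checked and compared with a few lines of Python. This sketch assumes the first benchmark argument (1000000) is the message size in bytes, an inference from the fact that each Mb/s figure is exactly 8 times the msg/s figure. The dictionary layout and function names are illustrative only.

```python
# (compiler, variant) -> (msg/s, Mb/s), copied from the runs above
RESULTS = {
    ("gcc", "x2"): (50550, 404400),
    ("gcc", "x4"): (62847, 502776),
    ("clang", "x2"): (54022, 432176),
    ("clang", "x4"): (68177, 545416),
    ("zig cc", "x2"): (54680, 437440),
    ("zig cc", "x4"): (72776, 582208),
}

MSG_BYTES = 1_000_000  # assumed meaning of the first benchmark argument


def mbps_from_msgs(msgs_per_s: int) -> int:
    """Convert messages per second to megabits per second."""
    return msgs_per_s * MSG_BYTES * 8 // 1_000_000


def x4_over_x2(compiler: str) -> float:
    """Throughput ratio of the X4 variant over X2 for one compiler."""
    return RESULTS[(compiler, "x4")][0] / RESULTS[(compiler, "x2")][0]


if __name__ == "__main__":
    for (compiler, variant), (msgs, mbps) in RESULTS.items():
        # Each reported Mb/s value should match the msg/s value under the assumption.
        assert mbps_from_msgs(msgs) == mbps, (compiler, variant)
    for compiler in ("gcc", "clang", "zig cc"):
        print(f"{compiler:>6}: X4 is {x4_over_x2(compiler):.2f}x X2")
```

In these runs X4 comes out roughly 24% to 33% faster than X2 depending on the compiler, and the X4 build from zig cc is about 16% faster than the one from gcc.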