veracrypt / VeraCrypt

Disk encryption with strong security based on TrueCrypt
https://www.veracrypt.fr

Follow-up on Ryzen 2/3 Performance Issue #804

Open gashtal opened 3 years ago

gashtal commented 3 years ago

As a follow-up to the issue I previously reported here, which was closed due to inactivity: I finally got a brand new AMD Ryzen 9 5950X and would like to continue the discussion to make sure there are no performance deficiencies when using VeraCrypt on Ryzen 2/3 CPUs. Here is the output on my CPU of the benchmark @idrassi posted on the previous issue:

64-bit AES-NI Benchmark by Mounir IDRASSI (mounir@idrix.fr)
Version 2020-12-13

CPU has AES-NI extension: YES

AES-NI 4-way: ok (Enc = 8903.52 MB/s, Dec = 9057.24 MB/s)
AES-NI 7-way: ok (Enc = 8774.26 MB/s, Dec = 8734.13 MB/s)
AES-NI 15-way: ok (Enc = 9743.67 MB/s, Dec = 9796.19 MB/s)

The performance I get from the built-in benchmark of VeraCrypt 1.24-Update7 for AES with a 1 GB buffer is 15.1 GiB/s encryption and 13.5 GiB/s decryption. Let me know if there are any other tests I can run to shed more light on the potential reason behind the issue.

P.S. The numbers I am getting seem much higher than the numbers in Techpowerup's review here for the same CPU, but they are still lower than those of Intel CPUs with fewer cores.

idrassi commented 3 years ago

Thank you @gashtal for this feedback. The benchmark utility I developed gives the AES performance of a single core, whereas VeraCrypt uses parallelization to get maximum speed. So the benchmark utility measures the raw performance of the AES-NI extension without any other form of CPU speedup.

To get a comparison reference, I ran the benchmark utility on the Intel Core i7-11370H of a laptop. Here is its output:

64-bit AES-NI Benchmark by Mounir IDRASSI (mounir@idrix.fr)
Version 2020-12-13

CPU has AES-NI extension: YES

AES-NI 4-way: ok (Enc = 8072.24 MB/s, Dec = 7985.51 MB/s)
AES-NI 7-way: ok (Enc = 8329.14 MB/s, Dec = 8273.03 MB/s)
AES-NI 15-way: ok (Enc = 8208.34 MB/s, Dec = 8157.29 MB/s)

Judging by cpubenchmark numbers, the i7-11370H is almost 4x slower than the Ryzen 9 5950X, but its raw AES-NI performance is only about 1.2x slower than the Ryzen 9's.

Also, the VeraCrypt built-in benchmark for AES with a 1 GiB buffer gives 10.3 GiB/s mean performance for the Intel i7-11370H. This is only about 1.4x slower than the Ryzen 9.
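
For anyone who wants to re-check the two ratios above, here is a minimal sketch that recomputes them from the figures posted in this thread. The variable and function names are made up for illustration. It also assumes that the "best of the three modes" is the right single-core number to compare, and that averaging the Ryzen's encryption and decryption figures is a fair match for the Intel "mean performance" figure; neither choice is stated in the thread.

```python
# Raw AES-NI utility output copied from this thread: (encrypt, decrypt) in MB/s.
ryzen_5950x = {
    "4-way": (8903.52, 9057.24),
    "7-way": (8774.26, 8734.13),
    "15-way": (9743.67, 9796.19),
}
intel_11370h = {
    "4-way": (8072.24, 7985.51),
    "7-way": (8329.14, 8273.03),
    "15-way": (8208.34, 8157.29),
}


def best(results):
    """Return the highest encrypt and decrypt throughput across all modes.

    Assumption: taking the best mode per direction is a reasonable way to
    compare two CPUs; the thread does not say which mode was compared.
    """
    enc = max(e for e, _ in results.values())
    dec = max(d for _, d in results.values())
    return enc, dec


ryzen_enc, ryzen_dec = best(ryzen_5950x)
intel_enc, intel_dec = best(intel_11370h)

# Single-core (raw AES-NI) advantage of the Ryzen 9 over the i7.
raw_enc_ratio = ryzen_enc / intel_enc
raw_dec_ratio = ryzen_dec / intel_dec

# VeraCrypt built-in benchmark, AES, 1 GB / 1 GiB buffer, in GiB/s.
ryzen_vc_enc, ryzen_vc_dec = 15.1, 13.5
intel_vc_mean = 10.3

# Assumption: average the Ryzen enc/dec figures to compare against a mean.
ryzen_vc_mean = (ryzen_vc_enc + ryzen_vc_dec) / 2
vc_ratio = ryzen_vc_mean / intel_vc_mean

print(f"raw AES-NI ratio: enc {raw_enc_ratio:.2f}x, dec {raw_dec_ratio:.2f}x")
print(f"VeraCrypt benchmark ratio: {vc_ratio:.2f}x")
```

Both raw ratios land a little under 1.2x and the VeraCrypt ratio lands close to 1.4x, which matches the rounded figures quoted above.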

This leads me to think that Intel CPUs in the same category as the Ryzen 9 5950X (e.g. Intel Core i9-9980XE) will have much higher raw AES-NI performance. I don't have such a desktop CPU to run the benchmark myself, but if someone can do it, it will help confirm this.

At this stage, I would say that the performance issue with AMD Ryzen CPUs is linked to the poor performance of AMD's AES-NI hardware implementation. Intel probably has a better hardware design for AES-NI than AMD.

NumDeP commented 2 years ago

I honestly don't see what the issue is. It's an advanced processor with great encryption and decryption speeds. I would also agree with the last part of the last comment: I'm pretty sure Intel was the first to come out with hardware AES-NI acceleration, so it stands to reason that it would work better in their processors. That being said, it has been available for quite some time now.

Just a side note, but still relevant: Apple doesn't disclose every aspect of their products beyond the fancy marketing side of things, so I don't know if they even implemented the AES-NI feature in their processors. But seeing as users have been asking for Apple M1 support, and there is support for it in a couple of the latest builds, it would be good to see results from those alongside the Intel CPUs in the same category mentioned above.

doug65536 commented 1 year ago

The "poor" performance of the Zen2 AES-NI implementation is enough to utterly overwhelm the memory controller. Don't waste your time measuring what silly unrealistic repetition loops do with 0% L2 miss.

What is this hypothetical data source that is encrypted? Did it just come from I/O? It's going to miss like crazy. Do you have giant blocks of already encrypted data in RAM that you repeatedly and redundantly decrypt? Or do you just miss once, decrypt it, and be done with it? It's all misses on the loads in AES code, isn't it? I mean realistically, not in synthetic benchmarking. My argument is weaker for encryption, though. You'll probably hit on the loads, but you are very likely to write-allocate the destination, so it almost all comes back to being the same again, except that stores are more asynchronous.

It's ingenious to design the microarchitecture so instructions that are memory bound don't have any more resources than needed to keep up with the memory controller.
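
A rough back-of-envelope check of the memory-bound argument, using only the Ryzen 9 5950X figures posted earlier in this thread. This is a sketch under stated assumptions, not a measurement: it assumes the utility's "MB" means 10^6 bytes, and it assumes a dual-channel DDR4-3200 memory configuration, which the thread does not confirm for the machine that was tested. The names are made up for illustration.

```python
CORES = 16  # Ryzen 9 5950X physical core count

# Figures from this thread.
single_core_mb_s = 9743.67  # best single-core encrypt result (15-way), MB/s
veracrypt_gib_s = 15.1      # VeraCrypt built-in benchmark, AES encrypt, GiB/s

# Assumption: the utility's "MB" is 10**6 bytes (it may be 2**20 instead).
single_core_bytes_s = single_core_mb_s * 10**6
veracrypt_bytes_s = veracrypt_gib_s * 2**30

# How much faster the multi-threaded benchmark is than one core alone.
observed_scaling = veracrypt_bytes_s / single_core_bytes_s

# What perfect scaling across all cores would require.
ideal_bytes_s = single_core_bytes_s * CORES

# Assumption: dual-channel DDR4-3200, theoretical peak =
# 2 channels * 8 bytes per transfer * 3200 million transfers per second.
assumed_dram_peak_bytes_s = 2 * 8 * 3200 * 10**6

print(f"observed scaling over one core: {observed_scaling:.2f}x on {CORES} cores")
print(f"ideal {CORES}-core throughput: {ideal_bytes_s / 1e9:.1f} GB/s")
print(f"assumed DRAM peak: {assumed_dram_peak_bytes_s / 1e9:.1f} GB/s")
```

Under these assumptions the multi-threaded result is well under 2x the single-core result, and perfect 16-core scaling would need several times the assumed DRAM peak. That is consistent with a shared bottleneck outside the AES units, though it does not prove where the bottleneck is.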