Thanks for reaching out.
Please note that the "--quick" option doesn't produce accurate data. Removing this option will give more accurate results.
In step 1, the script kicks off AEAD algorithm tests on the selected hardware threads. If two hardware threads on the same core are selected, per-thread performance is expected to drop compared with two threads on two separate cores. In aggregate, two threads on one core (1C2T) give slightly better throughput than one thread on one core (1C1T). The size of the drop depends on what gets executed on the sibling threads.
I have captured the txt files in the same way as you did above, but I selected cores to get different core/thread configurations (lscpu, lstopo or /proc/cpuinfo can help identify siblings). Here is some example slope (cycles per byte) and intercept (cycles) GCM128 data from my system:
$ ./ipsec_diff_tool.py -a /tmp/1thread.txt | grep GCM | grep AES-128
1 AVX512 GCM ENCRYPT GCM AES-128 0.68040 134.73462
14 AVX512 GCM DECRYPT GCM AES-128 0.72102 122.62291
$ ./ipsec_diff_tool.py -a /tmp/2thread_2c2t.txt | grep GCM | grep AES-128
1 AVX512 GCM ENCRYPT GCM AES-128 0.66055 136.31644
14 AVX512 GCM DECRYPT GCM AES-128 0.69150 115.57849
$ ./ipsec_diff_tool.py -a /tmp/2thread_1c2t.txt | grep GCM | grep AES-128
1 AVX512 GCM ENCRYPT GCM AES-128 1.33066 190.89407
14 AVX512 GCM DECRYPT GCM AES-128 0.73756 160.06201
$ ./ipsec_diff_tool.py -a /tmp/4thread_4c4t.txt | grep GCM | grep AES-128
1 AVX512 GCM ENCRYPT GCM AES-128 0.64853 137.78223
14 AVX512 GCM DECRYPT GCM AES-128 0.69476 118.28863
$ ./ipsec_diff_tool.py -a /tmp/4thread_2c4t.txt | grep GCM | grep AES-128
1 AVX512 GCM ENCRYPT GCM AES-128 1.33042 199.36442
14 AVX512 GCM DECRYPT GCM AES-128 1.20965 187.38656
$ ./ipsec_diff_tool.py -a /tmp/8thread_8c8t.txt | grep GCM | grep AES-128
1 AVX512 GCM ENCRYPT GCM AES-128 0.64832 137.52436
14 AVX512 GCM DECRYPT GCM AES-128 0.69152 115.92372
$ ./ipsec_diff_tool.py -a /tmp/8thread_4c8t.txt | grep GCM | grep AES-128
1 AVX512 GCM ENCRYPT GCM AES-128 1.32875 191.91080
14 AVX512 GCM DECRYPT GCM AES-128 1.37026 184.33711
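The slope and intercept only become meaningful once they are combined into a per-buffer cost. Below is a small sketch. It assumes the linear model implied by the units above, cycles(n) ≈ INTERCEPT + SLOPE × n for an n-byte buffer. It applies that model to the ENCRYPT rows at the 2048-byte buffer size from the original report. The dictionary and helper names are illustrative only and are not part of ipsec_perf_tool.py or ipsec_diff_tool.py.

```python
# Sketch, ASSUMING cycles(n) ~= intercept + slope * n, where slope is in
# cycles/byte and intercept is in cycles (units as stated above).

# (slope, intercept) copied from the AES-128 GCM ENCRYPT rows above.
ENCRYPT_FITS = {
    "1thread": (0.68040, 134.73462),
    "2thread_2c2t": (0.66055, 136.31644),
    "2thread_1c2t": (1.33066, 190.89407),
}


def cycles_per_buffer(slope, intercept, size):
    """Estimated cycles for one thread to process one `size`-byte buffer."""
    return intercept + slope * size


def main():
    size = 2048  # buffer size from the original report
    base = cycles_per_buffer(*ENCRYPT_FITS["1thread"], size)
    for name, (slope, intercept) in ENCRYPT_FITS.items():
        cycles = cycles_per_buffer(slope, intercept, size)
        print(f"{name:>14}: {cycles:8.1f} cycles/buffer, "
              f"{cycles / base:.2f}x the single-thread cost")


if __name__ == "__main__":
    main()
```

At 2048 bytes the slope term dominates. So the roughly doubled slope in the 1C2T run translates into almost twice the cycles per buffer for each thread.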
In short, I don't observe the same symptoms on my system. For 1C2T scenarios a per hardware thread drop is expected.
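The expected per-thread drop can be weighed against the slightly better aggregate throughput for 1C2T by turning slopes into bytes per cycle. The sketch below ignores the intercept, which is reasonable for large buffers. It also assumes that every thread in a run has the same slope as the single ENCRYPT row reported for that run above. That is an assumption, because only one row per run is shown.

```python
# Sketch: aggregate large-buffer throughput in bytes/cycle = threads / slope.
# ASSUMES all threads in a run share the slope of the one row shown above;
# the intercept (fixed per-buffer cost) is ignored.

RUNS = [
    # (label, physical cores, hardware threads, ENCRYPT slope in cycles/byte)
    ("1C1T", 1, 1, 0.68040),
    ("1C2T", 1, 2, 1.33066),
    ("2C4T", 2, 4, 1.33042),
    ("4C8T", 4, 8, 1.32875),
]


def aggregate_bytes_per_cycle(threads, slope):
    """Combined bytes/cycle of `threads` threads, each running at `slope`."""
    return threads / slope


for label, cores, threads, slope in RUNS:
    total = aggregate_bytes_per_cycle(threads, slope)
    print(f"{label}: {total:.3f} B/cycle in total, "
          f"{total / cores:.3f} B/cycle per physical core")
```

Per physical core, the two-thread runs come out only a few percent ahead of 1C1T. This matches "slightly better throughput", even though each individual thread is roughly half as fast.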
What CPU model is it? Is turbo on? I have turbo off on my system.
Hi, thanks! This clarifies things a lot.
Now that the CPUs are properly selected, I see no performance penalty for xCxT scenarios.
The CPU is a 3 GHz Xeon Gold on an Oracle Cloud VM.Optimized3.Flex instance with 4 CPUs. Turbo boost is off.
lscpu shows the following picture:
$ lscpu -a --extended
CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE
0 0 0 0 0:0:0:0 yes
1 0 0 0 1:1:0:0 yes
2 0 0 1 2:2:1:0 yes
3 0 0 1 3:3:1:0 yes
4 0 0 2 4:4:2:0 yes
5 0 0 2 5:5:2:0 yes
6 0 0 3 6:6:3:0 yes
7 0 0 3 7:7:3:0 yes
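Sibling hardware threads can be read off the CORE column of this output. The sketch below groups logical CPUs by (SOCKET, CORE) and builds the comma-separated lists passed to ipsec_perf_tool.py's -c option. The parser and function names are hypothetical helpers, not part of the IPsec tools. Inside a VM, this only reflects the topology presented to the guest, not necessarily how vCPUs map onto host hardware threads.

```python
# Sketch: derive sibling-thread groups from `lscpu -a --extended` text and
# build CPU lists for xCxT (one thread per core) or xC2xT (both siblings).
from collections import defaultdict

LSCPU_OUTPUT = """\
CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE
0 0 0 0 0:0:0:0 yes
1 0 0 0 1:1:0:0 yes
2 0 0 1 2:2:1:0 yes
3 0 0 1 3:3:1:0 yes
4 0 0 2 4:4:2:0 yes
5 0 0 2 5:5:2:0 yes
6 0 0 3 6:6:3:0 yes
7 0 0 3 7:7:3:0 yes
"""


def siblings_by_core(text):
    """Map (socket, core) -> sorted list of online logical CPUs on that core."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    cpu_i = header.index("CPU")
    socket_i = header.index("SOCKET")
    core_i = header.index("CORE")
    online_i = header.index("ONLINE") if "ONLINE" in header else None
    groups = defaultdict(list)
    for line in lines[1:]:
        fields = line.split()
        if online_i is not None and fields[online_i] != "yes":
            continue  # skip offline CPUs
        key = (int(fields[socket_i]), int(fields[core_i]))
        groups[key].append(int(fields[cpu_i]))
    return {key: sorted(cpus) for key, cpus in sorted(groups.items())}


def cpu_list(groups, cores, threads_per_core):
    """First `threads_per_core` siblings of each of the first `cores` cores."""
    picked = []
    for cpus in list(groups.values())[:cores]:
        picked.extend(cpus[:threads_per_core])
    return ",".join(str(cpu) for cpu in picked)


if __name__ == "__main__":
    groups = siblings_by_core(LSCPU_OUTPUT)
    for cores, tpc in [(1, 2), (2, 1), (2, 2), (4, 1), (4, 2)]:
        print(f"{cores}c{cores * tpc}t: -c {cpu_list(groups, cores, tpc)}")
```

For the topology above, this yields the same lists used in the collection commands, for example 0,2,4,6 for 4c4t and 0,1,2,3 for 2c4t.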
I'm not sure why, but I couldn't reproduce the reported results. The previous VM.Optimized3.Flex instance was terminated, so I'm trying on a new one.
Current results (only the 4c8t scenario seems to be affected, by roughly x2):
# 1. Collect data
./ipsec_perf_tool.py --arch-best -t aead-only > /tmp/1thread.txt
./ipsec_perf_tool.py --arch-best -c 0,1 -t aead-only > /tmp/2thread_1c2t.txt
./ipsec_perf_tool.py --arch-best -c 0,2 -t aead-only > /tmp/2thread_2c2t.txt
./ipsec_perf_tool.py --arch-best -c 0,1,2,3 -t aead-only > /tmp/4thread_2c4t.txt
./ipsec_perf_tool.py --arch-best -c 0,2,4,6 -t aead-only > /tmp/4thread_4c4t.txt
./ipsec_perf_tool.py --arch-best -c 0,1,2,3,4,5,6,7 -t aead-only > /tmp/8thread_4c8t.txt
# 2. Analyze
$ ./ipsec_diff_tool.py -a /tmp/1thread.txt | grep GCM | grep AES-128
1 AVX512 GCM ENCRYPT GCM AES-128 0.24401 105.44796
14 AVX512 GCM DECRYPT GCM AES-128 0.23062 95.47982
$ ./ipsec_diff_tool.py -a /tmp/2thread_2c2t.txt | grep GCM | grep AES-128
1 AVX512 GCM ENCRYPT GCM AES-128 0.23850 106.92741
14 AVX512 GCM DECRYPT GCM AES-128 0.24216 93.46703
$ ./ipsec_diff_tool.py -a /tmp/2thread_1c2t.txt | grep GCM | grep AES-128
1 AVX512 GCM ENCRYPT GCM AES-128 0.23872 106.45669
14 AVX512 GCM DECRYPT GCM AES-128 0.23199 95.51784
$ ./ipsec_diff_tool.py -a /tmp/4thread_4c4t.txt | grep GCM | grep AES-128
1 AVX512 GCM ENCRYPT GCM AES-128 0.24303 105.83637
14 AVX512 GCM DECRYPT GCM AES-128 0.23381 118.76587
$ ./ipsec_diff_tool.py -a /tmp/4thread_2c4t.txt | grep GCM | grep AES-128
1 AVX512 GCM ENCRYPT GCM AES-128 0.24006 107.16917
14 AVX512 GCM DECRYPT GCM AES-128 0.22352 114.03925
$ ./ipsec_diff_tool.py -a /tmp/8thread_4c8t.txt | grep GCM | grep AES-128
1 AVX512 GCM ENCRYPT GCM AES-128 0.43493 172.18578
14 AVX512 GCM DECRYPT GCM AES-128 0.37463 204.33661
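The collection step above can also be scripted from a scenario table, which makes it easier to rerun on a new instance. This sketch uses only the flags that appear in the commands above (--arch-best, -c, -t aead-only) and the same /tmp output paths. By default it only prints the commands. Passing run=True executes them and assumes ipsec_perf_tool.py is in the current directory.

```python
# Sketch: rebuild the data-collection commands above from a scenario table.
# Only flags already used in this thread are emitted.
import shlex
import subprocess

# Output file stem -> CPU list for -c (None means no -c, i.e. one thread).
SCENARIOS = {
    "1thread": None,
    "2thread_1c2t": "0,1",
    "2thread_2c2t": "0,2",
    "4thread_2c4t": "0,1,2,3",
    "4thread_4c4t": "0,2,4,6",
    "8thread_4c8t": "0,1,2,3,4,5,6,7",
}


def perf_command(cpus=None):
    """argv for one ipsec_perf_tool.py run, mirroring the commands above."""
    cmd = ["./ipsec_perf_tool.py", "--arch-best"]
    if cpus:
        cmd += ["-c", cpus]
    return cmd + ["-t", "aead-only"]


def collect(run=False):
    """Print each command; if run is True, also execute it into /tmp/<name>.txt."""
    for name, cpus in SCENARIOS.items():
        out_path = f"/tmp/{name}.txt"
        cmd = perf_command(cpus)
        print(shlex.join(cmd), ">", out_path)
        if run:
            with open(out_path, "w") as out:
                subprocess.run(cmd, stdout=out, check=True)


if __name__ == "__main__":
    collect(run=False)
```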
If this seems OK, then feel free to close the issue.
Thanks, Anatoly! The VM environment explains a lot here.
It seems that a performance drop is observed only in the 4c8t configuration. Other than that, the results are pretty much the same, whereas normally a drop would be expected in the other sibling-thread configurations as well. However, virtual CPUs may be mapped arbitrarily onto processor hardware threads, and as the mapping changes, the results will change too.
Do you have any control over how virtual CPUs are mapped onto processor hardware threads?
Oracle states that 1 OCPU is equivalent to 1 physical core with hyper-threading, but I'm unable to reserve the VM resources (not sure why, though). I'm also unable to create a bare metal instance (due to my Oracle Cloud account limits). I've requested a service limit increase and will share the results if it is approved.
Currently it seems that the performance drop is proportional to num_of_perf_threads/total_num_of_cpus. Here are the results for a 12-CPU VM (6 OCPUs). There is no performance drop for the 4c8t scenario (although the slope and intercept values jump suspiciously). There is a partial x2 performance drop for 5c10t and a full x2 drop for 6c12t. Also, c3t6 and c6t6 seem to perform identically:
AES-128-GCM
Scenario: /tmp/gcm_c1t1.txt
NO ARCH CIPHER DIR HASH KEYSZ SLOPE A INTERCEPT A
1 AVX512 GCM ENCRYPT GCM AES-128 0.25000 -387.00000
Scenario: /tmp/gcm_c1t2.txt
NO ARCH CIPHER DIR HASH KEYSZ SLOPE A INTERCEPT A
1 AVX512 GCM ENCRYPT GCM AES-128 0.24902 -373.00000
2 AVX512 GCM ENCRYPT GCM AES-128 0.25000 -382.00000
Scenario: /tmp/gcm_c2t4.txt
NO ARCH CIPHER DIR HASH KEYSZ SLOPE A INTERCEPT A
1 AVX512 GCM ENCRYPT GCM AES-128 0.25293 -398.00000
2 AVX512 GCM ENCRYPT GCM AES-128 0.25293 -373.00000
3 AVX512 GCM ENCRYPT GCM AES-128 0.25195 -380.00000
4 AVX512 GCM ENCRYPT GCM AES-128 0.25391 -359.00000
Scenario: /tmp/gcm_c3t6.txt
NO ARCH CIPHER DIR HASH KEYSZ SLOPE A INTERCEPT A
1 AVX512 GCM ENCRYPT GCM AES-128 0.25391 -395.00000
2 AVX512 GCM ENCRYPT GCM AES-128 0.25098 -365.00000
3 AVX512 GCM ENCRYPT GCM AES-128 0.25586 -348.00000
4 AVX512 GCM ENCRYPT GCM AES-128 0.25000 -346.00000
5 AVX512 GCM ENCRYPT GCM AES-128 0.25293 -397.00000
6 AVX512 GCM ENCRYPT GCM AES-128 0.25195 -370.00000
Scenario: /tmp/gcm_c4t8.txt
NO ARCH CIPHER DIR HASH KEYSZ SLOPE A INTERCEPT A
1 AVX512 GCM ENCRYPT GCM AES-128 0.07324 4495.00000
2 AVX512 GCM ENCRYPT GCM AES-128 0.25195 -380.00000
3 AVX512 GCM ENCRYPT GCM AES-128 0.25488 -410.00000
4 AVX512 GCM ENCRYPT GCM AES-128 0.28906 -596.00000
5 AVX512 GCM ENCRYPT GCM AES-128 0.09473 3949.00000
6 AVX512 GCM ENCRYPT GCM AES-128 0.02637 5077.00000
7 AVX512 GCM ENCRYPT GCM AES-128 0.25098 -361.00000
8 AVX512 GCM ENCRYPT GCM AES-128 0.07617 4267.00000
Scenario: /tmp/gcm_c5t10.txt
NO ARCH CIPHER DIR HASH KEYSZ SLOPE A INTERCEPT A
1 AVX512 GCM ENCRYPT GCM AES-128 0.46777 -424.00000
2 AVX512 GCM ENCRYPT GCM AES-128 -0.01855 5808.00000
3 AVX512 GCM ENCRYPT GCM AES-128 0.25391 -345.00000
4 AVX512 GCM ENCRYPT GCM AES-128 0.25391 -392.00000
5 AVX512 GCM ENCRYPT GCM AES-128 0.10059 4001.00000
6 AVX512 GCM ENCRYPT GCM AES-128 0.46875 -438.00000
7 AVX512 GCM ENCRYPT GCM AES-128 0.46680 -409.00000
8 AVX512 GCM ENCRYPT GCM AES-128 0.07520 4420.00000
9 AVX512 GCM ENCRYPT GCM AES-128 -0.01855 5802.00000
10 AVX512 GCM ENCRYPT GCM AES-128 0.46484 -370.00000
Scenario: /tmp/gcm_c6t12.txt
NO ARCH CIPHER DIR HASH KEYSZ SLOPE A INTERCEPT A
1 AVX512 GCM ENCRYPT GCM AES-128 0.46777 -427.00000
2 AVX512 GCM ENCRYPT GCM AES-128 0.46777 -427.00000
3 AVX512 GCM ENCRYPT GCM AES-128 0.46777 -418.00000
4 AVX512 GCM ENCRYPT GCM AES-128 0.46387 -451.00000
5 AVX512 GCM ENCRYPT GCM AES-128 0.46680 -410.00000
6 AVX512 GCM ENCRYPT GCM AES-128 0.46777 -422.00000
7 AVX512 GCM ENCRYPT GCM AES-128 0.46777 -430.00000
8 AVX512 GCM ENCRYPT GCM AES-128 0.46777 -413.00000
9 AVX512 GCM ENCRYPT GCM AES-128 0.46582 -394.00000
10 AVX512 GCM ENCRYPT GCM AES-128 0.46777 -425.00000
11 AVX512 GCM ENCRYPT GCM AES-128 0.46680 -413.00000
12 AVX512 GCM ENCRYPT GCM AES-128 0.47852 -443.00000
Scenario: /tmp/gcm_c6t6.txt
NO ARCH CIPHER DIR HASH KEYSZ SLOPE A INTERCEPT A
1 AVX512 GCM ENCRYPT GCM AES-128 0.25195 -363.00000
2 AVX512 GCM ENCRYPT GCM AES-128 0.25195 -379.00000
3 AVX512 GCM ENCRYPT GCM AES-128 0.25195 -381.00000
4 AVX512 GCM ENCRYPT GCM AES-128 0.25195 -370.00000
5 AVX512 GCM ENCRYPT GCM AES-128 0.27441 -305.00000
6 AVX512 GCM ENCRYPT GCM AES-128 0.25195 -371.00000
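These per-thread rows can be condensed into one number per scenario to check the threads/CPUs hypothesis. The sketch averages each scenario's SLOPE values, compares the average with the single-thread baseline, and prints it next to threads/12. The c4t8 and c5t10 runs are deliberately left out, because several of their fits have negative or implausible slopes and averaging them would not mean much.

```python
# Sketch: mean per-thread slope relative to the c1t1 baseline, next to the
# fraction of the VM's CPUs in use. Values copied from the tables above.
from statistics import mean

TOTAL_CPUS = 12  # the 12-CPU (6 OCPU) VM from the report

SLOPES = {
    "c1t1": [0.25000],
    "c1t2": [0.24902, 0.25000],
    "c2t4": [0.25293, 0.25293, 0.25195, 0.25391],
    "c3t6": [0.25391, 0.25098, 0.25586, 0.25000, 0.25293, 0.25195],
    "c6t6": [0.25195, 0.25195, 0.25195, 0.25195, 0.27441, 0.25195],
    "c6t12": [0.46777, 0.46777, 0.46777, 0.46387, 0.46680, 0.46777,
              0.46777, 0.46777, 0.46582, 0.46777, 0.46680, 0.47852],
}


def slowdown(slopes, baseline):
    """Mean per-thread slope divided by the single-thread baseline slope."""
    return mean(slopes) / baseline


baseline = SLOPES["c1t1"][0]
for name, slopes in SLOPES.items():
    load = len(slopes) / TOTAL_CPUS
    print(f"{name:>6}: threads/CPUs = {load:.2f}, "
          f"mean slope = {mean(slopes):.5f}, "
          f"slowdown = {slowdown(slopes, baseline):.2f}x")
```

In this subset, c3t6 and c6t6 both stay close to 1.0x at half the CPUs, while c6t12 is close to 1.9x with all CPUs busy.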
Btw, what does it mean for slope and intercept to be negative?
Negative numbers typically indicate that the collected performance samples are not linear. Running the tests for longer may eliminate this issue.
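A small illustration of how non-linear samples produce a negative intercept. The sketch fits an ordinary least-squares line to two hypothetical sample sets. That ipsec_diff_tool.py derives SLOPE/INTERCEPT with a comparable fit is an assumption here and has not been confirmed. The buffer sizes and cycle counts are made up. One set is exactly linear, and the other adds a term that grows faster than the buffer size. Every cycle count is positive, yet the fitted line for the curved set crosses zero below the origin.

```python
# Sketch: ordinary least-squares line fit (an ASSUMED stand-in for however
# the diff tool computes SLOPE/INTERCEPT). All sample data is hypothetical.


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through (xs, ys)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


SIZES = [256, 512, 1024, 2048, 4096]  # hypothetical buffer sizes (bytes)

# Linear samples: 100 cycles fixed cost + 0.25 cycles/byte.
linear = [100 + 0.25 * n for n in SIZES]
# Curved samples: same costs plus a hypothetical superlinear term.
curved = [100 + 0.25 * n + 0.0001 * n * n for n in SIZES]

print("linear fit (slope, intercept):", fit_line(SIZES, linear))
print("curved fit (slope, intercept):", fit_line(SIZES, curved))
```

The linear set recovers its 100-cycle intercept. For the curved set, the straight line has to steepen to follow the large buffers, which pulls the intercept below zero.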
Let me close this issue. Feel free to re-open if there is any update.
Hi.
I see a slowdown when measuring throughput using the ipsec_perf_tool, and I'm wondering whether this result is expected or whether I'm doing something wrong. This is for a 2048-byte buffer on a 4-core 3 GHz Xeon Gold (8 logical CPUs reported by /proc/cpuinfo). My commands: