intel / intel-ipsec-mb

Intel(R) Multi-Buffer Crypto for IPSec
BSD 3-Clause "New" or "Revised" License

ipsec_perf_tool: Per-thread throughput for multiple threads #85

Closed blackbeam closed 3 years ago

blackbeam commented 3 years ago

Hi.

I see a slowdown when measuring throughput using the ipsec_perf_tool and I'm wondering whether this result is expected or I'm doing something wrong?

ARCH     CIPHER   DIR       HASH   KEYSZ     1-THREAD   4-THREADS   8-THREADS
AVX512   GCM      ENCRYPT   GCM    AES-128   80843.46   27105.54    16149.46

This is for a 2048-byte buffer on a 4-core 3 GHz Xeon Gold (8 CPUs reported by /proc/cpuinfo).

My commands:

# 1. Collect data
./ipsec_perf_tool.py -t aead-only --arch-best --quick > /tmp/1thread.txt
./ipsec_perf_tool.py -t aead-only --arch-best --quick -c 0-3 > /tmp/4thread.txt
./ipsec_perf_tool.py -t aead-only --arch-best --quick -c 0-7 > /tmp/8thread.txt
# 2. Measure throughput
./ipsec_diff_tool.py -a -t 2048 3000 /tmp/1thread.txt
./ipsec_diff_tool.py -a -t 2048 3000 /tmp/4thread.txt
./ipsec_diff_tool.py -a -t 2048 3000 /tmp/8thread.txt

tkanteck commented 3 years ago

Thanks for reaching out.

Please note that the "--quick" option doesn't produce accurate data. Removing it will give more accurate results.

In step 1, the script kicks off AEAD algorithm tests on the selected hardware threads. If two hardware threads on the same core are selected, per-thread performance is expected to drop compared with two threads on two separate cores. In aggregate, two threads on one core (1C2T) give slightly better throughput than 1C1T. The size of the drop depends on what is executed on the sibling thread.

I have captured the txt files in the same way as you did above, but I selected cores to get different core/thread configurations (lscpu, lstopo or /proc/cpuinfo can help identify siblings). Some example slope (cycles per byte) and intercept (cycles) GCM128 data from my system:

What CPU model is it? Is the turbo on? I have turbo off on my system.

blackbeam commented 3 years ago

Hi, thanks! This clarifies things a lot.

Now that the CPUs are properly selected, I see no performance penalty for xCxT scenarios.

> What CPU model is it? Is the turbo on? I have turbo off on my system.

The CPU is a 3 GHz Xeon Gold on Oracle Cloud's VM.Optimized3.Flex instance with 4 CPUs. Turbo boost is off. lscpu shows the following picture:

$ lscpu -a --extended
CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE
  0    0      0    0 0:0:0:0          yes
  1    0      0    0 1:1:0:0          yes
  2    0      0    1 2:2:1:0          yes
  3    0      0    1 3:3:1:0          yes
  4    0      0    2 4:4:2:0          yes
  5    0      0    2 5:5:2:0          yes
  6    0      0    3 6:6:3:0          yes
  7    0      0    3 7:7:3:0          yes
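
The sibling pairs in this table can also be extracted programmatically. The following is a minimal sketch, assuming the column layout shown above and that every CPU is online; the helper names `group_siblings` and `one_thread_per_core` are hypothetical and are not part of the ipsec_perf_tool scripts. Inside a VM the reported topology may not match the physical one.

```python
from collections import defaultdict

# Output of `lscpu -a --extended`, copied from the listing above.
LSCPU_OUTPUT = """\
CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE
  0    0      0    0 0:0:0:0          yes
  1    0      0    0 1:1:0:0          yes
  2    0      0    1 2:2:1:0          yes
  3    0      0    1 3:3:1:0          yes
  4    0      0    2 4:4:2:0          yes
  5    0      0    2 5:5:2:0          yes
  6    0      0    3 6:6:3:0          yes
  7    0      0    3 7:7:3:0          yes
"""


def group_siblings(lscpu_text):
    """Map (socket, core) to the list of logical CPU ids sharing that core."""
    lines = [line for line in lscpu_text.splitlines() if line.strip()]
    header = lines[0].split()
    cpu_i = header.index("CPU")
    socket_i = header.index("SOCKET")
    core_i = header.index("CORE")
    cores = defaultdict(list)
    for line in lines[1:]:
        fields = line.split()
        key = (int(fields[socket_i]), int(fields[core_i]))
        cores[key].append(int(fields[cpu_i]))
    return dict(cores)


def one_thread_per_core(cores):
    """Pick the lowest-numbered logical CPU of every core."""
    return sorted(min(cpus) for cpus in cores.values())


cores = group_siblings(LSCPU_OUTPUT)
for (socket, core), cpus in sorted(cores.items()):
    print(f"socket {socket} core {core}: CPUs {cpus}")

# Argument for an xCxT run that avoids sibling threads.
print("-c " + ",".join(str(cpu) for cpu in one_thread_per_core(cores)))
```

For the listing above, the last line yields the `0,2,4,6` selection used for the 4c4t run.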

I'm not sure why, but I couldn't reproduce the reported results. The previous VM.Optimized3.Flex instance was terminated, so I'm trying on a new one. Current results (only the 4c8t scenario seems to be affected, by roughly 2x):

# 1. Collect data
./ipsec_perf_tool.py --arch-best -t aead-only > /tmp/1thread.txt
./ipsec_perf_tool.py --arch-best -c 0,1 -t aead-only > /tmp/2thread_1c2t.txt
./ipsec_perf_tool.py --arch-best -c 0,2 -t aead-only > /tmp/2thread_2c2t.txt
./ipsec_perf_tool.py --arch-best -c 0,1,2,3 -t aead-only > /tmp/4thread_2c4t.txt
./ipsec_perf_tool.py --arch-best -c 0,2,4,6 -t aead-only > /tmp/4thread_4c4t.txt
./ipsec_perf_tool.py --arch-best -c 0,1,2,3,4,5,6,7 -t aead-only > /tmp/8thread_4c8t.txt

# 2. Analyze
$ ./ipsec_diff_tool.py -a /tmp/1thread.txt | grep GCM | grep AES-128
1             AVX512        GCM           ENCRYPT       GCM           AES-128       0.24401       105.44796
14            AVX512        GCM           DECRYPT       GCM           AES-128       0.23062       95.47982
$ ./ipsec_diff_tool.py -a /tmp/2thread_2c2t.txt | grep GCM | grep AES-128
1             AVX512        GCM           ENCRYPT       GCM           AES-128       0.23850       106.92741
14            AVX512        GCM           DECRYPT       GCM           AES-128       0.24216       93.46703
$ ./ipsec_diff_tool.py -a /tmp/2thread_1c2t.txt | grep GCM | grep AES-128
1             AVX512        GCM           ENCRYPT       GCM           AES-128       0.23872       106.45669
14            AVX512        GCM           DECRYPT       GCM           AES-128       0.23199       95.51784
$ ./ipsec_diff_tool.py -a /tmp/4thread_4c4t.txt | grep GCM | grep AES-128
1             AVX512        GCM           ENCRYPT       GCM           AES-128       0.24303       105.83637
14            AVX512        GCM           DECRYPT       GCM           AES-128       0.23381       118.76587
$ ./ipsec_diff_tool.py -a /tmp/4thread_2c4t.txt | grep GCM | grep AES-128
1             AVX512        GCM           ENCRYPT       GCM           AES-128       0.24006       107.16917
14            AVX512        GCM           DECRYPT       GCM           AES-128       0.22352       114.03925
$ ./ipsec_diff_tool.py -a /tmp/8thread_4c8t.txt | grep GCM | grep AES-128
1             AVX512        GCM           ENCRYPT       GCM           AES-128       0.43493       172.18578
14            AVX512        GCM           DECRYPT       GCM           AES-128       0.37463       204.33661
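
To compare these rows as throughput, the slope (cycles per byte) and intercept (cycles) can be plugged into a linear cost model. This is a rough sketch, assuming cycles = slope * size + intercept and a result expressed in Mbit/s; the exact formula used by ipsec_diff_tool.py is not shown in this thread, and the function names are hypothetical.

```python
def cycles_for(size_bytes, slope, intercept):
    """Estimated cycles to process one buffer under the linear model."""
    return slope * size_bytes + intercept


def throughput_mbps(size_bytes, slope, intercept, clock_mhz):
    """Estimated per-thread throughput in Mbit/s (assumed unit)."""
    cycles = cycles_for(size_bytes, slope, intercept)
    # MHz * bits / cycles gives Mbit/s.
    return clock_mhz * size_bytes * 8 / cycles


# AES-128-GCM ENCRYPT rows from the 1-thread and 4c8t runs above.
one_thread = throughput_mbps(2048, 0.24401, 105.44796, 3000)
eight_threads = throughput_mbps(2048, 0.43493, 172.18578, 3000)

print(f"1 thread:       {one_thread:.0f} Mbit/s")
print(f"4c8t, per thread: {eight_threads:.0f} Mbit/s")
print(f"ratio:          {eight_threads / one_thread:.2f}")
```

The buffer size of 2048 bytes and the 3000 MHz clock match the arguments passed to `-t` in the first comment.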

If this seems OK, then feel free to close the issue.

tkanteck commented 3 years ago

Thanks Anatoly! The VM environment explains a lot here.

It seems that a performance drop is observed only in the 4c8t configuration. Other than that, the results were pretty much the same - normally it would be expected. However, virtual CPUs may be mapped to processor hardware threads in a random way. As the mapping changes, the results will change too.

Do you have any control over how virtual CPUs are mapped onto processor hardware threads?

blackbeam commented 3 years ago

> However virtual cpu's may be mapped in a random way to processor hardware threads. As the mapping changes so the results will change too. Do you have any control over virtual cpu mapping onto processor hardware threads?

Oracle claims that 1 OCPU is identical to 1 physical core with hyper-threading, but I'm unable to reserve the VM resources (not sure why, though). I'm also unable to create a bare metal instance (due to my Oracle Cloud account limits). I've requested a service limit increase and I'll share the results if it is approved.

> It seems that only in 4c8t configuration performance drop is observed

Currently it seems that the performance drop is proportional to num_of_perf_threads/total_num_of_cpus. Here are the results for a 12-CPU VM (6 OCPUs). You can see that there is no performance drop for the 4c8t scenario (although slope and intercept are jumping suspiciously). There is also a partial 2x performance drop for 5c10t and a full 2x drop for 6c12t. Finally, c3t6 and c6t6 seem to perform identically:

AES-128-GCM

Scenario: /tmp/gcm_c1t1.txt
NO            ARCH          CIPHER        DIR           HASH          KEYSZ         SLOPE A       INTERCEPT A
1             AVX512        GCM           ENCRYPT       GCM           AES-128       0.25000       -387.00000

Scenario: /tmp/gcm_c1t2.txt
NO            ARCH          CIPHER        DIR           HASH          KEYSZ         SLOPE A       INTERCEPT A
1             AVX512        GCM           ENCRYPT       GCM           AES-128       0.24902       -373.00000
2             AVX512        GCM           ENCRYPT       GCM           AES-128       0.25000       -382.00000

Scenario: /tmp/gcm_c2t4.txt
NO            ARCH          CIPHER        DIR           HASH          KEYSZ         SLOPE A       INTERCEPT A
1             AVX512        GCM           ENCRYPT       GCM           AES-128       0.25293       -398.00000
2             AVX512        GCM           ENCRYPT       GCM           AES-128       0.25293       -373.00000
3             AVX512        GCM           ENCRYPT       GCM           AES-128       0.25195       -380.00000
4             AVX512        GCM           ENCRYPT       GCM           AES-128       0.25391       -359.00000

Scenario: /tmp/gcm_c3t6.txt
NO            ARCH          CIPHER        DIR           HASH          KEYSZ         SLOPE A       INTERCEPT A
1             AVX512        GCM           ENCRYPT       GCM           AES-128       0.25391       -395.00000
2             AVX512        GCM           ENCRYPT       GCM           AES-128       0.25098       -365.00000
3             AVX512        GCM           ENCRYPT       GCM           AES-128       0.25586       -348.00000
4             AVX512        GCM           ENCRYPT       GCM           AES-128       0.25000       -346.00000
5             AVX512        GCM           ENCRYPT       GCM           AES-128       0.25293       -397.00000
6             AVX512        GCM           ENCRYPT       GCM           AES-128       0.25195       -370.00000

Scenario: /tmp/gcm_c4t8.txt
NO            ARCH          CIPHER        DIR           HASH          KEYSZ         SLOPE A       INTERCEPT A
1             AVX512        GCM           ENCRYPT       GCM           AES-128       0.07324       4495.00000
2             AVX512        GCM           ENCRYPT       GCM           AES-128       0.25195       -380.00000
3             AVX512        GCM           ENCRYPT       GCM           AES-128       0.25488       -410.00000
4             AVX512        GCM           ENCRYPT       GCM           AES-128       0.28906       -596.00000
5             AVX512        GCM           ENCRYPT       GCM           AES-128       0.09473       3949.00000
6             AVX512        GCM           ENCRYPT       GCM           AES-128       0.02637       5077.00000
7             AVX512        GCM           ENCRYPT       GCM           AES-128       0.25098       -361.00000
8             AVX512        GCM           ENCRYPT       GCM           AES-128       0.07617       4267.00000

Scenario: /tmp/gcm_c5t10.txt
NO            ARCH          CIPHER        DIR           HASH          KEYSZ         SLOPE A       INTERCEPT A
1             AVX512        GCM           ENCRYPT       GCM           AES-128       0.46777       -424.00000
2             AVX512        GCM           ENCRYPT       GCM           AES-128       -0.01855      5808.00000
3             AVX512        GCM           ENCRYPT       GCM           AES-128       0.25391       -345.00000
4             AVX512        GCM           ENCRYPT       GCM           AES-128       0.25391       -392.00000
5             AVX512        GCM           ENCRYPT       GCM           AES-128       0.10059       4001.00000
6             AVX512        GCM           ENCRYPT       GCM           AES-128       0.46875       -438.00000
7             AVX512        GCM           ENCRYPT       GCM           AES-128       0.46680       -409.00000
8             AVX512        GCM           ENCRYPT       GCM           AES-128       0.07520       4420.00000
9             AVX512        GCM           ENCRYPT       GCM           AES-128       -0.01855      5802.00000
10            AVX512        GCM           ENCRYPT       GCM           AES-128       0.46484       -370.00000

Scenario: /tmp/gcm_c6t12.txt
NO            ARCH          CIPHER        DIR           HASH          KEYSZ         SLOPE A       INTERCEPT A
1             AVX512        GCM           ENCRYPT       GCM           AES-128       0.46777       -427.00000
2             AVX512        GCM           ENCRYPT       GCM           AES-128       0.46777       -427.00000
3             AVX512        GCM           ENCRYPT       GCM           AES-128       0.46777       -418.00000
4             AVX512        GCM           ENCRYPT       GCM           AES-128       0.46387       -451.00000
5             AVX512        GCM           ENCRYPT       GCM           AES-128       0.46680       -410.00000
6             AVX512        GCM           ENCRYPT       GCM           AES-128       0.46777       -422.00000
7             AVX512        GCM           ENCRYPT       GCM           AES-128       0.46777       -430.00000
8             AVX512        GCM           ENCRYPT       GCM           AES-128       0.46777       -413.00000
9             AVX512        GCM           ENCRYPT       GCM           AES-128       0.46582       -394.00000
10            AVX512        GCM           ENCRYPT       GCM           AES-128       0.46777       -425.00000
11            AVX512        GCM           ENCRYPT       GCM           AES-128       0.46680       -413.00000
12            AVX512        GCM           ENCRYPT       GCM           AES-128       0.47852       -443.00000

Scenario: /tmp/gcm_c6t6.txt
NO            ARCH          CIPHER        DIR           HASH          KEYSZ         SLOPE A       INTERCEPT A
1             AVX512        GCM           ENCRYPT       GCM           AES-128       0.25195       -363.00000
2             AVX512        GCM           ENCRYPT       GCM           AES-128       0.25195       -379.00000
3             AVX512        GCM           ENCRYPT       GCM           AES-128       0.25195       -381.00000
4             AVX512        GCM           ENCRYPT       GCM           AES-128       0.25195       -370.00000
5             AVX512        GCM           ENCRYPT       GCM           AES-128       0.27441       -305.00000
6             AVX512        GCM           ENCRYPT       GCM           AES-128       0.25195       -371.00000
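
One way to summarize the tables above is to bucket each thread's slope relative to the single-thread baseline. This is only a sketch: the 50% tolerance and the labels "slow", "normal" and "suspect" are arbitrary assumptions chosen for illustration, and the slopes are copied from the ENCRYPT rows above.

```python
from collections import Counter

BASELINE_SLOPE = 0.25000  # c1t1 slope, cycles per byte

# Per-thread slopes copied from the scenario tables above.
SCENARIOS = {
    "c6t6": [0.25195, 0.25195, 0.25195, 0.25195, 0.27441, 0.25195],
    "c4t8": [0.07324, 0.25195, 0.25488, 0.28906,
             0.09473, 0.02637, 0.25098, 0.07617],
    "c5t10": [0.46777, -0.01855, 0.25391, 0.25391, 0.10059,
              0.46875, 0.46680, 0.07520, -0.01855, 0.46484],
    "c6t12": [0.46777, 0.46777, 0.46777, 0.46387, 0.46680, 0.46777,
              0.46777, 0.46777, 0.46582, 0.46777, 0.46680, 0.47852],
}


def classify(slope, baseline=BASELINE_SLOPE, tolerance=0.5):
    """Label a slope relative to the baseline (thresholds are assumptions)."""
    if slope < baseline * (1 - tolerance):
        return "suspect"  # implausibly cheap, likely a poor linear fit
    if slope > baseline * (1 + tolerance):
        return "slow"
    return "normal"


def summarize(slopes):
    """Count how many threads fall into each bucket."""
    return Counter(classify(slope) for slope in slopes)


for name, slopes in SCENARIOS.items():
    print(name, dict(summarize(slopes)))
```

With these thresholds, every thread in c6t12 lands in the "slow" bucket, c6t6 is entirely "normal", and c4t8 and c5t10 are mixed.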

Btw, what does it mean for slope and intercept to be negative?

tkanteck commented 3 years ago

Negative numbers typically indicate that the collected performance samples are not linear. Running the tests for longer may eliminate this issue.
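
As an illustration of how non-linear samples can produce such values, here is a small least-squares sketch. The sample points are made up for the example, and how the tool itself fits its data is not shown in this thread; `fit_line` is a hypothetical helper.

```python
def fit_line(sizes, cycles):
    """Ordinary least-squares fit; returns (slope, intercept, r_squared)."""
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(cycles) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, cycles))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2
                 for x, y in zip(sizes, cycles))
    ss_tot = sum((y - mean_y) ** 2 for y in cycles)
    return slope, intercept, 1 - ss_res / ss_tot


# Hypothetical buffer sizes (bytes) and cycle counts.
sizes = [64, 256, 1024, 2048]
clean = [0.25 * size + 100 for size in sizes]  # perfectly linear samples
disturbed = [5000] + clean[1:]                 # first sample inflated by an outlier

for label, samples in (("clean", clean), ("disturbed", disturbed)):
    slope, intercept, r2 = fit_line(sizes, samples)
    print(f"{label}: slope={slope:.5f} intercept={intercept:.1f} R^2={r2:.3f}")
```

In this made-up case a single disturbed sample is enough to turn the fitted slope negative and drag R^2 well below 1, which is the kind of signal that suggests rerunning the measurement.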

tkanteck commented 3 years ago

Let me close this issue. Feel free to re-open if there is any update.