intel / QAT_Engine

Intel QuickAssist Technology (QAT) OpenSSL Engine (an OpenSSL Plug-In Engine), which provides cryptographic acceleration for both hardware and optimized software using Intel QuickAssist Technology enabled Intel platforms. https://developer.intel.com/quickassist
BSD 3-Clause "New" or "Revised" License

Why does the aes-128-cbc-hmac-sha256 algorithm perform better with a packet length of 4K than with 8K or even 16K? #329

Open Wang-Robot opened 1 month ago

Wang-Robot commented 1 month ago

When using QAT acceleration, why does the aes-128-cbc-hmac-sha256 algorithm perform better with a packet length of 4K than with 8K or even 16K?

  1. Below are my OpenSSL version, compile commands, and configuration file:
    • openssl version OpenSSL 3.2.1
    • compile commands

cd QAT20.L.1.1.40-0018
./configure
make
make install

cd QAT_Engine-1.6.0
./configure --with-qat_hw_dir=/home/QAT20.L.1.1.40-0018/ --enable-qat_hw_gcm
make
make install

    • configuration file
[root@emer QAT_Engine-1.6.0]# cat /etc/4xxx_dev0.conf
...
...
...
...
[GENERAL]
ServicesEnabled = sym

ConfigVersion = 2
...
...
...
[SHIM]
NumberCyInstances = 1
NumberDcInstances = 0
NumProcesses = 32
LimitDevAccess = 1
...
...

  2. Benchmark with 4 processes and 4K packets:

[root@emer QAT_Engine-1.6.0]# taskset -c 1-4 openssl speed -elapsed -evp aes-128-cbc-hmac-sha256 -async_jobs 48 -multi 4 --engine qatengine -bytes 4096
...
Got: +H:4096 from 0
AES-128-CBC-HMAC-SHA256 4850803.74k

  3. Benchmarks with 8 processes, first with the default packet sizes and then with 4K packets:

[root@emer QAT_Engine-1.6.0]# taskset -c 1-8 openssl speed -elapsed -evp aes-128-cbc-hmac-sha256 -async_jobs 48 -multi 8 --engine qatengine
...
Got: +H:16:64:256:1024:8192:16384 from 7
AES-128-CBC-HMAC-SHA256 43087.70k 177992.30k 736275.11k 2915154.60k 4146301.61k 5514980.01k

[root@emer QAT_Engine-1.6.0]# taskset -c 1-8 openssl speed -elapsed -evp aes-128-cbc-hmac-sha256 -async_jobs 48 -multi 8 --engine qatengine --bytes 4096
...
Got: +H:4096 from 0
AES-128-CBC-HMAC-SHA256 6092023.13k
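
To make the 8-process numbers easier to compare, here is a minimal Python sketch that parses the `+H:` size header and the result line from the two runs above, then converts each figure to GB/s and requests per second. It assumes the usual `openssl speed` convention that the `k` suffix means thousands of bytes per second. The function name `parse_speed` is hypothetical, and the 4K figure comes from a separate run from the 8K and 16K figures.

```python
def parse_speed(header: str, result: str) -> dict[int, float]:
    """Map block size (bytes) to throughput (bytes/second).

    header: the "+H:..." size list, e.g. "+H:16:64:256"
    result: the algorithm line, e.g. "NAME 1.00k 2.00k 3.00k"
    Assumes 'k' means 1000 bytes per second, as openssl speed reports.
    """
    sizes = [int(s) for s in header.split(":")[1:]]
    rates = [float(f.rstrip("k")) * 1000.0 for f in result.split()[1:]]
    if len(sizes) != len(rates):
        raise ValueError("size header and result line have different lengths")
    return dict(zip(sizes, rates))


# The two 8-process runs quoted in this issue.
runs = [
    ("+H:16:64:256:1024:8192:16384",
     "AES-128-CBC-HMAC-SHA256 43087.70k 177992.30k 736275.11k "
     "2915154.60k 4146301.61k 5514980.01k"),
    ("+H:4096", "AES-128-CBC-HMAC-SHA256 6092023.13k"),
]

merged: dict[int, float] = {}
for header, result in runs:
    merged.update(parse_speed(header, result))

for size in sorted(merged):
    bytes_per_s = merged[size]
    print(f"{size:>6} B  {bytes_per_s / 1e9:6.2f} GB/s  "
          f"{bytes_per_s / size:12.0f} requests/s")
```

Read this way, the 4K run delivers roughly 6.09 GB/s, against about 4.15 GB/s at 8K and 5.51 GB/s at 16K, so 16K does beat 8K but neither reaches the 4K figure.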

venkatesh6911 commented 1 week ago

Thanks for raising the issue, @Wang-Robot. I am looking into this and will get back to you soon.

Wang-Robot commented 6 days ago

Thank you. I look forward to your update, @venkatesh6911.

venkatesh6911 commented 4 days ago

We discussed this internally, and the feedback was that for 8K and 16K packet sizes the code path in the QAT firmware is generic rather than optimized. That is why those sizes underperform. This is unrelated to QAT Engine itself.

Wang-Robot commented 3 days ago

Thanks. Our understanding was that the larger the packet length, the better the performance. Is there a special optimization for the 4K packet length? Also, what does "QAT firmware" refer to? Is it qatlib?
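
The expectation that larger packets always perform better follows from a simple fixed-overhead model: if every request costs a constant setup time plus a time proportional to its length, throughput rises monotonically with size. The sketch below illustrates that model. Every constant in it is a hypothetical placeholder chosen for illustration, not a measured QAT value, and the 4K cutoff between the "optimized" and "generic" paths is only an assumption based on the maintainer's description.

```python
def throughput(size_bytes: int, overhead_s: float, per_byte_s: float) -> float:
    """Bytes/second when one request costs overhead_s + size_bytes * per_byte_s."""
    return size_bytes / (overhead_s + size_bytes * per_byte_s)


# Hypothetical illustration values, NOT measured QAT figures.
OVERHEAD_S = 1e-7          # fixed cost per request
FAST_PER_BYTE_S = 1e-10    # per-byte cost on an optimized path
SLOW_PER_BYTE_S = 2e-10    # per-byte cost on a generic path

sizes = [1024, 4096, 8192, 16384]

# One per-byte cost for all sizes: larger packets always win.
uniform = [throughput(n, OVERHEAD_S, FAST_PER_BYTE_S) for n in sizes]

# Assumed split: sizes above 4K fall onto the slower generic path.
mixed = [
    throughput(n, OVERHEAD_S, FAST_PER_BYTE_S if n <= 4096 else SLOW_PER_BYTE_S)
    for n in sizes
]

for n, u, m in zip(sizes, uniform, mixed):
    print(f"{n:>6} B  uniform {u / 1e9:5.2f} GB/s  mixed {m / 1e9:5.2f} GB/s")
```

With a single per-byte cost the curve only rises. Once the larger sizes pay a higher per-byte cost, 4K comes out ahead of both 8K and 16K while 16K still beats 8K, which is the same shape as the measurements in this issue.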