intel / QAT_Engine

Intel QuickAssist Technology (QAT) OpenSSL Engine (an OpenSSL Plug-In Engine), which provides cryptographic acceleration for both hardware and optimized software using Intel QuickAssist Technology enabled Intel platforms. https://developer.intel.com/quickassist
BSD 3-Clause "New" or "Revised" License

AES GCM (128Bit and 256Bit) benefit with qat? #266

Open sferlin opened 11 months ago

sferlin commented 11 months ago

I am trying to follow the instructions on this Intel reference page and reproduce the results reported by Intel, where AES GCM with QAT is roughly 2x faster than the baseline (without QAT) for 8kB blocks, so far without success.

Is any CPU or environment-related (i.e., openssl) setting missing? Or is the cipher simply no longer implemented with/for QAT (as some replies to similar issues in this repo hint at)?

I also tried creating an /etc/sysconfig/qat file with different settings, based on this other Intel QAT reference, but observed no change.

Environment:

- OS: RHEL 9.2
- Machine 01: qatengine rpm version 1.0.0-1.el9_2, CPU: Intel(R) Xeon(R) Platinum 8462Y+
- Machine 02: QAT_Engine built from this repo, CPU: Intel(R) Xeon(R) Platinum 8480+

QAT:

- Machine 01:

openssl engine -t -c -v qatengine
(qatengine) Reference implementation of QAT crypto engine(qat_hw) v1.0.0
[RSA, AES-128-CBC-HMAC-SHA256, AES-256-CBC-HMAC-SHA256, ChaCha20-Poly1305, SHA3-256, SHA3-384, SHA3-512]
[ available ] ENABLE_EXTERNAL_POLLING, POLL, SET_INSTANCE_FOR_THREAD,
GET_NUM_OP_RETRIES, SET_MAX_RETRY_COUNT, SET_INTERNAL_POLL_INTERVAL,
GET_EXTERNAL_POLLING_FD, ENABLE_EVENT_DRIVEN_POLLING_MODE,
GET_NUM_CRYPTO_INSTANCES, DISABLE_EVENT_DRIVEN_POLLING_MODE,
SET_EPOLL_TIMEOUT, SET_CRYPTO_SMALL_PACKET_OFFLOAD_THRESHOLD,
ENABLE_INLINE_POLLING, ENABLE_HEURISTIC_POLLING,
GET_NUM_REQUESTS_IN_FLIGHT, INIT_ENGINE, SET_CONFIGURATION_SECTION_NAME,
ENABLE_SW_FALLBACK, HEARTBEAT_POLL, DISABLE_QAT_OFFLOAD, HW_ALGO_BITMAP

- Machine 02:

openssl engine -t -c -v qatengine
(qatengine) Reference implementation of QAT crypto engine(qat_hw) v1.2.0
[RSA, AES-128-CBC-HMAC-SHA256, AES-256-CBC-HMAC-SHA256, ChaCha20-Poly1305, SHA3-256, SHA3-384, SHA3-512, TLS1-PRF, X25519, X448]
[ available ] ENABLE_EXTERNAL_POLLING, POLL, SET_INSTANCE_FOR_THREAD,
GET_NUM_OP_RETRIES, SET_MAX_RETRY_COUNT, SET_INTERNAL_POLL_INTERVAL,
GET_EXTERNAL_POLLING_FD, ENABLE_EVENT_DRIVEN_POLLING_MODE,
GET_NUM_CRYPTO_INSTANCES, DISABLE_EVENT_DRIVEN_POLLING_MODE,
SET_EPOLL_TIMEOUT, SET_CRYPTO_SMALL_PACKET_OFFLOAD_THRESHOLD,
ENABLE_INLINE_POLLING, ENABLE_HEURISTIC_POLLING,
GET_NUM_REQUESTS_IN_FLIGHT, INIT_ENGINE, SET_CONFIGURATION_SECTION_NAME,
ENABLE_SW_FALLBACK, HEARTBEAT_POLL, DISABLE_QAT_OFFLOAD, HW_ALGO_BITMAP

OpenSSL speed tests:

- Machine 01:

taskset 0x1 openssl speed -evp aes-128-gcm 
type           16 bytes     64 bytes      256 bytes   1024 bytes   8192 bytes   16384 bytes                                   
AES-128-GCM    1031977.31k  2447358.19k  4960747.43k  6953347.07k  7698314.58k  7777314.93k 
taskset 0x1 openssl speed -engine qatengine -evp aes-128-gcm                                           
type           16 bytes     64 bytes      256 bytes   1024 bytes   8192 bytes   16384 bytes                                   
AES-128-GCM    1034648.31k  2460134.78k  4985995.95k  6963253.93k  7721680.57k  7777047.89k

The results for aes-256-cbc were likewise essentially the same with and without the engine.

- Machine 02:

taskset 0x1 openssl speed -evp aes-128-gcm 
type           16 bytes     64 bytes      256 bytes   1024 bytes   8192 bytes   16384 bytes                                   
AES-128-GCM     943251.65k  2279441.22k  4627250.69k  6456699.12k  7145985.37k  7222886.40k
taskset 0x1 openssl speed -engine qatengine -evp aes-128-gcm     
type           16 bytes     64 bytes      256 bytes   1024 bytes   8192 bytes   16384 bytes                                   
AES-128-GCM     949183.69k  2278880.92k  4637970.26k  6496159.08k  7153216.17k  7228517.03k

The results for aes-256-cbc were likewise essentially the same with and without the engine.
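To put a number on "no benefit", here is a minimal sketch that divides the qatengine throughput by the baseline throughput for each block size. The figures are copied from the summary tables above; the dictionary layout and the `speedups` helper are just illustrative names, not part of any QAT or OpenSSL tooling.

```python
# Throughput in 1000s of bytes/s, copied from the aes-128-gcm summary tables above.
BLOCK_SIZES = [16, 64, 256, 1024, 8192, 16384]
RESULTS = {
    "Machine 01": {
        "baseline": [1031977.31, 2447358.19, 4960747.43, 6953347.07, 7698314.58, 7777314.93],
        "qatengine": [1034648.31, 2460134.78, 4985995.95, 6963253.93, 7721680.57, 7777047.89],
    },
    "Machine 02": {
        "baseline": [943251.65, 2279441.22, 4627250.69, 6456699.12, 7145985.37, 7222886.40],
        "qatengine": [949183.69, 2278880.92, 4637970.26, 6496159.08, 7153216.17, 7228517.03],
    },
}


def speedups(baseline, engine):
    """Return the engine/baseline throughput ratio for each block size."""
    return [e / b for b, e in zip(baseline, engine)]


# Hypothetical helper structure: machine name -> list of ratios per block size.
ratios = {
    machine: speedups(runs["baseline"], runs["qatengine"])
    for machine, runs in RESULTS.items()
}

for machine, values in ratios.items():
    for size, ratio in zip(BLOCK_SIZES, values):
        print(f"{machine} {size:>6} bytes: {ratio:.3f}x")
```

Every ratio comes out within about 1% of 1.0, which is nowhere near the roughly 2x reported for 8kB blocks and looks more like run-to-run noise than an offload effect.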

Detailed output:

- Machine 01:

taskset 0x1 openssl speed -evp aes-128-gcm 
Doing AES-128-GCM for 3s on 16 size blocks: 192537514 AES-128-GCM's in 2.99s                                                 
Doing AES-128-GCM for 3s on 64 size blocks: 115424354 AES-128-GCM's in 3.00s                                                 
Doing AES-128-GCM for 3s on 256 size blocks: 58472809 AES-128-GCM's in 3.00s                                                 
Doing AES-128-GCM for 3s on 1024 size blocks: 20311076 AES-128-GCM's in 3.00s                                                
Doing AES-128-GCM for 3s on 8192 size blocks: 2818695 AES-128-GCM's in 3.00s                                                 
Doing AES-128-GCM for 3s on 16384 size blocks: 1424309 AES-128-GCM's in 2.99s                                                
version: 3.0.7
built on: Wed Mar  8 00:00:00 2023 UTC 
options: bn(64,64)
compiler: gcc -fPIC -pthread -m64 -Wa,--noexecstack -Wall -O3 -O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -march=x86-64-v2 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -Wa,--noexecstack -Wa,--generate-missing-build-notes=yes -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_BUILDING_OPENSSL -DZLIB -DNDEBUG -DPURIFY -DDEVRANDOM="\"/dev/urandom\"" -DREDHAT_FIPS_VERSION="\"3.0.7-1f5987c2732dd431\"" -DSYSTEM_CIPHERS_FILE="/etc/crypto-policies/back-ends/openssl.config"
CPUINFO: OPENSSL_ia32cap=0x7ffef3ffffebffff:0xfb417ffef3bfb7ef                                                               
The 'numbers' are in 1000s of bytes per second processed.                                                                    
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes                                   
AES-128-GCM    1030301.08k  2462386.22k  4989679.70k  6932847.27k  7696916.48k  7804641.69k 
taskset 0x1 openssl speed -engine qatengine -evp aes-128-gcm                                           
Engine "qatengine" set.                   
Doing AES-128-GCM for 3s on 16 size blocks: 193996559 AES-128-GCM's in 3.00s                                                 
Doing AES-128-GCM for 3s on 64 size blocks: 115318818 AES-128-GCM's in 3.00s                                                 
Doing AES-128-GCM for 3s on 256 size blocks: 58429640 AES-128-GCM's in 3.00s                                                 
Doing AES-128-GCM for 3s on 1024 size blocks: 20400158 AES-128-GCM's in 3.00s                                                
Doing AES-128-GCM for 3s on 8192 size blocks: 2818338 AES-128-GCM's in 2.99s                                                 
Doing AES-128-GCM for 3s on 16384 size blocks: 1424020 AES-128-GCM's in 3.00s                                                
version: 3.0.7                                                                                                               
built on: Wed Mar  8 00:00:00 2023 UTC                                                                                       
options: bn(64,64)
compiler: gcc -fPIC -pthread -m64 -Wa,--noexecstack -Wall -O3 -O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -march=x86-64-v2 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -Wa,--noexecstack -Wa,--generate-missing-build-notes=yes -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_BUILDING_OPENSSL -DZLIB -DNDEBUG -DPURIFY -DDEVRANDOM="\"/dev/urandom\"" -DREDHAT_FIPS_VERSION="\"3.0.7-1f5987c2732dd431\"" -DSYSTEM_CIPHERS_FILE="/etc/crypto-policies/back-ends/openssl.config"
CPUINFO: OPENSSL_ia32cap=0x7ffef3ffffebffff:0xfb417ffef3bfb7ef
The 'numbers' are in 1000s of bytes per second processed.                                                                    
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes                                   
AES-128-GCM    1034648.31k  2460134.78k  4985995.95k  6963253.93k  7721680.57k  7777047.89k  

- Machine 02:

taskset 0x1 openssl speed -evp aes-128-gcm
Doing AES-128-GCM for 3s on 16 size blocks: 176270152 AES-128-GCM's in 2.99s
Doing AES-128-GCM for 3s on 64 size blocks: 106848807 AES-128-GCM's in 3.00s
Doing AES-128-GCM for 3s on 256 size blocks: 54225594 AES-128-GCM's in 3.00s
Doing AES-128-GCM for 3s on 1024 size blocks: 18853057 AES-128-GCM's in 2.99s
Doing AES-128-GCM for 3s on 8192 size blocks: 2616938 AES-128-GCM's in 3.00s
Doing AES-128-GCM for 3s on 16384 size blocks: 1322550 AES-128-GCM's in 3.00s
version: 3.0.7
built on: Wed Mar  8 00:00:00 2023 UTC
options: bn(64,64)
compiler: gcc -fPIC -pthread -m64 -Wa,--noexecstack -Wall -O3 -O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -march=x86-64-v2 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -Wa,--noexecstack -Wa,--generate-missing-build-notes=yes -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_BUILDING_OPENSSL -DZLIB -DNDEBUG -DPURIFY -DDEVRANDOM="\"/dev/urandom\"" -DREDHAT_FIPS_VERSION="\"3.0.7-1f5987c2732dd431\"" -DSYSTEM_CIPHERS_FILE="/etc/crypto-policies/back-ends/openssl.config"
CPUINFO: OPENSSL_ia32cap=0x7ffef3ffffebffff:0xfb417ffef3bfb7ef
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
AES-128-GCM     943251.65k  2279441.22k  4627250.69k  6456699.12k  7145985.37k  7222886.40k
taskset 0x1 openssl speed -engine qatengine -evp aes-128-gcm
Engine "qatengine" set.
Doing AES-128-GCM for 3s on 16 size blocks: 177378703 AES-128-GCM's in 2.99s
Doing AES-128-GCM for 3s on 64 size blocks: 106822543 AES-128-GCM's in 3.00s
Doing AES-128-GCM for 3s on 256 size blocks: 54351214 AES-128-GCM's in 3.00s
Doing AES-128-GCM for 3s on 1024 size blocks: 18968277 AES-128-GCM's in 2.99s
Doing AES-128-GCM for 3s on 8192 size blocks: 2619586 AES-128-GCM's in 3.00s
Doing AES-128-GCM for 3s on 16384 size blocks: 1323581 AES-128-GCM's in 3.00s
version: 3.0.7
built on: Wed Mar  8 00:00:00 2023 UTC
options: bn(64,64)
compiler: gcc -fPIC -pthread -m64 -Wa,--noexecstack -Wall -O3 -O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -march=x86-64-v2 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -Wa,--noexecstack -Wa,--generate-missing-build-notes=yes -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_BUILDING_OPENSSL -DZLIB -DNDEBUG -DPURIFY -DDEVRANDOM="\"/dev/urandom\"" -DREDHAT_FIPS_VERSION="\"3.0.7-1f5987c2732dd431\"" -DSYSTEM_CIPHERS_FILE="/etc/crypto-policies/back-ends/openssl.config"
CPUINFO: OPENSSL_ia32cap=0x7ffef3ffffebffff:0xfb417ffef3bfb7ef
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
AES-128-GCM     949183.69k  2278880.92k  4637970.26k  6496159.08k  7153216.17k  7228517.03k
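For anyone cross-checking the detailed output against the summary rows: the "k" figures appear to be operations times block size divided by elapsed seconds, in 1000s of bytes. Below is a small sketch of that arithmetic for the Machine 02 8192-byte runs. The `throughput_k` helper is a made-up name, and it assumes the printed elapsed time (3.00s) is close enough to the real one.

```python
def throughput_k(operations, block_size, elapsed_s):
    """Throughput in 1000s of bytes/s: operations * block size / elapsed time / 1000."""
    return operations * block_size / elapsed_s / 1000.0


# Machine 02, baseline run, 8192-byte blocks: 2616938 operations in 3.00s.
baseline_8k = throughput_k(2616938, 8192, 3.00)

# Machine 02, qatengine run, 8192-byte blocks: 2619586 operations in 3.00s.
engine_8k = throughput_k(2619586, 8192, 3.00)

print(f"baseline:  {baseline_8k:.2f}k")
print(f"qatengine: {engine_8k:.2f}k")
```

Both values line up with the 8192-byte column of the Machine 02 tables (7145985.37k and 7153216.17k), so the summary rows can be recomputed from the raw operation counts if needed.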

Other observations, testing with another algorithm (RSA 2048): comparing 1) openssl speed in software, 2) QAT synchronous, and 3) QAT asynchronous (-async_jobs 8) gives the speedup factors below (in sign/s). Software baseline:

openssl speed rsa2048
                  sign    verify    sign/s verify/s
rsa 2048 bits 0.000237s 0.000014s   4212.3  72485.9 

QAT synchronous, about 2x:

openssl speed -engine qatengine rsa2048
                  sign    verify    sign/s verify/s 
rsa 2048 bits 0.000124s 0.000017s   8074.0  57805.8 

QAT asynchronous, ~10x:

openssl speed -engine qatengine -async_jobs 8 rsa2048
                  sign    verify    sign/s verify/s
rsa 2048 bits 0.000024s 0.000005s  41191.2 206077.2
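
As a quick check of the factors quoted for RSA 2048, a minimal sketch that divides each QAT run by the software baseline. The numbers are copied from the three runs above; the run labels and the `factor` helper are only illustrative.

```python
# sign/s and verify/s copied from the three rsa2048 runs above.
RSA_RUNS = {
    "software": {"sign": 4212.3, "verify": 72485.9},
    "qat_sync": {"sign": 8074.0, "verify": 57805.8},
    "qat_async_8": {"sign": 41191.2, "verify": 206077.2},
}


def factor(run, op, baseline="software"):
    """Ratio of operations per second for `run` relative to the baseline run."""
    return RSA_RUNS[run][op] / RSA_RUNS[baseline][op]


for run in ("qat_sync", "qat_async_8"):
    print(f"{run}: sign {factor(run, 'sign'):.2f}x, verify {factor(run, 'verify'):.2f}x")
```

The sign/s factors come out at roughly 1.9x (synchronous) and 9.8x (asynchronous with 8 jobs), consistent with the 2x and ~10x stated above. The synchronous verify/s figure is actually below the software baseline, so the gain there only shows up in asynchronous mode.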

Yogaraj-Alamenda commented 10 months ago

It seems you have built for the QAT_HW target only, where the benefit is small compared to OpenSSL_SW. Can you try example 4, which builds for the qat_sw target? https://github.com/intel/QAT_Engine#example-builds

Please remove the qatlib rpm, if any, from the system. From the logs you shared, AES-GCM is not enabled.
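
The "not enabled" point can be read straight off the algorithm list in the `openssl engine -t -c -v qatengine` output from the issue. A minimal sketch of that check follows. It works on the pasted text rather than calling openssl, and the exact name a GCM-enabled build would report is an assumption, so it only looks for the substring "GCM".

```python
# Algorithm list copied from the Machine 02 `openssl engine -t -c -v qatengine` output.
ENGINE_ALGOS = (
    "[RSA, AES-128-CBC-HMAC-SHA256, AES-256-CBC-HMAC-SHA256, ChaCha20-Poly1305, "
    "SHA3-256, SHA3-384, SHA3-512, TLS1-PRF, X25519, X448]"
)


def parse_algorithms(line):
    """Split a bracketed, comma-separated capability line into algorithm names."""
    inner = line.strip().strip("[]")
    return [name.strip() for name in inner.split(",") if name.strip()]


def offers_gcm(algos):
    """True if any listed algorithm name mentions GCM (name format is an assumption)."""
    return any("GCM" in name.upper() for name in algos)


algos = parse_algorithms(ENGINE_ALGOS)
print("GCM offered by engine:", offers_gcm(algos))
```

With the list from the issue this reports that no GCM cipher is offered, which would explain why the runs with and without `-engine qatengine` give the same aes-128-gcm numbers: the cipher is presumably served by OpenSSL's own implementation in both cases.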