intel / QAT_Engine

Intel QuickAssist Technology (QAT) OpenSSL Engine (an OpenSSL plug-in engine), which provides cryptographic acceleration for both hardware and optimized software on Intel platforms enabled with Intel QuickAssist Technology. https://developer.intel.com/quickassist
BSD 3-Clause "New" or "Revised" License

Performance issue with aes-128-cbc-hmac-sha1 on Intel QAT8950 #80

Open jingshao2017 opened 6 years ago

jingshao2017 commented 6 years ago

We are trying to evaluate the Intel QAT8950 card for crypto performance on an Ubuntu 18-based system (dual E5-2687W with 128 GB RAM).

We had successfully built

We were able to successfully run all the tests described in "Intel Quick Assist Technology & OpenSSL 1.1.0: Performance" after setting up the proper configuration file.

However, while performance goes up dramatically for rsa2048, ecdsap256 and ecdhp256 with the QAT engine in asynchronous mode, it actually drops for aes-128-cbc-hmac-sha1.

I have attached my test results alongside the results from the Intel white paper in the PDF file.

QAT Test.pdf

Any pointers will be greatly appreciated.

Sam

stevelinsell commented 6 years ago

Hi Sam,

Can you confirm that the results you are recording are for 16KB TLS records, i.e. the far-right column of the results that speed produces? The results you are seeing seem very low for SW, Sync, and Async, which is why I need to confirm you're taking the numbers from the correct column. The other gotcha is the units in which speed displays its results. They are in 1000s of bytes per second, so to get to bits per second you need to multiply by 1000 and then by 8; then you can convert to Gbps. Looking at your numbers, neither of the things I've pointed out would appear to account for what you are seeing, but I think it's a good starting point to make sure the numbers you are recording are correct before we look at other possibilities.

Kind Regards,

Steve.
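For readers following the thread, the unit conversion described in the comment above can be sketched in a few lines of Python. This is only an illustration: the function name `speed_k_to_gbps` and the sample value are hypothetical, and "Gbps" is assumed to mean decimal gigabits (10^9 bits) per second.

```python
def speed_k_to_gbps(value_k: float) -> float:
    """Convert an `openssl speed` figure (1000s of bytes/s) to Gbit/s."""
    bytes_per_second = value_k * 1000       # the "k" suffix means thousands of bytes
    bits_per_second = bytes_per_second * 8  # 8 bits per byte
    return bits_per_second / 1e9            # assumption: decimal giga (10^9)


# Hypothetical sample value, not taken from the thread.
print(speed_k_to_gbps(1250000.0))  # 10.0
```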

jingshao2017 commented 6 years ago

Steve,

Thanks for your help. Yes, I have taken the output for 16KB records, but I have not converted it to Gb/s; the output of "openssl speed" is really in GB/s.

When -multi 2 is specified, performance does not go up, whereas it doubles in the Intel white paper.

Here is the raw output from my command:

/home/admin/openssl/apps/openssl speed -engine qat -elapsed -async_jobs 64 -evp aes-128-cbc-hmac-sha1

engine "qat" set.
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-128-cbc-hmac-sha1 for 3s on 16 size blocks: 24036531 aes-128-cbc-hmac-sha1's in 3.00s
Doing aes-128-cbc-hmac-sha1 for 3s on 64 size blocks: 10925885 aes-128-cbc-hmac-sha1's in 3.00s
Doing aes-128-cbc-hmac-sha1 for 3s on 256 size blocks: 4534066 aes-128-cbc-hmac-sha1's in 3.00s
Doing aes-128-cbc-hmac-sha1 for 3s on 1024 size blocks: 1407720 aes-128-cbc-hmac-sha1's in 3.00s
Doing aes-128-cbc-hmac-sha1 for 3s on 8192 size blocks: 400438 aes-128-cbc-hmac-sha1's in 3.00s
Doing aes-128-cbc-hmac-sha1 for 3s on 16384 size blocks: 202121 aes-128-cbc-hmac-sha1's in 3.00s
OpenSSL 1.1.0i-dev xx XXX xxxx
built on: reproducible build, date unspecified
options:bn(64,64) rc4(16x,int) des(int) aes(partial) idea(int) blowfish(ptr)
compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl/ssl\"" -DENGINESDIR="\"/usr/local/ssl/lib/engines-1.1\"" -Wa,--noexecstack
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
aes-128-cbc-hmac-sha1 128194.83k 233085.55k 386906.97k 480501.76k 1093462.70k 1103850.15k
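As a sanity check on the output above, the summary row can be recomputed from the raw operation counts: each figure is block size times operations, divided by elapsed seconds and then by 1000. The sketch below does that and also converts each figure to Gbit/s. The names `BLOCK_COUNTS`, `ELAPSED_SECONDS` and `throughput_k` are illustrative only, and decimal gigabits are assumed.

```python
# Operation counts copied from the single-process run above (block size -> ops in 3.00s).
BLOCK_COUNTS = {
    16: 24036531,
    64: 10925885,
    256: 4534066,
    1024: 1407720,
    8192: 400438,
    16384: 202121,
}
ELAPSED_SECONDS = 3.00


def throughput_k(block_size: int, count: int, seconds: float) -> float:
    """Throughput in 1000s of bytes per second, the unit `openssl speed` prints."""
    return block_size * count / seconds / 1000


for size, count in BLOCK_COUNTS.items():
    k = throughput_k(size, count, ELAPSED_SECONDS)
    gbps = k * 1000 * 8 / 1e9  # assumption: decimal gigabits per second
    print(f"{size:>6} bytes: {k:.2f}k = {gbps:.2f} Gbit/s")
```

The recomputed figures match the summary row, so the 16384-byte result of 1103850.15k corresponds to roughly 8.83 Gbit/s.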

/home/admin/openssl/apps/openssl speed -engine qat -elapsed -async_jobs 64 -multi 2 -evp aes-128-cbc-hmac-sha1

Forked child 0
Forked child 1
engine "qat" set.
engine "qat" set.
+DT:aes-128-cbc-hmac-sha1:3:16
+DT:aes-128-cbc-hmac-sha1:3:16
+R:24380245:aes-128-cbc-hmac-sha1:3.000000
+DT:aes-128-cbc-hmac-sha1:3:64
+R:24379908:aes-128-cbc-hmac-sha1:3.000000
+DT:aes-128-cbc-hmac-sha1:3:64
+R:10888891:aes-128-cbc-hmac-sha1:3.000000
+DT:aes-128-cbc-hmac-sha1:3:256
+R:10892735:aes-128-cbc-hmac-sha1:3.000000
+DT:aes-128-cbc-hmac-sha1:3:256
+R:4518878:aes-128-cbc-hmac-sha1:3.000000
+DT:aes-128-cbc-hmac-sha1:3:1024
+R:4520215:aes-128-cbc-hmac-sha1:3.000000
+DT:aes-128-cbc-hmac-sha1:3:1024
+R:1403034:aes-128-cbc-hmac-sha1:3.000000
+DT:aes-128-cbc-hmac-sha1:3:8192
+R:1403421:aes-128-cbc-hmac-sha1:3.000000
+DT:aes-128-cbc-hmac-sha1:3:8192
+R:181311:aes-128-cbc-hmac-sha1:3.000000
+DT:aes-128-cbc-hmac-sha1:3:16384
+R:183230:aes-128-cbc-hmac-sha1:3.000000
+DT:aes-128-cbc-hmac-sha1:3:16384
+R:88894:aes-128-cbc-hmac-sha1:3.010000
+R:88264:aes-128-cbc-hmac-sha1:3.010000
Got: +H:16:64:256:1024:8192:16384 from 0
Got: +F:22:aes-128-cbc-hmac-sha1:130026176.00:232378346.67:385725013.33:479034368.00:500340053.33:483866875.75 from 0
Got: +H:16:64:256:1024:8192:16384 from 1
Got: +F:22:aes-128-cbc-hmac-sha1:130027973.33:232296341.33:385610922.67:478902272.00:495099904.00:480437666.45 from 1
OpenSSL 1.1.0i-dev xx XXX xxxx
built on: reproducible build, date unspecified
options:bn(64,64) rc4(16x,int) des(int) aes(partial) idea(int) blowfish(ptr)
compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl/ssl\"" -DENGINESDIR="\"/usr/local/ssl/lib/engines-1.1\"" -Wa,--noexecstack
evp 260054.15k 464674.69k 771335.94k 957936.64k 995439.96k 964304.54k
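The final "evp" row of the -multi 2 output appears to be the sum of the per-child "+F:" lines (bytes per second) divided by 1000. The sketch below parses the two "Got: +F:" lines from the output above, sums them per block size, and compares the 16384-byte total with the single-process figure reported earlier in this comment. The helper names and the regular expression are illustrative assumptions, not part of OpenSSL.

```python
import re

# Per-child result lines copied from the -multi 2 run above.
F_LINES = [
    "Got: +F:22:aes-128-cbc-hmac-sha1:130026176.00:232378346.67:385725013.33:"
    "479034368.00:500340053.33:483866875.75 from 0",
    "Got: +F:22:aes-128-cbc-hmac-sha1:130027973.33:232296341.33:385610922.67:"
    "478902272.00:495099904.00:480437666.45 from 1",
]
BLOCK_SIZES = [16, 64, 256, 1024, 8192, 16384]
SINGLE_PROCESS_16K = 1103850.15  # 16384-byte figure (in k) from the single-process run


def parse_f_line(line: str) -> list[float]:
    """Extract the per-block-size bytes/s values from one '+F:' line."""
    match = re.search(r"\+F:\d+:[^:]+:([\d.:]+)", line)
    if match is None:
        raise ValueError(f"not a +F line: {line!r}")
    return [float(value) for value in match.group(1).split(":")]


def aggregate_k(lines: list[str]) -> list[float]:
    """Sum the children's bytes/s per block size and convert to 1000s of bytes/s."""
    per_child = [parse_f_line(line) for line in lines]
    return [sum(column) / 1000 for column in zip(*per_child)]


totals = aggregate_k(F_LINES)
for size, total in zip(BLOCK_SIZES, totals):
    print(f"{size:>6} bytes: {total:.2f}k")

ratio = totals[-1] / SINGLE_PROCESS_16K
print(f"two processes vs one at 16384 bytes: {ratio:.2f}x")
```

The totals reproduce the "evp" row, and at 16384 bytes the two-process aggregate comes out at roughly 0.87 times the single-process result, which is the lack of scaling described in this comment.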