intel / QAT_Engine

Intel QuickAssist Technology( QAT) OpenSSL Engine (an OpenSSL Plug-In Engine) which provides cryptographic acceleration for both hardware and optimized software using Intel QuickAssist Technology enabled Intel platforms. https://developer.intel.com/quickassist
BSD 3-Clause "New" or "Revised" License
410 stars 128 forks source link

performance problem with cipher AES128-GCM-SHA256 #289

Open vitalyk-radware opened 11 months ago

vitalyk-radware commented 11 months ago

hi, please advise about following issue.

we see that our server spends much time in function RSAZ_1024_mod_exp_avx2() . below is backtrace of this call. I would like to ask , why despite having qatprovider, this RSA crypto operation is done in software ?

here are server and client commands:

export P=/disk/opt/openssl-3.1.4
export LD_LIBRARY_PATH=$P/lib64:/disk/qat/drvlibs
$P/bin/openssl s_server -key /disk/server.pem.key -cert /disk/server.pem.cert  -provider qatprovider

openssl s_client -connect localhost:4433  -tls1_2  -cipher AES128-GCM-SHA256

openssl server is 3.1.4 , qatengine 1.4.0

#0  rsaz_1024_gather5_avx2 () at crypto/bn/rsaz-avx2.s:1601
#1  0x0000000001acf6c1 in RSAZ_1024_mod_exp_avx2 (result_norm=0x9d89cb0, base_norm=<optimized out>, exponent=0xb0b0210, m_norm=0xb0b1640, RR=<optimized out>, k0=13195457688041533433)
    at crypto/bn/rsaz_exp.c:224
#2  0x0000000001ac44ba in BN_mod_exp_mont_consttime (rr=rr@entry=0xad8a8b0, a=a@entry=0xad8a8b0, p=p@entry=0xb0adfb0, m=m@entry=0xb0ad890, ctx=ctx@entry=0xb483170, 
    in_mont=in_mont@entry=0xad96de0) at crypto/bn/bn_exp.c:743
#3  0x0000000001ac571c in BN_mod_exp_mont_consttime_x2 (rr1=rr1@entry=0xad8a8e0, a1=a1@entry=0xad8a8e0, p1=0xb0adf30, m1=<optimized out>, in_mont1=0xb0b0010, rr2=rr2@entry=0xad8a8b0, 
    a2=a2@entry=0xad8a8b0, p2=0xb0adfb0, m2=0xb0ad890, in_mont2=0xad96de0, ctx=ctx@entry=0xb483170) at crypto/bn/bn_exp.c:1620
#4  0x0000000001b8ed29 in rsa_ossl_mod_exp (r0=0xad8a898, I=0xad8a880, rsa=0xb0aca30, ctx=0xb483170) at crypto/rsa/rsa_ossl.c:705
#5  0x0000000001b8f3bf in rsa_ossl_private_decrypt (flen=256, from=0xb9dc456 "\205\v;p\242h\347\212\335c\365U\270Tg\300", to=0xb0b07b0 "\260\036\v\v", rsa=0xb0aca30, padding=3)
    at crypto/rsa/rsa_ossl.c:452
#6  0x0000000001be8089 in rsa_decrypt (vprsactx=0xb4ae8f0, out=0xb47c1e0 "P\350\377\n", outlen=0xa228e90, outsize=48, in=0xb9dc456 "\205\v;p\242h\347\212\335c\365U\270Tg\300", 
    inlen=256) at providers/implementations/asymciphers/rsa_enc.c:240
#7  0x0000000001a979aa in tls_process_cke_rsa (pkt=<optimized out>, s=0xb9f6e60) at ssl/statem/statem_srvr.c:2962
#8  tls_process_client_key_exchange (s=0xb9f6e60, pkt=<optimized out>) at ssl/statem/statem_srvr.c:3344
#9  0x0000000001a831d8 in read_state_machine (s=0xb9f6e60) at ssl/statem/statem.c:666
#10 state_machine (s=0xb9f6e60, server=1) at ssl/statem/statem.c:452
#11 0x0000000001a71489 in ssl3_read_bytes (s=0xb9f6e60, type=23, recvd_type=0x0, buf=0xafe5ac0 "\200.\243\v", len=65536, peek=0, readbytes=0xb9f80d0) at ssl/record/rec_layer_s3.c:1335
#12 0x0000000001a47531 in ssl3_read_internal (readbytes=0xb9f80d0, peek=0, len=65536, buf=0xafe5ac0, s=0xb9f6e60) at ssl/s3_lib.c:4462
#13 ssl3_read (s=0xb9f6e60, buf=0xafe5ac0, len=65536, readbytes=0xb9f80d0) at ssl/s3_lib.c:4486
#14 0x0000000001ab4847 in async_start_func () at crypto/async/async.c:243
#15 0x00007ffff6afc310 in ?? () from /lib/x86_64-linux-gnu/libc.so.6

we do not always get to RSAZ_1024_mod_exp_avx2 call, but i see this flow :

rsa_decrypt -> RSA_private_decrypt -> rsa_ossl_private_decrypt -> rsa_ossl_mod_exp -> BN_mod_exp_mont_consttime_x2 -> BN_mod_exp_mont_consttime

so, please let me know, why we entered rsa_ossl_private_decrypt ? why not some qat rsa function?

thank you in advance, Vitaly

below are details how openssl and qat were made:

  V=openssl-3.1.4
  ./config --prefix=/disk/opt/$V/ --openssldir=/disk/opt/$V    --debug
   make -j22
   make install

   cd QAT_Engine-1.4.0
   ./autogen.sh
   export MYCT100=/home/vitalyk/git/Ma/
   ./configure --prefix=/disk/opt/openssl31/ --with-openssl_install_dir=/disk/opt/$V/ \
                                             --with-openssl_dir=/home/vitalyk/OPS31/$V \
                                             --with-qat_hw_dir=$MYCT100/3rd_party/qat3/qat/QAT1.7  --enable-qat-debug \
                                                --with-cc_opt=-DQAT_TESTS_LOG \
                                                 --enable-qat_provider  \
                           --enable-upstream_driver --enable-usdm --disable-qat_hw_lenstra_protection \
                        --enable-qat_insecure_algorithms --disable-qat_hw_ciphers

output of s_server :

 OpenSSL 3.1.4 24 Oct 2023 (Library: OpenSSL 3.1.4 24 Oct 2023)
[DEBUG][254816.981837] PID [20181] Thread [7ffff7ff1700][qat_prov_init.c:534:qat_get_params_from_core()] get_params is NULL. Using the default params
[WARN][254816.984378] PID [20181] Thread [7ffff7ff1700][e_qat.c:1055:bind_qat()] QAT Warnings enabled.
[DEBUG][254816.984446] PID [20181] Thread [7ffff7ff1700][e_qat.c:1056:bind_qat()] QAT Debug enabled.
[WARN][254816.984492] PID [20181] Thread [7ffff7ff1700][e_qat.c:1057:bind_qat()] (null) - Reference implementation of QAT crypto engine(qat_hw) v1.4.0
[DEBUG][254816.984925] PID [20181] Thread [7ffff7ff1700][e_qat.c:1074:bind_qat()] QAT HW device available
QAT_HW RSA for Provider Enabled
QAT_HW ECDSA for Provider Enabled
QAT_HW ECDH for Provider Enabled
QAT_HW DSA for Provider Enabled
QAT_HW DH for Provider Enabled
QAT_HW ECX25519 for Provider Enabled
QAT_HW ECX448 for Provider Enabled
QAT_HW PRF for Provider Enabled
[DEBUG][254816.985432] PID [20181] Thread [7ffff7ff1700][e_qat.c:591:qat_engine_init()] QAT Engine initialization:
[DEBUG][254816.985498] PID [20181] Thread [7ffff7ff1700][qat_hw_init.c:590:qat_hw_init()] QAT_HW initialization:
[DEBUG][254816.985569] PID [20181] Thread [7ffff7ff1700][qat_hw_init.c:591:qat_hw_init()] - External polling: OFF
[DEBUG][254816.985623] PID [20181] Thread [7ffff7ff1700][qat_hw_init.c:592:qat_hw_init()] - Heuristic polling: OFF
[DEBUG][254816.985676] PID [20181] Thread [7ffff7ff1700][qat_hw_init.c:593:qat_hw_init()] - SW Fallback: OFF
[DEBUG][254816.985750] PID [20181] Thread [7ffff7ff1700][qat_hw_init.c:594:qat_hw_init()] - Inline polling: OFF
[DEBUG][254816.985814] PID [20181] Thread [7ffff7ff1700][qat_hw_init.c:595:qat_hw_init()] - Internal poll interval: 10000ns
[DEBUG][254816.985867] PID [20181] Thread [7ffff7ff1700][qat_hw_init.c:596:qat_hw_init()] - Epoll timeout: 1000ms
[DEBUG][254816.985928] PID [20181] Thread [7ffff7ff1700][qat_hw_init.c:597:qat_hw_init()] - Event driven polling mode: OFF
[DEBUG][254816.985976] PID [20181] Thread [7ffff7ff1700][qat_hw_init.c:598:qat_hw_init()] - Instance for thread: OFF
[DEBUG][254816.986045] PID [20181] Thread [7ffff7ff1700][qat_hw_init.c:599:qat_hw_init()] - Max retry count: 5
[DEBUG][254817.040035] PID [20181] Thread [7ffff7ff1700][qat_hw_init.c:638:qat_hw_init()] Found 1 Cy instances
[DEBUG][254817.040244] PID [20181] Thread [7ffff7ff1700][qat_hw_init.c:753:qat_hw_init()] Started Instance No: 0 Located on Device: 0
Asymmetric instances sequence: 0
Symmetric instances sequence: 0
[DEBUG][254817.045025] PID [20181] Thread [7ffff5ed2700][qat_hw_polling.c:148:qat_timer_poll_func()] timer_poll_func started
[DEBUG][254817.054572] PID [20181] Thread [7ffff5ed2700][qat_hw_polling.c:160:qat_timer_poll_func()] qat_timer_poll_func_thread = 0x7ffff5ed2700
Using default temp DH parameters
ACCEPT
[DEBUG][254844.562929] PID [20181] Thread [7ffff7ff1700][qat_hw_prf.c:660:qat_prf_tls_derive()] QAT HW PRF Started
[DEBUG][254844.563003] PID [20181] Thread [7ffff7ff1700][qat_hw_prf.c:498:build_tls_prf_op_data()] Value of label = master secret
=========================
PRF Op Data: 0x7fffffffd9a0
tlsOp: MASTER_SECRET_DERIVE
qat_prf_tls_derive: Secret: Length 48, Address 0x7ffff5ee3c00
03 03 8f b8 da ba 5e a7 - 28 77 d7 3c 58 d4 f5 37
a9 d8 a4 de c9 ee 81 e9 - 16 54 d2 89 8b 22 2d c0
fd be 31 de e9 44 57 cd - 89 e1 3d d8 82 d1 de 37
qat_prf_tls_derive: Seed: Length 64, Address 0x7ffff5ee4400
65 6b eb af e3 79 42 13 - f4 95 45 bf d1 d4 b1 f1
1f 80 5c 9d d0 47 4e 30 - 2d e4 3a ab 12 ea 9a 8b
17 56 0e b3 86 ec ed 60 - 50 c4 f5 fc 9e 01 36 f3
eb 38 8a 86 b7 6d 78 1b - 44 4f 57 4e 47 52 44 01
qat_prf_tls_derive: User Label: Length 0, Address (nil)
=========================
[DEBUG][254844.570193] PID [20181] Thread [7ffff7ff1700][qat_hw_init.c:326:get_next_inst_num()] inst type: SYM, inst_num = 0
=========================
TLS Keygen Request:
instance_handle = 0x70f020
qat_prf_tls_derive: generated_key->pData: Length 48, Address 0x7ffff5ee4c00
00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00
=========================
[DEBUG][254844.573378] PID [20181] Thread [7ffff7ff1700][qat_hw_prf.c:782:qat_prf_tls_derive()] Calling cpaCyKeyGenTls2
[DEBUG][254844.573481] PID [20181] Thread [7ffff5ed2700][qat_hw_callback.c:199:qat_crypto_callbackFn()] status 0 verifyResult 1
=========================
TLS Keygen Output: 0x7ffff5ee4800
qat_prf_tls_derive: generated_key->pData: Length 48, Address 0x7ffff5ee4c00
f7 33 e9 a6 87 1e f9 a1 - d7 5c 8c 1f 6b be 99 83
31 62 7e 33 ca 60 ac 57 - 0c 76 6d 4d 59 06 00 4d
f0 88 ce cf de 70 8d ab - 45 b0 db ec 2f 2f a8 85
=========================
qat_prf_tls_derive: Generated key: Length 48, Address 0x7ffff5ee4c00
f7 33 e9 a6 87 1e f9 a1 - d7 5c 8c 1f 6b be 99 83
31 62 7e 33 ca 60 ac 57 - 0c 76 6d 4d 59 06 00 4d
f0 88 ce cf de 70 8d ab - 45 b0 db ec 2f 2f a8 85
[DEBUG][254844.582105] PID [20181] Thread [7ffff7ff1700][qat_hw_prf.c:660:qat_prf_tls_derive()] QAT HW PRF Started
[DEBUG][254844.582174] PID [20181] Thread [7ffff7ff1700][qat_hw_prf.c:498:build_tls_prf_op_data()] Value of label = key expansion
=========================
PRF Op Data: 0x7fffffffdb10
tlsOp: KEY_MATERIAL_DERIVE
qat_prf_tls_derive: Secret: Length 48, Address 0x7ffff5ee3c00
f7 33 e9 a6 87 1e f9 a1 - d7 5c 8c 1f 6b be 99 83
31 62 7e 33 ca 60 ac 57 - 0c 76 6d 4d 59 06 00 4d
f0 88 ce cf de 70 8d ab - 45 b0 db ec 2f 2f a8 85
qat_prf_tls_derive: Seed: Length 64, Address 0x7ffff5ee4400
17 56 0e b3 86 ec ed 60 - 50 c4 f5 fc 9e 01 36 f3
eb 38 8a 86 b7 6d 78 1b - 44 4f 57 4e 47 52 44 01
65 6b eb af e3 79 42 13 - f4 95 45 bf d1 d4 b1 f1
1f 80 5c 9d d0 47 4e 30 - 2d e4 3a ab 12 ea 9a 8b
qat_prf_tls_derive: User Label: Length 0, Address (nil)
=========================
[DEBUG][254844.585327] PID [20181] Thread [7ffff7ff1700][qat_hw_init.c:326:get_next_inst_num()] inst type: SYM, inst_num = 0
=========================
TLS Keygen Request:
instance_handle = 0x70f020
qat_prf_tls_derive: generated_key->pData: Length 40, Address 0x7ffff5ee4c00
00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00
=========================
[DEBUG][254844.586136] PID [20181] Thread [7ffff7ff1700][qat_hw_prf.c:782:qat_prf_tls_derive()] Calling cpaCyKeyGenTls2
[DEBUG][254844.586260] PID [20181] Thread [7ffff5ed2700][qat_hw_callback.c:199:qat_crypto_callbackFn()] status 0 verifyResult 1
=========================
TLS Keygen Output: 0x7ffff5ee4800
qat_prf_tls_derive: generated_key->pData: Length 40, Address 0x7ffff5ee4c00
68 2c 67 f3 19 be 04 3b - 90 45 25 05 e2 7d 02 b4
2e 87 3c 6c 29 ce e5 ad - e9 ee 2c 15 6e f2 41 ac
cf 59 4c 6c 13 99 bf 04
=========================
qat_prf_tls_derive: Generated key: Length 40, Address 0x7ffff5ee4c00
68 2c 67 f3 19 be 04 3b - 90 45 25 05 e2 7d 02 b4
2e 87 3c 6c 29 ce e5 ad - e9 ee 2c 15 6e f2 41 ac
cf 59 4c 6c 13 99 bf 04
[DEBUG][254844.590275] PID [20181] Thread [7ffff7ff1700][qat_hw_prf.c:660:qat_prf_tls_derive()] QAT HW PRF Started
[DEBUG][254844.590345] PID [20181] Thread [7ffff7ff1700][qat_hw_prf.c:498:build_tls_prf_op_data()] Value of label = client finished
=========================
PRF Op Data: 0x7fffffffdb60
tlsOp: CLIENT_FINISHED_DERIVE
qat_prf_tls_derive: Secret: Length 48, Address 0x7ffff5ee3c00
f7 33 e9 a6 87 1e f9 a1 - d7 5c 8c 1f 6b be 99 83
31 62 7e 33 ca 60 ac 57 - 0c 76 6d 4d 59 06 00 4d
f0 88 ce cf de 70 8d ab - 45 b0 db ec 2f 2f a8 85
qat_prf_tls_derive: Seed: Length 32, Address 0x7ffff5ee4400
fa 44 ac 84 eb f6 c0 d3 - 32 e9 50 a8 bb 75 bf d2
e3 c4 fc 18 78 2b 0d 78 - 63 a4 3d a7 9f b1 a2 ae
qat_prf_tls_derive: User Label: Length 0, Address (nil)
=========================
[DEBUG][254844.591973] PID [20181] Thread [7ffff7ff1700][qat_hw_init.c:326:get_next_inst_num()] inst type: SYM, inst_num = 0
=========================
TLS Keygen Request:
instance_handle = 0x70f020
qat_prf_tls_derive: generated_key->pData: Length 12, Address 0x7ffff5ee4c00
00 00 00 00 00 00 00 00 - 00 00 00 00
=========================
[DEBUG][254844.592231] PID [20181] Thread [7ffff7ff1700][qat_hw_prf.c:782:qat_prf_tls_derive()] Calling cpaCyKeyGenTls2
[DEBUG][254844.592274] PID [20181] Thread [7ffff5ed2700][qat_hw_callback.c:199:qat_crypto_callbackFn()] status 0 verifyResult 1
=========================
TLS Keygen Output: 0x7ffff5ee4800
qat_prf_tls_derive: generated_key->pData: Length 12, Address 0x7ffff5ee4c00
db 22 b3 6a 7a 69 c3 b5 - 47 f8 c7 2c
=========================
qat_prf_tls_derive: Generated key: Length 12, Address 0x7ffff5ee4c00
db 22 b3 6a 7a 69 c3 b5 - 47 f8 c7 2c
[DEBUG][254844.594081] PID [20181] Thread [7ffff7ff1700][qat_hw_prf.c:660:qat_prf_tls_derive()] QAT HW PRF Started
[DEBUG][254844.594144] PID [20181] Thread [7ffff7ff1700][qat_hw_prf.c:498:build_tls_prf_op_data()] Value of label = server finished
=========================
PRF Op Data: 0x7fffffffdb70
tlsOp: SERVER_FINISHED_DERIVE
qat_prf_tls_derive: Secret: Length 48, Address 0x7ffff5ee3c00
f7 33 e9 a6 87 1e f9 a1 - d7 5c 8c 1f 6b be 99 83
31 62 7e 33 ca 60 ac 57 - 0c 76 6d 4d 59 06 00 4d
f0 88 ce cf de 70 8d ab - 45 b0 db ec 2f 2f a8 85
qat_prf_tls_derive: Seed: Length 32, Address 0x7ffff5ee4400
22 e1 4a 50 3c 8f b0 f1 - 73 7d 3f f2 06 12 9b f5
de 9b ab 5d 7f 62 b6 48 - 1d 3c 12 c4 88 6a 7a b4
qat_prf_tls_derive: User Label: Length 0, Address (nil)
=========================
[DEBUG][254844.595966] PID [20181] Thread [7ffff7ff1700][qat_hw_init.c:326:get_next_inst_num()] inst type: SYM, inst_num = 0
=========================
TLS Keygen Request:
instance_handle = 0x70f020
qat_prf_tls_derive: generated_key->pData: Length 12, Address 0x7ffff5ee4c00
00 00 00 00 00 00 00 00 - 00 00 00 00
=========================
[DEBUG][254844.596230] PID [20181] Thread [7ffff7ff1700][qat_hw_prf.c:782:qat_prf_tls_derive()] Calling cpaCyKeyGenTls2
[DEBUG][254844.596274] PID [20181] Thread [7ffff5ed2700][qat_hw_callback.c:199:qat_crypto_callbackFn()] status 0 verifyResult 1
=========================
TLS Keygen Output: 0x7ffff5ee4800
qat_prf_tls_derive: generated_key->pData: Length 12, Address 0x7ffff5ee4c00
ad 99 a8 48 1d 04 d3 05 - 2c 1d a6 24
=========================
qat_prf_tls_derive: Generated key: Length 12, Address 0x7ffff5ee4c00
ad 99 a8 48 1d 04 d3 05 - 2c 1d a6 24
Yogaraj-Alamenda commented 11 months ago

@vitalyk-radware I understand your are using a RSA certificate, RSA Encrypt and Decrypt is not supported in provider unlike engine and only RSA sign and verify operation is supported in provider. Can you confirm your certificate details ?

vitalyk-radware commented 11 months ago

yes, the certificate is RSA. can you please tell why rsa_decrypt is not part of Provider ? is it a limitation of Provider by definition ? obviously , it affects performance... how we can overcome this problem? thank you!

Yogaraj-Alamenda commented 11 months ago

@vitalyk-radware I don't remember any limitation of Provider. The QAT provider needs additional changes to support this encrypt or decrypt which we could not get it included. I will check and get back to you to see if we can include this support into the upcoming releases later. Feel Free to raise PR if you have any changes to support this.