tempesta-tech / tempesta

All-in-one solution for high performance web content delivery and advanced protection against DDoS and web attacks
https://tempesta-tech.com/
GNU General Public License v2.0

Crypto extensions and performance #1335

Open krizhanovsky opened 5 years ago

krizhanovsky commented 5 years ago

Scope

The following algorithms must be implemented or optimized in Tempesta TLS:

Testing

Notes

Deprecation of SECP 384

SECP 384 is technically legacy, and x448 provides better performance (checked with OpenSSL):

$ openssl speed ecdsa
                              sign    verify    sign/s verify/s
 224 bits ecdsa (nistp224)   0.0001s   0.0001s  14928.8   6707.9
 256 bits ecdsa (nistp256)   0.0000s   0.0001s  35504.2  11838.0
 384 bits ecdsa (nistp384)   0.0011s   0.0009s    890.6   1079.1
 521 bits ecdsa (nistp521)   0.0004s   0.0007s   2770.6   1401.8
$ openssl speed eddsa
                              sign    verify    sign/s verify/s
 253 bits EdDSA (Ed25519)   0.0001s   0.0001s  19837.8   7459.3
 456 bits EdDSA (Ed448)   0.0004s   0.0007s   2657.7   1482.6

It seems that OpenSSL doesn't optimize secp384 at all, since even the 521-bit curve performs better. However, section 6.1.5 of the CA/B Forum Baseline Requirements requires certificates to be signed with either RSA or the NIST curves of 256, 384 or 521 bits, which rules out EdDSA certificates. Let's keep RSA for legacy usage and remove secp384 completely. Also note that ECDSA on secp256 outperforms Ed25519 for signing, so we should keep secp256 to support EC certificates. For ECDHE, x25519 is faster:

$ openssl speed ecdh
                              op      op/s
 224 bits ecdh (nistp224)   0.0001s  11621.8
 256 bits ecdh (nistp256)   0.0001s  16690.9
 384 bits ecdh (nistp384)   0.0011s    915.1
 521 bits ecdh (nistp521)   0.0004s   2265.4
 253 bits ecdh (X25519)   0.0000s  24055.1
 448 bits ecdh (X448)   0.0006s   1612.5
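
To make the comparisons above concrete, here is a minimal Python sketch that parses a few rows copied verbatim from the `openssl speed` output in this comment and prints throughput ratios. The helper names (`parse_rates`, `ratio`) are made up for illustration and are not part of OpenSSL or Tempesta; the regular expression only assumes the row layout shown above.

```python
import re

# Rows copied from the `openssl speed ecdsa` / `eddsa` output quoted above.
SIGN_OUTPUT = """\
 256 bits ecdsa (nistp256)   0.0000s   0.0001s  35504.2  11838.0
 384 bits ecdsa (nistp384)   0.0011s   0.0009s    890.6   1079.1
 521 bits ecdsa (nistp521)   0.0004s   0.0007s   2770.6   1401.8
 253 bits EdDSA (Ed25519)   0.0001s   0.0001s  19837.8   7459.3
 456 bits EdDSA (Ed448)   0.0004s   0.0007s   2657.7   1482.6
"""

# Rows copied from the `openssl speed ecdh` output quoted above.
ECDH_OUTPUT = """\
 256 bits ecdh (nistp256)   0.0001s  16690.9
 384 bits ecdh (nistp384)   0.0011s    915.1
 521 bits ecdh (nistp521)   0.0004s   2265.4
 253 bits ecdh (X25519)   0.0000s  24055.1
 448 bits ecdh (X448)   0.0006s   1612.5
"""

# "<bits> bits <algorithm> (<curve>) <columns...>"
ROW = re.compile(r"^\s*\d+ bits \w+ \(([^)]+)\)\s+(.+)$")


def parse_rates(text):
    """Map curve name -> list of per-second rates (columns without an 's' suffix)."""
    rates = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            fields = match.group(2).split()
            rates[match.group(1)] = [float(f) for f in fields if not f.endswith("s")]
    return rates


def ratio(a, b):
    """Throughput ratio a/b, rounded for display."""
    return round(a / b, 2)


sign = parse_rates(SIGN_OUTPUT)  # curve -> [sign/s, verify/s]
ecdh = parse_rates(ECDH_OUTPUT)  # curve -> [op/s]

print("sign/s  nistp256 / Ed25519 :", ratio(sign["nistp256"][0], sign["Ed25519"][0]))
print("sign/s  nistp521 / nistp384:", ratio(sign["nistp521"][0], sign["nistp384"][0]))
print("ecdh/s  X25519   / nistp256:", ratio(ecdh["X25519"][0], ecdh["nistp256"][0]))
print("ecdh/s  X448     / nistp384:", ratio(ecdh["X448"][0], ecdh["nistp384"][0]))
```

On these numbers secp256 signs roughly 1.8 times faster than Ed25519, while X25519 is roughly 1.4 times faster than secp256 for ECDH, which matches the conclusions drawn above.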

AES-GCM precomputations for Karatsuba multiplication

The paper TLS performance characterization on modern x86 CPUs references two original Intel papers:

The header comments of the Linux implementation explicitly say that it was developed from these two papers. The first one describes hash key precomputations: Htbl in OpenSSL crypto/modes/asm/ghash-x86_64.pl and the HashKey* offsets in linux/arch/x86/crypto/aesni-intel_avx-x86_64.S, so both implementations use these precomputations. The second one proposes precomputing the carry-less multiplication of the Bh and Bl parts in Karatsuba multiplication. There is also the Intel paper Carry-Less Multiplication Instruction and its Usage for Computing the GCM Mode, which doesn't consider the precomputation optimizations.
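
As an illustration of where per-key precomputation fits into Karatsuba multiplication, here is a minimal Python sketch of a 128x128-bit carry-less product built from three 64x64-bit products. It rests on one assumption: that the value worth precomputing for the fixed operand H (the hash key) is the XOR of its high and low halves, which is what the middle Karatsuba term needs. This is one reading of the idea and is not taken from the papers or from the OpenSSL or Linux code. The sketch also omits the reduction modulo the GCM polynomial and GCM's bit-reflected representation, and all function names are hypothetical.

```python
import random

MASK64 = (1 << 64) - 1


def clmul(a, b):
    """Schoolbook carry-less (GF(2)[x]) product of two non-negative integers."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def split(x):
    """Split a 128-bit value into (high, low) 64-bit halves."""
    return x >> 64, x & MASK64


def precompute(h):
    """Per-key work done once: the halves of H and their XOR (assumed layout)."""
    hh, hl = split(h)
    return hh, hl, hh ^ hl


def karatsuba_clmul(a, key):
    """Unreduced 128x128-bit carry-less product A * H using three 64x64 products."""
    hh, hl, h_xor = key
    ah, al = split(a)
    high = clmul(ah, hh)
    low = clmul(al, hl)
    # (ah ^ al) * (hh ^ hl) = high ^ low ^ (ah*hl ^ al*hh), so XOR-ing out
    # high and low leaves exactly the middle term.
    mid = clmul(ah ^ al, h_xor) ^ high ^ low
    return (high << 128) ^ (mid << 64) ^ low


if __name__ == "__main__":
    rng = random.Random(1335)
    h = rng.getrandbits(128)
    key = precompute(h)  # reused for every block hashed under the same key
    for _ in range(100):
        a = rng.getrandbits(128)
        assert karatsuba_clmul(a, key) == clmul(a, h)
    print("Karatsuba with precomputed key halves matches the schoolbook product")
```

With the key-side XOR stored next to H, each block costs three carry-less multiplications plus one XOR of the data halves, instead of four multiplications.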

krizhanovsky commented 2 months ago

Updated benchmarks for ECDSA (performance core on i9-12900HK):

$ openssl version
OpenSSL 3.0.2 15 Mar 2022 (Library: OpenSSL 3.0.2 15 Mar 2022)

$ taskset --cpu-list 2 openssl speed ecdsa
....
compiler: gcc -fPIC -pthread -m64 -Wa,--noexecstack -Wall -Wa,--noexecstack -g -O2 -ffile-prefix-map=/build/openssl-olCZw9/openssl-3.0.2=. -flto=auto -ffat-lto-objects -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -DOPENSSL_TLS_SECURITY_LEVEL=2 -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_BUILDING_OPENSSL -DNDEBUG -Wdate-time -D_FORTIFY_SOURCE=2
CPUINFO: OPENSSL_ia32cap=0x7ffaf3ffffebffff:0x98c007bc239ca7eb
                              sign    verify    sign/s verify/s
 160 bits ecdsa (secp160r1)   0.0001s   0.0001s  10476.5   9825.9
 192 bits ecdsa (nistp192)   0.0001s   0.0001s   8490.5   8223.9
 224 bits ecdsa (nistp224)   0.0000s   0.0001s  34613.6  16034.2
 256 bits ecdsa (nistp256)   0.0000s   0.0000s  63743.3  20365.9
 384 bits ecdsa (nistp384)   0.0005s   0.0004s   2097.6   2455.4
 521 bits ecdsa (nistp521)   0.0002s   0.0003s   5812.7   2880.5
 163 bits ecdsa (nistk163)   0.0001s   0.0002s   8635.0   4368.9
 233 bits ecdsa (nistk233)   0.0002s   0.0003s   6390.0   3231.3
 283 bits ecdsa (nistk283)   0.0003s   0.0006s   3569.2   1808.5
 409 bits ecdsa (nistk409)   0.0005s   0.0010s   2060.4   1051.3
 571 bits ecdsa (nistk571)   0.0011s   0.0021s    924.0    470.3
 163 bits ecdsa (nistb163)   0.0001s   0.0002s   8257.3   4171.4
 233 bits ecdsa (nistb233)   0.0002s   0.0003s   6005.8   3078.1
 283 bits ecdsa (nistb283)   0.0003s   0.0006s   3367.1   1706.5
 409 bits ecdsa (nistb409)   0.0005s   0.0010s   1946.0    989.7
 571 bits ecdsa (nistb571)   0.0012s   0.0023s    858.7    437.6
 256 bits ecdsa (brainpoolP256r1)   0.0002s   0.0002s   4942.6   4953.8
 256 bits ecdsa (brainpoolP256t1)   0.0002s   0.0002s   4939.3   5119.0
 384 bits ecdsa (brainpoolP384r1)   0.0005s   0.0004s   2080.8   2338.1
 384 bits ecdsa (brainpoolP384t1)   0.0005s   0.0004s   2113.9   2474.1
 512 bits ecdsa (brainpoolP512r1)   0.0008s   0.0007s   1243.8   1458.2
 512 bits ecdsa (brainpoolP512t1)   0.0008s   0.0006s   1267.2   1556.7

(The relative results are basically the same.)
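
A quick way to check that claim is to compare the ordering of the curves that appear in both runs. The sketch below hard-codes the sign/s and verify/s figures copied from the two `openssl speed ecdsa` outputs in this thread. The dictionary and function names are illustrative only, and the first run's hardware and OpenSSL version are not stated here, so only the rankings are compared, not the absolute numbers.

```python
# (sign/s, verify/s) for the NIST prime curves present in both runs,
# copied from the outputs quoted in this thread.
FIRST_RUN = {
    "nistp224": (14928.8, 6707.9),
    "nistp256": (35504.2, 11838.0),
    "nistp384": (890.6, 1079.1),
    "nistp521": (2770.6, 1401.8),
}
I9_12900HK_RUN = {
    "nistp224": (34613.6, 16034.2),
    "nistp256": (63743.3, 20365.9),
    "nistp384": (2097.6, 2455.4),
    "nistp521": (5812.7, 2880.5),
}


def ranking(run, column):
    """Curve names sorted from fastest to slowest for the given column."""
    return sorted(run, key=lambda curve: run[curve][column], reverse=True)


for column, label in enumerate(("sign/s", "verify/s")):
    old = ranking(FIRST_RUN, column)
    new = ranking(I9_12900HK_RUN, column)
    verdict = "same order" if old == new else "order changed"
    print(f"{label}: {' > '.join(new)} ({verdict})")
```

In both runs secp384 remains the slowest of the four curves for signing and verification, behind even the 521-bit curve.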