The wolfSSL library is a small, fast, portable implementation of TLS/SSL for embedded devices to the cloud. wolfSSL supports up to TLS 1.3 and DTLS 1.3!
wolf Assembly Speedups for ARMv8
./configure --enable-sp=yes,asm --enable-armasm --enable-keygen --enable-curve25519 --enable-ed25519 --enable-sha3 --disable-shared && make
./wolfcrypt/benchmark/benchmark 16448
./wolfcrypt/benchmark/benchmark -rsa-kg -rsa-sz 3072
./wolfcrypt/benchmark/benchmark -rsa-kg -rsa-sz 4096
./wolfcrypt/benchmark/benchmark -ecc-kg -ecc -p384
./wolfcrypt/benchmark/benchmark -ecc-kg -ecc -p521
------------------------------------------------------------------------------
wolfSSL version 5.7.2
------------------------------------------------------------------------------
Math: Multi-Precision: Wolf(SP) word-size=64 bits=4096 sp_int.c
Single Precision: ecc 256 384 521 rsa/dh 2048 3072 4096 asm sp_arm64.c
wolfCrypt Benchmark (block bytes 16448, min 1.0 sec each)
RNG 90 MiB took 1.054 seconds, 85.219 MiB/s
AES-128-CBC-enc 693 MiB took 1.003 seconds, 691.499 MiB/s
AES-128-CBC-dec 643 MiB took 1.001 seconds, 643.136 MiB/s
AES-192-CBC-enc 604 MiB took 1.005 seconds, 600.615 MiB/s
AES-192-CBC-dec 584 MiB took 1.003 seconds, 581.590 MiB/s
AES-256-CBC-enc 534 MiB took 1.006 seconds, 530.794 MiB/s
AES-256-CBC-dec 519 MiB took 1.005 seconds, 515.954 MiB/s
AES-128-GCM-enc 584 MiB took 1.006 seconds, 579.906 MiB/s
AES-128-GCM-dec 589 MiB took 1.002 seconds, 587.165 MiB/s
AES-192-GCM-enc 539 MiB took 1.003 seconds, 537.191 MiB/s
AES-192-GCM-dec 549 MiB took 1.009 seconds, 543.848 MiB/s
AES-256-GCM-enc 504 MiB took 1.009 seconds, 499.393 MiB/s
AES-256-GCM-dec 509 MiB took 1.005 seconds, 506.107 MiB/s
GMAC Table 4-bit 1206 MiB took 1.000 seconds, 1205.651 MiB/s
CHACHA 279 MiB took 1.011 seconds, 276.295 MiB/s
CHA-POLY 185 MiB took 1.010 seconds, 182.745 MiB/s
MD5 145 MiB took 1.001 seconds, 144.458 MiB/s
POLY1305 579 MiB took 1.005 seconds, 575.578 MiB/s
SHA 110 MiB took 1.003 seconds, 109.388 MiB/s
SHA-224 574 MiB took 1.000 seconds, 573.461 MiB/s
SHA-256 574 MiB took 1.000 seconds, 573.484 MiB/s
SHA-384 130 MiB took 1.038 seconds, 124.915 MiB/s
SHA-512 130 MiB took 1.038 seconds, 124.938 MiB/s
SHA-512/224 130 MiB took 1.038 seconds, 124.920 MiB/s
SHA-512/256 130 MiB took 1.038 seconds, 124.935 MiB/s
SHA3-224 65 MiB took 1.012 seconds, 64.076 MiB/s
SHA3-256 65 MiB took 1.065 seconds, 60.882 MiB/s
SHA3-384 50 MiB took 1.045 seconds, 47.718 MiB/s
SHA3-512 35 MiB took 1.024 seconds, 34.098 MiB/s
HMAC-MD5 145 MiB took 1.002 seconds, 144.396 MiB/s
HMAC-SHA 110 MiB took 1.005 seconds, 109.202 MiB/s
HMAC-SHA224 574 MiB took 1.001 seconds, 572.972 MiB/s
HMAC-SHA256 574 MiB took 1.001 seconds, 573.067 MiB/s
HMAC-SHA384 130 MiB took 1.038 seconds, 124.887 MiB/s
HMAC-SHA512 130 MiB took 1.038 seconds, 124.892 MiB/s
PBKDF2 30 KiB took 1.000 seconds, 29.678 KiB/s
RSA 1024 key gen 7 ops took 1.071 sec, avg 153.029 ms, 6.535 ops/sec
RSA 2048 key gen 3 ops took 1.165 sec, avg 388.474 ms, 2.574 ops/sec
RSA 2048 public 4300 ops took 1.015 sec, avg 0.236 ms, 4235.023 ops/sec
RSA 2048 private 200 ops took 1.745 sec, avg 8.727 ms, 114.585 ops/sec
RSA 3072 key gen 1 ops took 1.935 sec, avg 1935.197 ms, 0.517 ops/sec
RSA 3072 public 1800 ops took 1.033 sec, avg 0.574 ms, 1742.431 ops/sec
RSA 3072 private 100 ops took 3.190 sec, avg 31.904 ms, 31.344 ops/sec
RSA 4096 key gen 1 ops took 3.543 sec, avg 3542.641 ms, 0.282 ops/sec
RSA 4096 public 1200 ops took 1.065 sec, avg 0.887 ms, 1126.922 ops/sec
RSA 4096 private 100 ops took 6.319 sec, avg 63.187 ms, 15.826 ops/sec
DH 2048 key gen 233 ops took 1.003 sec, avg 4.306 ms, 232.219 ops/sec
DH 2048 agree 300 ops took 1.292 sec, avg 4.305 ms, 232.277 ops/sec
ECC [ SECP256R1] 256 key gen 7200 ops took 1.001 sec, avg 0.139 ms, 7193.423 ops/sec
ECDHE [ SECP256R1] 256 agree 1900 ops took 1.000 sec, avg 0.526 ms, 1899.716 ops/sec
ECDSA [ SECP256R1] 256 sign 4800 ops took 1.016 sec, avg 0.212 ms, 4722.757 ops/sec
ECDSA [ SECP256R1] 256 verify 1800 ops took 1.009 sec, avg 0.561 ms, 1783.497 ops/sec
ECC [ SECP384R1] 384 key gen 2500 ops took 1.028 sec, avg 0.411 ms, 2431.867 ops/sec
ECDHE [ SECP384R1] 384 agree 600 ops took 1.015 sec, avg 1.691 ms, 591.229 ops/sec
ECDSA [ SECP384R1] 384 sign 1500 ops took 1.031 sec, avg 0.687 ms, 1455.197 ops/sec
ECDSA [ SECP384R1] 384 verify 600 ops took 1.083 sec, avg 1.804 ms, 554.264 ops/sec
ECC [ SECP521R1] 528 key gen 1200 ops took 1.070 sec, avg 0.892 ms, 1121.427 ops/sec
ECDHE [ SECP521R1] 528 agree 300 ops took 1.070 sec, avg 3.566 ms, 280.413 ops/sec
ECDSA [ SECP521R1] 528 sign 600 ops took 1.012 sec, avg 1.686 ms, 593.111 ops/sec
ECDSA [ SECP521R1] 528 verify 300 ops took 1.145 sec, avg 3.818 ms, 261.902 ops/sec
CURVE 25519 key gen 2653 ops took 1.000 sec, avg 0.377 ms, 2652.026 ops/sec
CURVE 25519 agree 2700 ops took 1.022 sec, avg 0.379 ms, 2641.905 ops/sec
ED 25519 key gen 7677 ops took 1.000 sec, avg 0.130 ms, 7676.186 ops/sec
ED 25519 sign 7200 ops took 1.001 sec, avg 0.139 ms, 7195.132 ops/sec
ED 25519 verify 2100 ops took 1.017 sec, avg 0.484 ms, 2065.816 ops/sec
Benchmark complete
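For readers cross-checking the asymmetric results, the avg and ops/sec columns can be approximately recomputed from the op count and elapsed time printed on each line. The following is a minimal shell sketch, not part of the benchmark tool: the function name recompute and its output format are illustrative assumptions, and the two sample lines are copied from the output above.

```shell
#!/bin/sh
# Two sample lines copied from the benchmark output above.
bench_lines='RSA 2048 public 4300 ops took 1.015 sec, avg 0.236 ms, 4235.023 ops/sec
RSA 2048 private 200 ops took 1.745 sec, avg 8.727 ms, 114.585 ops/sec'

# recompute (hypothetical helper): read benchmark lines on stdin and derive
# ms per op and ops/sec from the "<N> ops took <T> sec," fields.
recompute() {
    awk '{
        for (i = 2; i < NF - 1; i++) {
            if ($i == "ops" && $(i + 1) == "took") {
                ops = $(i - 1)   # op count precedes the word "ops"
                sec = $(i + 2)   # elapsed seconds follow the word "took"
                printf "%.3f ms/op, %.1f ops/sec\n", sec * 1000 / ops, ops / sec
            }
        }
    }'
}

printf '%s\n' "$bench_lines" | recompute
```

For the RSA 2048 public line this gives about 4236 ops/sec against the reported 4235.023. The small gap is presumably because the printed elapsed time is rounded to three decimals.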
Description
Fix to not allow Shake128/256 with Xilinx AFALG. Clean up the Shake disable logic to allow forcing off with WOLFSSL_NO_SHAKE128 and WOLFSSL_NO_SHAKE256.
Testing
Xilinx UltraScale+ MPSoC (ZCU102) Cortex A53 @ 1.2 GHz
Note 1: Using maximum benchmark block size of 16448 bytes because of issues with AFALG memory on PetaLinux.
Note 2: The bare-metal and FreeRTOS performance of XilSecure is better than with PetaLinux due to driver overhead.
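As a sketch of how the force-off macros named in the description might be applied at build time: only the two macro names come from this PR. Passing them through CFLAGS is an assumption about the build setup, and the other options are a subset of the configure line used for the ARMv8 benchmark.

```shell
# Hypothetical build: keep SHA-3 enabled but force SHAKE128/256 off.
# Only the two -D macro names come from the PR description;
# supplying them via CFLAGS is an assumption about how the build is configured.
./configure --enable-sha3 --disable-shared \
    CFLAGS="-DWOLFSSL_NO_SHAKE128 -DWOLFSSL_NO_SHAKE256" && make
```

Builds that use a settings header instead of configure would presumably define the same two macros there.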
XilSecure (Crypto Hardware)
https://docs.amd.com/v/u/en-US/wp512-accel-crypto