Closed SparkiDev closed 1 week ago
Am I testing the performance improvement correctly? It could just be noise on the machine, but many of the performance numbers looked slower after the change when I ran it this way.
Master wolfSSL branch on a Mac M1:
wolfssl % ./configure --enable-armasm --enable-kyber --enable-experimental -q
ld: warning: -single_module is obsolete
wolfssl % make &> /dev/null
wolfssl % ./wolfcrypt/benchmark/benchmark -kyber
------------------------------------------------------------------------------
wolfSSL version 5.7.0
------------------------------------------------------------------------------
Math: Multi-Precision: Wolf(SP) no-dyn-stack word-size=64 bits=4096 sp_int.c
wolfCrypt Benchmark (block bytes 1048576, min 1.0 sec each)
KYBER512 128 key gen 110800 ops took 1.001 sec, avg 0.009 ms, 110743.101 ops/sec
KYBER512 128 encap 84400 ops took 1.000 sec, avg 0.012 ms, 84374.010 ops/sec
KYBER512 128 decap 66900 ops took 1.001 sec, avg 0.015 ms, 66820.026 ops/sec
KYBER768 192 key gen 65300 ops took 1.000 sec, avg 0.015 ms, 65298.614 ops/sec
KYBER768 192 encap 51900 ops took 1.002 sec, avg 0.019 ms, 51807.583 ops/sec
KYBER768 192 decap 41900 ops took 1.001 sec, avg 0.024 ms, 41850.111 ops/sec
KYBER1024 256 key gen 41300 ops took 1.002 sec, avg 0.024 ms, 41206.746 ops/sec
KYBER1024 256 encap 33700 ops took 1.002 sec, avg 0.030 ms, 33628.507 ops/sec
KYBER1024 256 decap 28600 ops took 1.002 sec, avg 0.035 ms, 28530.556 ops/sec
Benchmark complete
Pulling in these changes:
wolfssl % git checkout -b SparkiDev-kyber_improv_1 master
git pull https://github.com/SparkiDev/wolfssl.git kyber_improv_1
From https://github.com/SparkiDev/wolfssl
* branch kyber_improv_1 -> FETCH_HEAD
Successfully rebased and updated refs/heads/SparkiDev-kyber_improv_1.
wolfssl % make &> /dev/null
wolfssl % ./wolfcrypt/benchmark/benchmark -kyber
------------------------------------------------------------------------------
wolfSSL version 5.7.0
------------------------------------------------------------------------------
Math: Multi-Precision: Wolf(SP) no-dyn-stack word-size=64 bits=4096 sp_int.c
wolfCrypt Benchmark (block bytes 1048576, min 1.0 sec each)
KYBER512 128 key gen 108700 ops took 1.000 sec, avg 0.009 ms, 108652.517 ops/sec
KYBER512 128 encap 85300 ops took 1.000 sec, avg 0.012 ms, 85286.112 ops/sec
KYBER512 128 decap 66500 ops took 1.000 sec, avg 0.015 ms, 66479.664 ops/sec
KYBER768 192 key gen 61900 ops took 1.001 sec, avg 0.016 ms, 61833.895 ops/sec
KYBER768 192 encap 50900 ops took 1.001 sec, avg 0.020 ms, 50844.831 ops/sec
KYBER768 192 decap 42400 ops took 1.000 sec, avg 0.024 ms, 42382.751 ops/sec
KYBER1024 256 key gen 40300 ops took 1.001 sec, avg 0.025 ms, 40242.538 ops/sec
KYBER1024 256 encap 33600 ops took 1.002 sec, avg 0.030 ms, 33530.960 ops/sec
KYBER1024 256 decap 28200 ops took 1.000 sec, avg 0.035 ms, 28189.287 ops/sec
Benchmark complete
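To put the two runs side by side, here is a minimal sketch that computes the relative change in ops/sec for each operation. The figures are copied from the two benchmark outputs above; the dictionary keys and the `percent_change` helper are just illustrative names, not anything from wolfSSL.

```python
# ops/sec copied from the master run above
before = {
    "KYBER512 key gen": 110743.101,
    "KYBER512 encap": 84374.010,
    "KYBER512 decap": 66820.026,
    "KYBER768 key gen": 65298.614,
    "KYBER768 encap": 51807.583,
    "KYBER768 decap": 41850.111,
    "KYBER1024 key gen": 41206.746,
    "KYBER1024 encap": 33628.507,
    "KYBER1024 decap": 28530.556,
}

# ops/sec copied from the kyber_improv_1 run above
after = {
    "KYBER512 key gen": 108652.517,
    "KYBER512 encap": 85286.112,
    "KYBER512 decap": 66479.664,
    "KYBER768 key gen": 61833.895,
    "KYBER768 encap": 50844.831,
    "KYBER768 decap": 42382.751,
    "KYBER1024 key gen": 40242.538,
    "KYBER1024 encap": 33530.960,
    "KYBER1024 decap": 28189.287,
}


def percent_change(old, new):
    """Relative change from old to new, in percent (negative means slower)."""
    return (new - old) / old * 100.0


changes = {name: percent_change(before[name], after[name]) for name in before}

for name, pct in changes.items():
    print(f"{name:<18} {pct:+6.2f}%")
```

On these numbers, seven of the nine operations come out slower and two faster, with the largest swing being KYBER768 key gen at roughly -5%. Each figure comes from a single run of about one second, so this comparison alone cannot separate a real regression from machine noise.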
Hi Jacob,
The benchmark results for this algorithm are very jittery. My testing was on Intel x64; ARM performance should not be affected much by this change.
Sean
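Given the jitter, one way to get a steadier comparison is to repeat the benchmark several times per branch and compare medians rather than single runs. The sketch below is only an illustration: the regular expression assumes the output format shown in this thread, the helper names (`parse_ops_per_sec`, `summarize`, `run_benchmark`) are hypothetical, and the default of five repeats is an arbitrary choice.

```python
import re
import statistics
import subprocess

# Matches lines such as:
# KYBER512 128 key gen 110800 ops took 1.001 sec, avg 0.009 ms, 110743.101 ops/sec
# (assumes the output format shown in this thread)
LINE_RE = re.compile(
    r"^(KYBER\d+)\s+\d+\s+(key gen|encap|decap)\s.*,\s*([\d.]+)\s+ops/sec\s*$"
)


def parse_ops_per_sec(output):
    """Return {"KYBER512 key gen": ops_per_sec, ...} from benchmark output text."""
    results = {}
    for line in output.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            name = f"{match.group(1)} {match.group(2)}"
            results[name] = float(match.group(3))
    return results


def summarize(runs):
    """Collapse a list of per-run dicts into {name: (median, min, max)}."""
    summary = {}
    for name in runs[0]:
        samples = [run[name] for run in runs]
        summary[name] = (statistics.median(samples), min(samples), max(samples))
    return summary


def run_benchmark(repeats=5, cmd=("./wolfcrypt/benchmark/benchmark", "-kyber")):
    """Run the benchmark binary `repeats` times and summarize the results."""
    runs = []
    for _ in range(repeats):
        completed = subprocess.run(cmd, capture_output=True, text=True, check=True)
        runs.append(parse_ops_per_sec(completed.stdout))
    return summarize(runs)
```

Calling `run_benchmark()` from the wolfssl directory on each branch gives a median and a min/max range per operation. If the ranges from the two branches overlap, the difference is probably within the noise.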
Description
Unroll loops and use larger types. Allow the benchmark to run each Kyber parameter set separately. Allow the benchmark to accept -ml-dsa, which runs all parameter sets. Fix the Thumb2 ASM C code so it has no duplicate includes or ifdef checks. Fix the Thumb2 ASM C code to include error-crypt.h so that no translation unit is empty. Check for WOLFSSL_SHA3 before including the Thumb2 SHA-3 assembly code.
Testing
Tested cross-compile with -pedantic for hosts armv7m and armv8 with --enable-armasm and --enable-armasm=inline.
Checklist