Implement performance benchmarks

ronaldtse commented 7 years ago

Description

Right now our benchmarks with GnuPG are rather crude.

We should have something like OpenSSL's openssl speed -evp <algo>-<mode> so we can check the speed of the popular combinations (block cipher algo/mode, public key algo, hash algo), such as 'aes-cfb-rsa-sha256' or 'sm4-ctr-sm2-sm3'. Later on we can test AEAD GCM / OCB too.

Sample output for openssl speed:

openssl speed -evp sm4-ctr
Doing sm4-ctr for 3s on 16 size blocks: 18459807 sm4-ctr's in 2.96s
Doing sm4-ctr for 3s on 64 size blocks: 4862157 sm4-ctr's in 2.93s
Doing sm4-ctr for 3s on 256 size blocks: 1280728 sm4-ctr's in 2.97s
Doing sm4-ctr for 3s on 1024 size blocks: 320266 sm4-ctr's in 2.98s
Doing sm4-ctr for 3s on 8192 size blocks: 40143 sm4-ctr's in 2.97s
Doing sm4-ctr for 3s on 16384 size blocks: 19771 sm4-ctr's in 2.97s
OpenSSL 1.1.1-dev  xx XXX xxxx
built on: reproducible build, date unspecified
options:bn(64,64) rc4(16x,int) des(int) aes(partial) idea(int) blowfish(ptr) 
compiler: /usr/bin/gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\""  -Qunused-arguments
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
sm4-ctr          99782.74k   106204.11k   110392.72k   110051.14k   110724.40k   109066.69k
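
To make the shape of such a tool concrete, here is a minimal Python sketch of an openssl-speed-style loop. It is only an illustration under stated assumptions: it times hashlib's SHA-256 as a stand-in workload and does not call any RNP API. The names run_for, speed and sha256_op are hypothetical. The throughput_k helper reproduces the arithmetic behind the table above: 18459807 blocks of 16 bytes in 2.96s gives 18459807 * 16 / 2.96 / 1000 = 99782.74k.

```python
import hashlib
import time

# Block sizes taken from the openssl speed sample output above.
BLOCK_SIZES = (16, 64, 256, 1024, 8192, 16384)


def throughput_k(count, block_size, elapsed):
    """Return throughput in 1000s of bytes per second."""
    return count * block_size / elapsed / 1000.0


def run_for(operation, block_size, duration):
    """Call operation(block) repeatedly for about `duration` seconds.

    Returns (iterations, elapsed wall-clock seconds).
    """
    block = bytes(block_size)  # zero-filled input buffer
    count = 0
    start = time.perf_counter()
    deadline = start + duration
    while time.perf_counter() < deadline:
        operation(block)
        count += 1
    return count, time.perf_counter() - start


def sha256_op(block):
    """Stand-in workload (assumption): hash one block with SHA-256."""
    hashlib.sha256(block).digest()


def speed(operation, name, duration=3.0):
    """Benchmark `operation` on every block size and print a summary row."""
    results = []
    for size in BLOCK_SIZES:
        count, elapsed = run_for(operation, size, duration)
        print(f"Doing {name} for {duration:g}s on {size} size blocks: "
              f"{count} {name}'s in {elapsed:.2f}s")
        results.append(throughput_k(count, size, elapsed))
    print("type".ljust(12) + "".join(f"{s:>9} bytes" for s in BLOCK_SIZES))
    print(name.ljust(12) + "".join(f"{r:>14.2f}k" for r in results))
    return results
```

Calling speed(sha256_op, "sha256") prints one "Doing ..." line per block size followed by a summary table. This sketch measures wall-clock time; a real tool may prefer CPU time, and it would replace sha256_op with a call into the library under test.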

On the other hand, we should also have some benchmarks that trace CPU and memory usage during the run (e.g., output as CSV) so we can easily spot the obvious problems.

We could produce a static trace per command execution, something like:

(pid),(thread-id),(vmem),(realmem),(readspeed),(writespeed),(cpu%),...

This could be done with pidstat (on Linux), or with something much more serious like perf (see http://www.brendangregg.com/perf.html).
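
As a rough illustration of the per-command CSV trace, the Python sketch below runs a command and samples its memory usage from /proc/<pid>/status. It is Linux-only and covers just a subset of the columns proposed above (pid, vmem, realmem); thread ids, I/O speed and CPU% are left out. The function names and the column names are hypothetical, and pidstat or perf would give far more detail.

```python
import csv
import subprocess
import time

# Hypothetical column names; a subset of the fields proposed above.
FIELDS = ["elapsed_s", "pid", "vmem_kb", "realmem_kb"]


def parse_status(text):
    """Extract VmSize and VmRSS (in kB) from /proc/<pid>/status text."""
    values = {"VmSize": 0, "VmRSS": 0}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if key in values:
            values[key] = int(rest.split()[0])
    return values["VmSize"], values["VmRSS"]


def trace(cmd, out, interval=0.1):
    """Run cmd, writing one CSV row per sample to `out` until it exits.

    Returns the exit code of the command.
    """
    writer = csv.writer(out)
    writer.writerow(FIELDS)
    start = time.monotonic()
    proc = subprocess.Popen(cmd)
    while proc.poll() is None:
        try:
            with open(f"/proc/{proc.pid}/status") as fh:
                vmem, rss = parse_status(fh.read())
        except OSError:
            break  # no /proc entry (non-Linux, or the process just exited)
        elapsed = time.monotonic() - start
        writer.writerow([f"{elapsed:.3f}", proc.pid, vmem, rss])
        time.sleep(interval)
    return proc.wait()
```

For example, trace(["sleep", "1"], sys.stdout) (after import sys) would print a header row followed by roughly ten samples. Polling at a fixed interval misses short-lived peaks, which is one reason a profiler-based approach may be preferable for serious analysis.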

Thoughts?

cc: @ni4 @dewyatt @flowher

kriskwiatkowski commented 7 years ago

Simply having timings would already be useful, I think. Nevertheless, we could implement both approaches in a single tool (let's say rnp speed). RNP could use perf's API, which can count CPU cycles (an example of its usage is here: https://github.com/IAIK/armageddon/blob/master/libflush/libflush/timing.c).

Valgrind also has an interesting API which could be useful for checking heap usage (I'm thinking of the tool called Massif: http://valgrind.org/docs/manual/ms-manual.html).

There are probably other tools/libraries that we could reuse.

ronaldtse commented 7 years ago

Agree on using Perf API and massif — but I wonder if we should apply the monitoring tools through the public API or directly to the internal code, or to both separately?

kriskwiatkowski commented 7 years ago

I think the performance of our public API is what's most interesting for us and for users of the API, in particular the API of repgp and/or rnp(2), as that's where the processing will happen.

As a first step, I would like to see something like Celero calling our API and performing the benchmarking.

ronaldtse commented 7 years ago

Agreed!

rnpgp / rnp

Implement performance benchmarks #510
