Open ronaldtse opened 7 years ago
Thanks @ni4 !
@frank-trampe is going to set this up. Apologies for the late notice!
@frank-trampe: We should elaborate on this so the task can be done together. What about dividing it into three parts: 1) a list of performance items (key generation, encryption, signing, verification, anything else?); 2) a script that collects performance data for rnp; 3) a script that collects performance data for GnuPG?
@frank-trampe Oops, it seems I unassigned you just by mentioning you.
@frank-trampe is back here 👍
See branch rnpgpgtest_1. In order to facilitate testing, I left out the line that removes the temporary directory. It logs stderr to the temporary directory, too. There were some odd things that looked like errors but didn't affect the result; we might want to look at those.
I pushed some Python code to the ni4-149-performance-benchmark-for-rnp branch. It is not finished yet, due to some problems with incompatible key generation and strange output, so I will debug that first.
I modified my branch to work around various deficiencies in rnp, such as it ignoring the --output option. Unfortunately, rnp --decrypt does not seem to produce an output file.
@frank-trampe , @ni4 the performance benchmark scripts do look good but are in diverged approaches. Can we consolidate both into the same structure so we can have one authoritative benchmark moving forward? Thank you guys.
@ronaldtse, my code is unnecessarily verbose due to rnp ignoring the --output option almost universally. Any chance that we could fix that before a final draft? We also need to decide what parameters from time we want and how to format them.
@frank-trampe yes, could you file an issue about the --output option?
We probably want to run each command a couple of times so the measured time averages out. We also need enough granularity to distinguish performance differences: a single run of 0.01s vs 0.02s is not great to compare, but 100ms vs 200ms accumulated over 10 runs is.
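The averaging idea above can be sketched in Python. This is just an illustration of the measurement approach, not code from either branch; `average_runtime` is a hypothetical helper name:

```python
import time

def average_runtime(fn, runs=10):
    """Run fn() `runs` times and return the mean wall-clock time in seconds."""
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs

# A ~10 ms operation measured once is hard to distinguish from noise,
# but the ~100 ms total over 10 runs gives usable granularity.
mean = average_runtime(lambda: time.sleep(0.01), runs=10)
print(f"{mean * 1000:.1f} ms per run")
```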
@ronaldtse, those all run 64 times.
Blocking issues are #211 and #210. I've worked around #203 by pregenerating keys in gnupg.
@ronaldtse My scripts have a configurable number of runs for small and large files for each operation, and the run time is averaged. It is then divided by GnuPG's value and displayed as well. Basically, since we do not have a mythical reference system, the only value we can rely on is that ratio, i.e. 'we are 1.5x slower than GnuPG on that operation'.
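The ratio described above is simple to compute; here is a minimal sketch (the function name and sample timings are invented for illustration):

```python
def relative_speed(rnp_seconds, gpg_seconds):
    """Express rnp's averaged runtime as a multiple of GnuPG's for the same operation."""
    return rnp_seconds / gpg_seconds

# e.g. rnp averages 3.0s where GnuPG averages 2.0s on the same operation
ratio = relative_speed(3.0, 2.0)
print(f"rnp is {ratio:.1f}x slower than GnuPG")
```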
@frank-trampe @ronaldtse Regarding having a single version of the scripts: what about keeping the .sh script as it is now for integration into automated testing (so, say, it tries all the main operations, compares against GnuPG, and fails if the result is out of bounds, e.g. encryption runs 1.5 times slower while we expect it within the 1.2-1.4 range), and having a more sophisticated Python script that checks more operations (algorithm/key combinations, armoring, compression, whatever else), and later on uses Python bindings to call the library code directly and measure all the low-level operations? IMHO, having a .sh script do all of this would be a bit painful.
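The pass/fail bounds check proposed for automated testing could look like this rough sketch; the bounds and helper name are illustrative assumptions, not part of any existing script:

```python
def check_within_bounds(ratio, low, high):
    """Fail if rnp's slowdown relative to GnuPG falls outside the expected range."""
    if not (low <= ratio <= high):
        raise AssertionError(
            f"slowdown ratio {ratio:.2f}x outside expected bounds [{low}, {high}]"
        )

# Expected: encryption runs 1.2-1.4x slower than GnuPG
check_within_bounds(1.3, 1.2, 1.4)   # within bounds: passes silently
try:
    check_within_bounds(1.5, 1.2, 1.4)  # regression: out of bounds
except AssertionError as e:
    print("regression detected:", e)
```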
@ni4, that's probably a reasonable approach. I committed to shell script originally for a few reasons: time is a lot easier to use in shell script than in Python. Keeping the most difficult fragments as embedded shell script within the Python script solves that.

@ni4 @frank-trampe I think we are all in agreement here on clearer benchmarking results. Would you be able to integrate the tests in a PR?
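One way around relying on shell's time from Python is to time the subprocess directly with the standard library; a minimal sketch, where the no-op child process stands in for a real rnp/gpg invocation:

```python
import subprocess
import sys
import time

def time_command(argv, runs=5):
    """Run an external command `runs` times; return mean wall-clock seconds per run."""
    start = time.perf_counter()
    for _ in range(runs):
        subprocess.run(argv, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return (time.perf_counter() - start) / runs

# A no-op Python process stands in for a real rnp/gpg command here.
mean = time_command([sys.executable, "-c", "pass"], runs=3)
print(f"{mean * 1000:.1f} ms per run")
```

Note this measures wall-clock time only; getting user/sys CPU time per child would need os.times() deltas or the resource module, which is part of why shell's time is more convenient.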
@ronaldtse I'll get back to this task once finished with current issue, will this work for you?
@ni4 of course, no problem at all.
From @dewyatt:

We should have a performance benchmark suite that compares the performance of rnp to GnuPG / GPGME to prevent regressions.