Closed ghost closed 2 years ago
I suppose Botan is less optimized than libargon2. @droidmonkey Should we do some benchmarks?
From 2 to 9 seconds is extreme and interesting. I definitely notice a slower speed with botan, but not >4x slower.
https://github.com/microsoft/vcpkg/tree/master/ports%2Fbotan
https://github.com/msys2/MINGW-packages/tree/master/mingw-w64-libbotan
https://github.com/keepassxreboot/keepassxc/pull/6209#issuecomment-792051844
@randombit I am building Botan using vcpkg, might we be missing an optimization flag in there?
Are you sure you are not using their dbg
mode? That sets --debug-mode
to the library configure script, which disables optimizations completely.
Also I have no idea of the performance of Argon2 on VC. In general I use GCC/Clang on Linux, and IME the VC optimizer is significantly less good.
The iteration count for Argon2 seems really high given that RFC 9106 recommends no more than t=3
. I doubt I ever checked performance with much higher iterations. I'll look into this. I remember a long time ago checking the libargon2
library vs Botan and they were about the same but I'm sure that was with a small t
.
When I tried it with Iterations 3 Memory 2047MB Parallelism 8, it took about 1-2 seconds with KeePassXC 2.6.6, 1.3 seconds with KeePass 2.50, and about 6-7 seconds with KeePassXC 2.7.0-beta1. When I tried it again with Iterations 128 Memory 64MB Parallelism 8, it took about 8 seconds, so I don't think the impact of Iterations is significant.
(Is it better to reduce the number of iterations? The default setting of the KeePass 2.50 is Memory 64MB, so I used it.)
It should definitely not be slower than KeePass2. In fact, it should be much faster than that.
You reduced iterations and significantly increased memory. Do one or the other. I certainly don't see the level of degradation that you do under a 1 second benchmark. My database decrypts in roughly the same time between my mobile (KeePassDX) and laptop (KPXC 2.7.0). My laptop should be much faster than mobile, but it's not.
I'll have to check on the dbg mode.
Comparing libargon2
(https://github.com/P-H-C/phc-winner-argon2 as packaged on Arch Linux) with Botan. This is a i7-6700K at 4.2 GHz. Both using Argon2id, 64 Mb, t=128, p=8
Botan: 4.8 s (via botan speed argon2
)
libargon2: 10.4 s (using time reported by argon2
cli util)
[Why this discrepancy is so large in favor of Botan I have no idea]
... oh now I see why ...
libargon2
does use threads for p
> 1 (Botan does not) but the time it reports is - I guess - the sum total across all of the threads, rather than the wall clock time which is what anyone normal cares about. So for above example the cli util reports 10.4 s but actually only takes 2.7 seconds wall time.
OK so for p
> 1 you are going to see a fairly large perf hit. I can look into using threads (there is even a todo in the source about this) but that won't help when building against existing releases of the library.
Oof that is a big deal if we aren't doing parallelism in threads. This may make me want to go back to the reference implementation.
I am in favour of that until this is fixed. I can confirm that 2.6.6 with p > 1 decrypts much faster. When I benchmark to 128MiB at 16 threads, I get 6 rounds in 2.7.0-beta1 and 44 rounds in 2.6.6.
But that's not all. If I set p = 1, I still get 6 rounds in 2.7.0-beta1 (not surprisingly), but 10 rounds in 2.6.6. So while the missing support for threads is the biggest issue, the whole thing seems less efficient than the reference implementation.
Luckily it's not a big deal to go back to libargon2. Will try to get that in a PR tonight for review.
@randombit why not just embed and wrap the reference implementation by Botan API? Their license is CC0 / Apache.
@droidmonkey The way their code is written, integrating it/adapting it would be harder than just adding threading and SIMD support directly. PR adding threads is up now, SSSE3 version is written but needs some debugging still. I'll backport these in the next 2.x release. Probably too late to get into upcoming Ubuntu 22.04 though. Wish this issue had been raised sooner. :/
Great, thanks for resolving this quickly. @droidmonkey Could we perhaps compile conditionally, depending on the installed Botan version?
Just as an FYI, I wrote a quick tool to properly benchmark the performance of libargon2
(https://gist.github.com/randombit/adaaaecfc23906af0fe8f3e7ea0257cd) since their cli util is clearly not so helpful for this. For params M=65536, t=128, p=8, on an 4 core CPU with AVX2 and HT, I am seeing 1386 ms/per hash with Botan+threads, and 1375 ms/per hash with libargon2
, basically the same. I think libargon2 has the edge in that it has SIMD support, but Botan is making up for most of that by using a thread pool instead of creating+killing many threads.
@phoerious I'm curious about the results you are seeing with p
= 1. Windows or Linux? Does your CPU have AVX-512?
@phoerious I'm curious about the results you are seeing with p = 1. Windows or Linux? Does your CPU have AVX-512?
Windows, Ryzen 3950X, so I guess not. Also tested it on the Threadripper 2920X machine I'm working on right now (Ubuntu Linux, AVX2 only) and I get roughly the same performance there (8 vs. 7, sometimes 8).
Here's botan speed
(2.17.3) vs your libargon2 benchmark tool with t=1, m=65536, p=1) on the Threadripper:
Argon2id M=65536 t=1 p=1 15 /sec; 65.44 ms/op 229022108 cycles/op (46 ops in 3010 ms)
42 runs in 3069.218427 ms -> 73.076629 ms/run
I can test the Ryzen in the evening.
@randombit Here's the results on my Ryzen (Windows, compiled via Msys2):
Argon2id M=65536 t=1 p=1 16 /sec; 60.08 ms/op 600744 cycles/op (50 ops in 3004 ms)
59 runs in 3045.006700 ms -> 51.610283 ms/run
Overview
Unlocking a database in KeePassXC 2.7.0-beta1 is much slower than unlocking the same database in KeePassXC 2.6.6 or KeePass 2.50.
Steps to Reproduce
Expected Behavior
Unlocking the database takes about 2 seconds.
Actual Behavior
Unlocking the database takes about 9 seconds.
Context
With KeePassXC 2.6.6 or KeePass 2.50, unlocking takes about 2 seconds. I created a database with KeePassXC 2.6.6, KeePassXC 2.7.0-beta1 and KeePass 2.50, but it happened with any database. I suspected antivirus (Kaspersky) interference and paused protection before unlocking it, but it didn't improve. It was also reproduced in Windows 11 Insider Preview Dev on VMware Player.
CPU: Core i7-7700K RAM: DDR4-2400 16GB
KeePassXC - 2.7.0-beta1 Revision: 046e508
Operating System: Windows 10 21H2