samuel-lucas6 / draft-lucas-bkdf

An Internet-Draft for the Balloon Key Derivation Function (BKDF), a memory-hard password hashing and password-based key derivation function.
Other
4 stars 1 forks source link

Decide whether to keep parallelism #17

Closed samuel-lucas6 closed 1 month ago

samuel-lucas6 commented 2 months ago

Argon2 and scrypt have a parallelism parameter, but it's often not used/recommended to stay at 1 and sometimes implemented serially. More parameters also means more confusion, like setting the cost parameter for bcrypt is much simpler than setting the parameters for memory-hard algorithms.

On the other hand, removing this parameter could affect interoperability, with people implementing parallelism themselves. It also doesn't add much code complexity.

Sc00bz commented 2 months ago

It should be kept.

often not used

This Stack Exchange answer is wrong. Also parallelism on Argon2 is different than how Balloon's parallelism has been implemented. Argon2 splits memory into lanes and each lane can be run by a different thread. The threads sync four times an iteration. Balloon and scrypt run the base function multiple times in parallel and combine the output.

The parallelism was probably one reason why Argon2 won the Password Hashing Competition.

We understood that you could just call the base function multiple times in parallel and multiple submissions had parallelism built in. Maybe they mean the specific way Argon2 does parallelism.

As of that time, about 50 percent have quad-core processors in their computer. This would suggest to increase the parallelism factor to 8 (twice the number of cores), but for smartphones with a single core this would roughly quadruple the execution time. [...] If you assume that the vast majority of your users have at least two processor cores, you can set the parallelism factor to 4, if you are more careful, to 2.

This "twice the number of cores" thing is actually bad for Argon2 and originated from the RFC. You should never try to have parallelism greater than the number of cores. Doing twice makes Argon2 half as memory hard. Memory hardness of Argon2 is m/p vs "m" for Balloon and scrypt. Also the RFC has bad settings "m=2 GiB, t=1 or m=64 MiB, t=3". The first one does over 10x the work of the second. The second should be at least t=32 and really a lot more since it doesn't use as much memory. Accounting for reaching the attacker's max GPU memory bottleneck it should be like t=256 to make them closer.


recommended to stay at 1

It's a little nuanced (note it doesn't mention client side since I assumed most will do p=1, 2, 4, or cores. I guess I should fix it):

"Cores" meaning probably just performance cores but this is another can of worms. Also with scrypt you can use p as a time parameter.

* Advice on parallelism is not for Argon2. For Argon2, set p=1, 2, maybe 4, or cores (set max threads to ceiling(p/ceiling(p/cores))). Since you want to be sure that p<=cores. Otherwise you give the attacker an advantage.


sometimes implemented serially

That's only for scrypt because everyone uses the official code or rewrites that copying the lack threads. Argon2's official code can spawn threads and most uses have threads.

samuel-lucas6 commented 2 months ago

Thanks for the info. What I was getting at with the first link is that libsodium limits the parallelism to 1 for hashing. The parallelism recommendation can perhaps be reworded again, although it gets quite complicated to explain. I know only scrypt does it serially, and it would be good to keep it that way. I don't see any point promoting that for Balloon.

And I forgot to say that I'm happy to keep parallelism. I don't think it complicates the code much. Other people may disagree/think fewer parameters is best.

Sc00bz commented 2 months ago

I thought that libsodium used threads for Argon2 (just look you're right). I thought PHP used libsodium for Argon2, but it looks like they use a different implementation because PHP's Argon2 uses threads.

For fewer parameters, you could define a simplified API function for Balloon:

balloon_hash(pw, m, t, p, extraSalts, pepper)
balloon_verify(hash, pw, extraSalts, pepper)
balloon_kdf(pw, salt, m, t, p, extraSalts, pepper)

balloon_hashSimple(pw, cost)      // convert cost into m and t and calls balloon_hash(pw, m, t, 1, NULL, NULL)
balloon_verifySimple(hash, pw)    //                               calls balloon_verify(hash, pw, NULL, NULL)
balloon_kdfSimple(pw, salt, cost) // convert cost into m and t and calls balloon_kdf(pw, salt, m, t, 2, NULL, NULL)

For "convert cost into m and t", you could do something like this (increasing cost by 1 increases the costs by a factor of about square root 2):

m = 1 << (cost / 2 + 14)
t = 5 + 2 * (cost % 2) // 5 * 2^0.5 ~ 7.071

Or instead of cost have m and t. Or pick m to be 2 MiB for hashing and 8 MiB for KDF (or something) and instead of cost have t.

samuel-lucas6 commented 1 month ago

I'm going to close this because I think it's sensible to keep parallelism. If anyone disagrees, feel free to reopen this issue.