samuel-lucas6 / draft-lucas-balloon-hashing

An Internet-Draft for the Balloon password hashing and password-based key derivation function.
Other
3 stars 1 forks source link

Provide generic parameters/safe minimums #2

Open samuel-lucas6 opened 6 months ago

samuel-lucas6 commented 6 months ago

OWASP, TobTu, the Argon2 RFC, etc provide recommended parameters. It would be good to do the same assuming they are realistic, but there's no guidance on parameters from the authors of Balloon. Furthermore, Balloon hasn't been benchmarked properly with different hash functions and parameters to determine suitable recommendations.

Sc00bz commented 2 weeks ago

The best way to do this is count the amount of hash block calculations and compare to PBKDF2 because a GPU attacker will be limited by compute instead of bandwidth. Balloon needs to use >1 MiB to start slowing down GPUs. >3 MiB will have a near linear slowdown.

I'm going to use "H/block" to mean "hash block calculations per block of memory". This is how many hash blocks are calculated to output a hash. Depending on implementation and settings (I'll use "delta" or "neighbors" = 3). This is usually 3-17 H/block. Some bad implementations depend on salt length and worse depend on secret data and its length. Defender costs are paper (14 H/block), academic code (3 H/block + 0.5 AES/block), Rust's (14-17+ H/block (depends on salt and secret data lengths)), Python (14-17+ H/block (depends on salt length)). Attacker costs are paper (8 H/block), academic code (3 H/block), Rust's (8 H/block), Python (8 H/block).

Rough settings for <10 KH/s/GPU (GPU being an RTX 4080 Super or "2/3 of an RTX 4090" which should be within a few percent of each other): Balloon-SHA-256: Academic code: m=256 KiB, t=48; m=512 KiB, t=24; m=1 MiB, t=12; m=2 MiB, t=3-6? Paper/Rust's/Python: m=256 KiB, t=18; m=512 KiB, t=9; m=1 MiB, t=5; m=2 MiB, t=2-3?

Balloon-SHA-512: Academic code: m=256 KiB, t=34; m=512 KiB, t=17; m=1 MiB, t=9; m=2 MiB, t=2-4? Paper/Rust's/Python: m=256 KiB, t=13; m=512 KiB, t=7; m=1 MiB, t=4; m=2 MiB, t=1-2?

I looked into what the best output size of SHAKE128 for max speed: 248, 520, 792, 1056, and 1328 are 1, 2, 3, 4, and 5 H/block. Best is SHAKE128 with 1328 bits of output but it's not a multiple of a cache line so 512 (2 H/block) or 1024 (4 H/block) would be better. But this puts it at about even with SHA-512 for the attacker. Now the only question is which is faster for the defender. SHAKE128-1024 might be faster even if it's slower than SHA-512 because of the larger block size. Also if the same hash function is used for the deterministic random lookups then SHAKE128-1024 will have an advantage when grabbing 64 bit at a time vs big int math. Note the "academic code" is SHA-256 for the hash and AES-128 for the deterministic random lookups. I just remembered that SHAKE128 has an internal parallelism of 5. Which means the slowdown for an attacker doesn't start until >5 MiB but to utilize that there's a performance hit. Yeah... I'll need to implement it to see where it lands. Not that I'm going to do that any time soon.