Cost of brute-forcing a set of passwords for multiple domains (or key-files)

cipriancraciun commented 2 days ago

Currently, based on the README, if an attacker wants to brute-force a user's password, he has to compute Argon2id for each password try (given by the user as input). (Which is good.)

However, if the attacker has multiple domains in sight, then obtaining actual passwords (outputs from the tool) is almost instantaneous, because from the masterKey onward there are only simple hash functions.

On the same subject, if the attacker has to try out multiple key-files (say he doesn't know which file on the file-system is the actual key-file used), he can do so without incurring an extra Argon2id (or alternative) operations.

(I.e. instead of paying one Argon2id cost for each "input" password try multiplied by each domain, the attacker pays only for the initial Argon2id, then each additional domain is free.)

Another take on this subject is the following type of attack:

say the attacker obtains a dump of weakly-hashed passwords from a target site; (something that can be easily checked by brute-force given enough inputs;)
say that the owner of the site has alerted its users of the potential breach, and the user has created and updated a new password (i.e. incremented the counter);
the attacker can try to guess the correct "input" password, by doing a single Argon2id, and then generating up to say 100 "output" passwords (or even more) and check if anyone of those matches the weakly-hashed password from the dump;
if the match is found, the attacker now knows with what counter offset to start poking at the target site;

(I.e. instead of paying for one Argon2id cost for each "input" password try multiplied with each counter try, the attacker pays only for the initial Argon2id, and then each additional count is free.)

A proposed simple solution is to:

either hash as much as possible into the Argon2id salt; (including domain, keyfile, counter, and other parameters the user might have used;)
either have multiple phases of Argon2id (or alternative);

samuel-lucas6 commented 1 day ago

On the same subject, if the attacker has to try out multiple key-files (say he doesn't know which file on the file-system is the actual key-file used), he can do so without incurring an extra Argon2id (or alternative) operations.

This isn't a problem assuming high-entropy keyfiles are used because they're computationally infeasible to brute force. That decision was by design since you don't need password hashing for high-entropy secrets (e.g., a 256-bit key stored in a file).

The keyfile could've been passed to Argon2, but the way the YubiKey pepper derivation works (the challenge being dependent on all inputs, including the master password) prevents that being passed to Argon2, and it made sense to have the pepper passed the same way regardless of which approach you use. The Argon2 secret key parameter also isn't available in all implementations.

(I.e. instead of paying one Argon2id cost for each "input" password try multiplied by each domain, the attacker pays only for the initial Argon2id, then each additional domain is free.)

However, this is a very interesting point. It's taking me a while to wrap my head around it this late.

If you're guessing the derived password for one domain, it speeds up attacks if you don't know the counter, length, and character set (because you can quickly compute outputs for those different options).
If you've compromised the master password from one domain, it speeds up attacks against other domains (because of the above plus the same for different domains).
If multiple sites the user uses have been breached, the attacker can search for multiple derived passwords more efficiently (basically the above).

How problematic is this?

If you assume the counter, length, and character set are known/predictable, there's little to no advantage attacking one domain.
If you assume the list of domains is known/predictable, there's little to no advantage attacking multiple domains.
If the master password is compromised and you have no pepper, it's basically game over anyway because that's the only value with real entropy (the only proper secret).

With that said, this does seem unnecessarily bad. The attacker should be slowed as much as possible. It's also got me wondering about whether the peppering has been done in the best way.

I think this is the result of looking at Spectre, focusing on attacks against one domain, and feeling that variable-length salts/salts containing info style parameters are wrong (because I've been thinking about the salt like a random salt in traditional password hashing).

either hash as much as possible into the Argon2id salt; (including domain, keyfile, counter, and other parameters the user might have used;) either have multiple phases of Argon2id (or alternative);

Multiple Argon2 calls is a bad idea because of the amount of delay/splitting the delay. Putting more into the salt and possibly using the Argon2 secret key parameter for the pepper would be an improvement. Doing the latter would prevent the YubiKey challenge depending on the password though, which seems bad because then you get duplicate challenges.

Anyhow, missing this is rather depressing and embarrassing. I wasn't planning on having a v2 of this program, but it looks like I'm going to need to with this and the Unicode password problem...

cipriancraciun commented 1 day ago

This isn't a problem assuming high-entropy keyfiles are used because they're computationally infeasible to brute force. That decision was by design since you don't need password hashing for high-entropy secrets (e.g., a 256-bit key stored in a file).

Here I was speaking of another type of brute-force:

the attacker knows that the user uses a keyfile;
the attacker has a copy of all the user's files; (laptop, cloud, etc.;)
the attacker can't "visually" identify (for example by name, extension, folder, etc.) which might be the actual key file used by the user;
thus the attacker will iterate through each of the user's files, testing if that would be perhaps the keyfile;

The reasoning is that, if the user uses a keyfile, most likely that file must reside on the file-system (or cloud storage, etc.)

Thus, because there is no Argon2 (or similar) cost for checking each file, the attacker can pre-hash all the user's files once, and instantaneously test every potential key file against a given password (by generating one site "output" password for each key file, and checking that against a weak hash dump).

Or, put another way, in brute-forcing the user's master-password, (say checking it against a weak-hash dump of a site), having a very large list of potential key-files to check adds almost no effort for the attacker as if the user hadn't used a key file at all.

The keyfile could've been passed to Argon2, but the way the YubiKey pepper derivation works [...]

I think that the security design (and its goals) should decide the "flow" of secrets, and in this case (of the keyfile), I think that pragmatic programming decisions impact the overall security.

Another observation on key files attack opportunities: currently you are hashing the key file without any other parameters (like password, identity, domain, etc.), just with the customization string cahir.keyfile, then this hash is used for further derivations as inputs.

An attacker, that has persistent access to a user's laptop or cloud, doesn't need to exfiltrate all his files to check them in a future brute-force attack (as described in the previous paragraph), he can just compute and exfiltrate the hashes. (Which being small, would most likely go undetected by network monitoring or other simple monitoring solutions.)

Instead, if each time a key file would be used, its hashing would have contained some parameters (like the master key), then the attacker would need to exfiltrate the files, and store them until a brute-force is attempted. (Thus putting storage costs on his attacks.)

I've written a bit more on the topic here:

https://notes.volution.ro/v1/2024/03/remarks/810d7404/

[...] If the master password is compromised and you have no pepper, it's basically game over anyway because that's the only value with real entropy (the only proper secret).

With that said, this does seem unnecessarily bad. The attacker should be slowed as much as possible. It's also got me wondering about whether the peppering has been done in the best way.

As noted previously, I too have an encryption / password generation tool, on which I've been working for the last two years, and I've been thinking a lot on this problem.

I could point you to my derivation code (written in Rust, but pretty readable):

https://github.com/volution/z-tokens/blob/f24049be2fcab7acd44b44d0bbdd1174931b0e2b/sources/exchange/crypto_backend.rs#L40-L105 -- this is the code where I derive the raw bytes (32 in my case) for the "output" password, based on various inputs; (the main code is shared with a file encryption tool, which was also reused for password generation;)
https://github.com/volution/z-tokens/blob/f24049be2fcab7acd44b44d0bbdd1174931b0e2b/sources/exchange/crypto_keys.rs#L45-L672 -- this is the code that actually "mixes" the various inputs;

If you want to get in touch and discuss more about this subject, see at the beginning of the https://github.com/volution/z-tokens readme where I point a Discord channel or my email address. (I think it's easier than via GitHub issues.)

Putting more into the salt and possibly using the Argon2 secret key parameter for the pepper would be an improvement. Doing the latter would prevent the YubiKey challenge depending on the password though, which seems bad because then you get duplicate challenges.

If you look at my design, where I've abstracted your YubiKey device as an "oracle" (in fact I'm using ssh-agent as a YubiKey replacement), you'll see that the same "secret" is passed through multiple phases for different purposes.

Simplifying my design (and adapting it for your case) for the purpose of exemplification:

the password (and all the other parameters like identity, domain, count, key-file, presence of YubiKey and it's slot, etc.) are "hashed" (as in Blake not Argon) once together into what I call the partial_key in my derive_keys_phase_1; let's call this the "salt";
(if YubiKey is used) the "salt" (or a hash based on that) is then used to feed the challenge for the YubiKey; (thus, if the attacker wants to brute-force anything, each time has to pass through the YubiKey, thus adding a physical rate-limiting;) let's call this "response";
(if YubiKey is used) the "response" is hashed again with the "salt" to obtain the new "salt"; (so that a malicious YubiKey can't weaken the salt by just returning the same "response" each time;) (you kind of get this in your design, because you concatenate the "pepper" to the "masterKey";)
then the "input password" is passed through Argon2 with the "salt" (either original one, or the new one after YubiKey;)

As seen above, special care should be taken to protect against weak or compromised YubiKeys.

Anyhow, missing this is rather depressing and embarrassing. I wasn't planning on having a v2 of this program, but it looks like I'm going to need to with this and the Unicode password problem...

It's not a big problem. There are many subtle details when working on such projects. As said, I've been working on my own tool for the past two years, and I still don't consider this part (the encryption and password-generation based on master-passwords) production ready.

One final note: I haven't looked at how exactly you are normalizing your inputs to the Blake function, but from the specification you seem to just concatenate inputs, based on the idea that either they are fixed in length, or the last one is variable.

I don't quite like this, because it's easy to slip a parameter that breaks the canonicalization aspect.

My approach was to always just hash individual inputs (with Blake and a custom specialization string, i.e. "context"), and then use those in the rest of the code. It makes everything simpler and safer.

(And the cost of many small Blake or other hash computations, pales in comparison to the Argon2 cost.)

samuel-lucas6 / Cahir

Cost of brute-forcing a set of passwords for multiple domains (or key-files) #1