nikhiljha opened 4 years ago
I don't think I have a problem with self-hosting the HIBP data; my only concern there is whether it takes a long time to search.
No, it's all stored in a hash-tree-like structure, so searches should take only a few ms.
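A minimal sketch of why lookups over a sorted hash list are fast, assuming the data is the HIBP "ordered by hash" list of uppercase SHA-1 digests. This swaps in plain binary search over an in-memory list (not necessarily the structure any self-hosting tool actually uses), and `sha1_hex` / `is_in_sorted_hashes` are hypothetical names:

```python
import bisect
import hashlib

def sha1_hex(password: str) -> str:
    # HIBP lists passwords as uppercase hex SHA-1 digests.
    return hashlib.sha1(password.encode("utf-8")).hexdigest().upper()

def is_in_sorted_hashes(sorted_hashes: list[str], password: str) -> bool:
    """Binary search: O(log n) comparisons, about 30 for a billion entries."""
    target = sha1_hex(password)
    i = bisect.bisect_left(sorted_hashes, target)
    return i < len(sorted_hashes) and sorted_hashes[i] == target

# Tiny hypothetical stand-in for the real (multi-GB) sorted list.
hashes = sorted(sha1_hex(p) for p in ["password", "123456", "hunter2"])
print(is_in_sorted_hashes(hashes, "hunter2"))                       # True
print(is_in_sorted_hashes(hashes, "correct horse battery staple"))  # False
```

On disk, the same idea works by seeking within the sorted file (or using an index), which is what keeps lookups in the millisecond range.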
Then that seems optimal; I definitely don't want to rely on HIBP being up for account creation.
We could also try to query the API and fall back to just allowing the password if it's down.
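A minimal sketch of that fail-open behaviour, assuming the lookup is some callable that raises `OSError` when the service is unreachable (`urllib.error.URLError` and socket timeouts are both `OSError` subclasses). `password_allowed` and the `is_pwned` callable are hypothetical names:

```python
from typing import Callable

def password_allowed(password: str, is_pwned: Callable[[str], bool]) -> bool:
    """Reject known-compromised passwords, but fail open if the lookup errors.

    `is_pwned` is a hypothetical checker (for example, a call to the HIBP API).
    Any OSError it raises is treated as "service down", and the password is
    allowed rather than blocking account creation.
    """
    try:
        return not is_pwned(password)
    except OSError:
        # Network failure or timeout: fail open.
        return True
```

The trade-off is that a compromised password can slip through during an outage, which is the price of not depending on HIBP being up.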
Ideally the password would be checked every time it's used, not just at account creation, since it could have been compromised between account creation and later use.
At the very least, we need to make sure everyone's current password gets checked once at login, since there are a lot of existing accounts with potentially questionable passwords.
Also, the full database is about 10 GB compressed and slightly larger uncompressed. Hosting it and throwing data at it is probably (TM) not that big a deal.
OK, it should just be a matter of adding

```
auth required pam_pwnd.so try_first_pass
```

to the relevant PAM config... but I don't have any Debian systems at the moment, so I can't test it.
We need to make sure that when it fails, we alert the user to change their password instead of just failing to log in.
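A minimal sketch of the login policy being described: a wrong password is denied as usual, but a correct password that shows up in breach data maps to a separate outcome, so the caller can prompt for a password change instead of reporting an ordinary login failure. All names here are hypothetical; in PAM terms, one plausible mapping (an assumption, not something pam_pwnd is known to do) would be returning `PAM_NEW_AUTHTOK_REQD` from the account stage.

```python
from enum import Enum

class LoginOutcome(Enum):
    DENY = "deny"                      # wrong password
    ALLOW = "allow"                    # correct and not known-compromised
    ALLOW_MUST_CHANGE = "must_change"  # correct, but found in breach data

def login_outcome(password_correct: bool, password_pwned: bool) -> LoginOutcome:
    # A wrong password is always a plain denial, regardless of breach status.
    if not password_correct:
        return LoginOutcome.DENY
    # A correct but compromised password gets its own outcome so the user
    # is told to change it rather than just seeing a failed login.
    if password_pwned:
        return LoginOutcome.ALLOW_MUST_CHANGE
    return LoginOutcome.ALLOW
```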
Also, Debian uses `pam-auth-update`, so we shouldn't edit the PAM config directly. See https://github.com/ocf/puppet/blob/master/modules/ocf/manifests/auth.pp#L81-L116 in Puppet or check out /usr/share/pam-configs on an OCF host.
~We also must make sure things do not break when a user authenticates with a Kerberos ticket instead of a password.~ nvm, this is an SSH thing; PAM is not involved.
Somewhat off topic, but I wonder whether we've also considered alternatives to cracklib, like zxcvbn.
I'll work on this issue this weekend
> Downsides to using a bloom filter include: it needs to be regenerated from the latest API data every so often, it has some false positives (1 in a million).
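I think we could expand the filter from 2 GB to 3 GB to bring the false-positive rate down to 1 in a billion: for a well-tuned filter, the log of the false-positive rate scales linearly with filter size, so 1.5× the size gives 1e-9 = (1e-6)^(3/2). I assume that rate would be acceptable?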
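A quick check of that arithmetic, using the standard approximation for an optimally tuned Bloom filter, p ≈ exp(−(m/n)(ln 2)²), where m is the number of bits, n the number of stored hashes, and the optimal number of hash functions is k = (m/n) ln 2. The function names are hypothetical, and n is left out because only ratios matter here:

```python
import math

def bits_per_item(p: float) -> float:
    """Bits per stored item for an optimally tuned Bloom filter with false-positive rate p."""
    return -math.log(p) / (math.log(2) ** 2)

def fp_rate(bits: float) -> float:
    """Inverse: false-positive rate for a given bits-per-item budget."""
    return math.exp(-bits * math.log(2) ** 2)

# Going from 1e-6 to 1e-9 needs exactly 1.5x the bits (2 GB -> 3 GB).
print(round(bits_per_item(1e-9) / bits_per_item(1e-6), 6))  # 1.5
print(fp_rate(1.5 * bits_per_item(1e-6)))                   # roughly 1e-9

# Optimal number of hash functions k = (m/n) ln 2 at each rate.
print(round(bits_per_item(1e-6) * math.log(2)),
      round(bits_per_item(1e-9) * math.log(2)))             # 20 30
```

So the bigger filter also means roughly 50% more bit probes per lookup, which should still be negligible next to a network round trip.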
I should hope so!
The bloom filter is cool but why spend compute regenerating it when we can spend storage instead 😁
Or better yet: fail open, and use the HIBP API's k-anonymity range queries.
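A minimal sketch of the k-anonymity lookup against the public Pwned Passwords range endpoint: only the first 5 hex characters of the SHA-1 hash are sent, the response is a list of `SUFFIX:COUNT` lines, and the suffix match happens locally. The function names and the `User-Agent` string are hypothetical, and `fetch` is injectable so the parsing can be exercised without network access:

```python
import hashlib
import urllib.request

API = "https://api.pwnedpasswords.com/range/"

def fetch_range(prefix: str, timeout: float = 2.0) -> str:
    """GET the 'SUFFIX:COUNT' lines for a 5-hex-character SHA-1 prefix."""
    req = urllib.request.Request(API + prefix,
                                 headers={"User-Agent": "pwned-check-sketch"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")

def pwned_count(password: str, fetch=fetch_range) -> int:
    """Return how often the password appears in HIBP (0 if not found).

    Only the 5-character hash prefix leaves the machine; the full hash
    is never sent.
    """
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    prefix, suffix = digest[:5], digest[5:]
    for line in fetch(prefix).splitlines():
        candidate, _, count = line.partition(":")
        if candidate.strip() == suffix:
            return int(count)
    return 0
```

To fail open, a caller would catch `OSError` around `pwned_count` (it covers `urllib.error.URLError` and timeouts) and allow the password in that case.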
A lot of the "security" RT tickets involve compromised passwords. These almost certainly come from credential stuffing attacks, which can be somewhat mitigated by disallowing known-compromised passwords.
There's a PAM module here, but it makes an HTTP request to the haveibeenpwned API.
If we'd rather not make an API call every time someone enters their password, there are Bloom filters of ~2 GB with pretty good false-positive rates.
Downsides to using a Bloom filter: it needs to be regenerated from the latest API data every so often, and it has some false positives (about 1 in a million).
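For reference, a minimal Bloom filter sketch showing both properties: no false negatives, a tunable false-positive rate, and a fixed size chosen up front (which is why it has to be rebuilt when the data changes). This is a generic textbook construction using double hashing over SHA-256, not the specific prebuilt filters mentioned above; feeding it SHA-1 digests from the HIBP download is an assumption:

```python
import hashlib
import math

class BloomFilter:
    """Minimal Bloom filter: m bits, k probes derived by double hashing."""

    def __init__(self, n_items: int, fp_rate: float):
        # Optimal sizing: m = -n ln p / (ln 2)^2 bits, k = (m/n) ln 2 probes.
        self.m = max(8, math.ceil(-n_items * math.log(fp_rate) / math.log(2) ** 2))
        self.k = max(1, round(self.m / n_items * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: bytes):
        d = hashlib.sha256(item).digest()
        h1 = int.from_bytes(d[:8], "big")
        h2 = int.from_bytes(d[8:16], "big") | 1  # odd step size
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes) -> bool:
        # True may be a false positive; False is always correct.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

bf = BloomFilter(n_items=1000, fp_rate=1e-6)
for i in range(1000):
    bf.add(f"pw{i}".encode())
print(b"pw7" in bf)  # True
```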
Alternatively, we could just host the entire HIBP API for internal use (and also recommend that it be applied to all the WordPress sites running on OCF infra). Someone made a tool to do this here: https://github.com/ralscha/selfhost-hibp-passwords.
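A minimal sketch of the core of a self-hosted k-anonymity range endpoint: given the sorted `HASH:COUNT` lines from the HIBP download, return every line whose hash starts with the requested 5-character prefix, minus the prefix. This is only an illustration of what such an endpoint has to compute, not how the linked tool is implemented; `range_response` is a hypothetical name and the data is kept in memory for simplicity:

```python
import bisect

def range_response(sorted_lines: list[str], prefix: str) -> str:
    """Build a range response from sorted 'HASH:COUNT' lines.

    All lines sharing a prefix are contiguous in sorted order, so one
    binary search finds the first match and a short scan collects the rest.
    """
    prefix = prefix.upper()
    i = bisect.bisect_left(sorted_lines, prefix)
    out = []
    while i < len(sorted_lines) and sorted_lines[i].startswith(prefix):
        out.append(sorted_lines[i][len(prefix):])
        i += 1
    return "\r\n".join(out)
```

Serving this over HTTP at a `/range/<prefix>` path would let internal clients (PAM, WordPress plugins) use the same request format as the public API.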