pypi / warehouse

The Python Package Index
https://pypi.org
Apache License 2.0
3.51k stars 940 forks source link

2FA bruteforce check #8456

Open ewjoachim opened 3 years ago

ewjoachim commented 3 years ago

What's the problem this feature will solve? Currently, there's a rate limiter on 2FA to avoid abuse, but an attacker patient enough can hope to correctly guess the code (>50% chance) in 3 months. There's no limit on the maximum number of failed tries before a protection measure is added. Having a 6 digits code can only be a thing if you can't bruteforce it, otherwise it's just a very weak code.

Describe the solution you'd like A sufficient (to determine) number of failed 2FA tries should raise a protective measure (to determine).

Additional context I reported the problem via the official Security mailing list back in March 2020, but (AFAICT) the security team didn't have the time to handle it in a private patch. I recently got the confirmation from @di that it was OK to post it publicly, and develop the patch publicly. I might be interested in helping and/or coding but I'm currently involved in #7124 and I'd much rather prefer to finish the big works before starting on another one. So feel free to grab the issue and improve PyPI's security :)

Here's the original report as sent to security@ in March:

This one may be harder than usual to reproduce but, here I go :) Again, not sure this qualifies as an important security problem, but you're the judge.

Looking at the TOTP code in warehouse, I noticed it uses an anti-bruteforce policy which is a moving window of (unless I'm wrong) 10 attempts per 5 minutes. Each attempt counts for 3 because the TOTP validates with 3 30-second windows: the current one, the previous one and the next one, and we can fit 10*3 independent 30-second windows in 5 minutes. I'll be using 9 attempts below rather than 10 so that the victim won't notice if they try to login too.

As it stands, if I have someone's credentials, I have more than 50% chance of success bypassing TOTP without raising any alarm by quietly having a script run 9 tries every 5 minutes for 3 months. In 1 year time, I have >95% chances of success. [1]

So, I know 3 months is not, like, 3 minutes, but we're clearly not looking at a case of "3000 years" nor "3 months with access to a supercomputer", so that might be pulled off easily (assuming user doesn't change their password in 3 months, but it's not very long in terms of how often we should do it.). Also, it's just 50% chance, flip of a coin. Also, it assumes that the attacker has working credentials in the first place (but that's the whole point of 2FA). And also, I wasn't able to try this one out, obviously, I didn't sit on it for 3 months to check if it was correct :) [Note: I haven't tried since]

The HOTP RFC (which is the basis for the TOTP RFC) has a paragraph on throttling (https://tools.ietf.org/html/rfc4226#section-7.3) which says sadly not so much, but suggests having something break after too many tries. They also mention a timeout growing linearly (so the total time would become quadratic), that would render this attack unfeasible, but it would also make it very strange for the intended user who tries to connect and is told they need to try again in 4 days :/

All in all, I think if TOTP was failed more than 10 or 20 times in a row on my account, I'd be much more comfortable knowing my account password was reset.

(Also, did you notice: 1 login going through password and TOTP decreases the rate limit by 2 for global (one for login, one for 2FA) and 2 for user (same), which means either the margin on the rate limit was high enough that dividing it by 2 changes nothing, or users get rate limited more often with 2FA on ?) [1]: Formula: if 0<x<1 is the probability you're aiming then the number of months to bypass 2FA with a probability >=x with 27 tries every 5 minutes is given by: (math.log(1-x)/(math.log(1-1/1000000) * 27/5)) /60/24/30

(If my math is wrong, I'll happily stand corrected)

ewdurbin commented 2 years ago

with #10264, we can now configure each rate limiter without having to perform a code change.

Our current user and ip based rate limits are the defaults of 10 per 5 minutes, but our rate limit library allows for configuring more complex limits (https://limits.readthedocs.io/en/stable/string-notation.html).

If you have suggestions on some composite rate limits that would be more suitable to guarding against such an attack, I'm all ears!

ewjoachim commented 2 years ago

It's all a question of probability. How long would we consider it acceptable to take for a 50% chance of succesful bruteforcing ? 1 year ? 10 years ? 100 years ? Currently it's 3 months.

Can we say something like 10 unsuccesful 2FA attempts per hour, and 40 unsuccesful 2FA attempts per months ? That would mean we can try at most 1440 tries a year, which would give an average .144% chance, so around 480 years for a 50% chance of success. I can't see a valid usecase where the user fails 40 times the 2FA challenge in a single month (but there can largely be a case where they successfully pass the 2FA challenge 40 times a month).

Also, can we say that failing the 40/month limit would trigger a password reset (with special wording) ?

(disclaimer: maths on the top of my head. Happy to stand corrected if I made a mistake)

I think the key is to count the unsuccesful attempts, not the total attemps, because most of the time, normal users won't fail too many time (but they can succeed at logging it quite often)

EDIT: I checked, it does just that already :)

ewjoachim commented 2 years ago

Re-reading the current code: we need the 40/month to only apply to 2FA failure, otherwise it's really easy to lock someone out of their account for 1 month by submitting 40 password failures.

miketheman commented 2 years ago

Reading through tis issue, I think an outstanding question is: when does the user's password get reset/locked out, prompting them to obtain a new one via email?

I agree that a slow-roll brute-force attack can happen, no matter how long a lockout, as we assume the attacker has a valid user/pass, and only needs the 2FA to succeed. If we invalidate the password, then the attack becomes invalid.

So in some respect, I don't know if the 2FA failure should be handled any differently than a regular user/pass auth failure, should it? In the 2FA scenario, the user has opted for the added layer of security, which is great, but a failed attempt is a failed attempt.

I think even 10 failed 2FA logins per hour is generous - I'm used to even shorter tolerances such as 3 failed total and you get locked out and need to reset password. See the popular django-axes middleware that sets 3 as a default, and doesn't default to a cooldown period.

Looking at this code:

https://github.com/pypa/warehouse/blob/58c1b4254f3f546e4fc5365c9b7a9f1f222fb98f/warehouse/accounts/services.py#L115-L152

It seems like there's no current account lockout at all - would that be correct? Only rate limits - shouldn't that be the thing to support first? After N failed attempts, 2FA or not, no more playing for you until you reset password.

Seems like that is something we'd implement on or around here:

https://github.com/pypa/warehouse/blob/58c1b4254f3f546e4fc5365c9b7a9f1f222fb98f/warehouse/accounts/services.py#L150-L152

pradyunsg commented 2 years ago

IIUC, the codebase is already handling failed TOTP attempts in roughly the same way as failed password attempts:

(password)

https://github.com/pypa/warehouse/blob/58c1b4254f3f546e4fc5365c9b7a9f1f222fb98f/warehouse/accounts/services.py#L166

https://github.com/pypa/warehouse/blob/58c1b4254f3f546e4fc5365c9b7a9f1f222fb98f/warehouse/accounts/services.py#L200

(TOTP)

https://github.com/pypa/warehouse/blob/58c1b4254f3f546e4fc5365c9b7a9f1f222fb98f/warehouse/accounts/services.py#L352

https://github.com/pypa/warehouse/blob/58c1b4254f3f546e4fc5365c9b7a9f1f222fb98f/warehouse/accounts/services.py#L384


I think bringing the rate limit down from 10 failed tries to 3 failed tries in a 5 minute window is reasonable. That alone changes the attack feasibility, by bringing it down from 9 attempts to 2 attempts per timeout, which results in 4.5x time-wise (i.e. >1 year) for getting to 50% probablity. We should probably just do this right now. :)

To deal with slow-roll attacks, we could add a long-ish limit with a low number, like 10/year or something. That would basically make the slow-roll attack last longer than a human life span, which is all we really need here IMO.

We already have the ability to set multiple limits like that, thanks to:

https://github.com/pypa/warehouse/blob/7fc3ce5bd7ecc93ef54c1652787fb5e7757fe6f2/warehouse/rate_limiting/__init__.py#L58

If you have suggestions on some composite rate limits that would be more suitable to guarding against such an attack, I'm all ears!

Basically, I'm suggesting that we can change the current rate limit of 10/5minutes (for TOTP) to 3/5minutes, 10/year. :)

pradyunsg commented 2 years ago

Ah, and as @ewjoachim mentioned, we'd probably want to separate the rate limit of passwords from TOTP tokens, which means we'd need to have separate rate limiters for them, which is a code change that we'd want to do here. :)

ewjoachim commented 2 years ago

3/5 min can be very low. We're talking about TOTP, a 1 minute clock difference between your device and the server can quickly lead to 3 wrong attempts. Also, people may have multiple accounts and might be mistakely reading the TOTP from the wrong account. I can take more than 3 failures to realize one's mistake (but... probably not 10) I think that as long as we have a x/year or x/month limit, adding a low x/10 minutes limit doesn't add a lot security-wise.

ewjoachim commented 2 years ago

Oh also explicitely adding: we might need the IP-based limit to help us avoid a situation where an attacker would purposefully spam wrong passwords on a user to keep them from connecting.

pradyunsg commented 2 years ago

we might need the IP-based limit

We have this already:

https://github.com/pypa/warehouse/blob/58c1b4254f3f546e4fc5365c9b7a9f1f222fb98f/warehouse/accounts/services.py#L120

miketheman commented 2 years ago

Yay, we've got the IP-based rate limiting in place already - that's great! That looks to resolve the last two comments.

I think there's an interesting balance to be struck between what @pradyunsg and @ewjoachim are saying - it makes sense that a human may make a mistake a couple of times, or that their clock skew might be a bit off - so I'll agree that there's a "happy medium" somewhere (as there always is when it comes to security...)

But I don't think we addressed the question of: Should a user account ever get locked out - not for a period of time, but actually locked out, requiring a password reset?

I think that functionality resolves a lot of the slow-roll attacks - so if someone tries to log in as me multiple times, my account gets locked out and I get an email telling me that's something is suspicious and I need to reset my password - that feels like the correct security posture I'm used to seeing elsewhere.

ewjoachim commented 2 years ago

The only thing slightly bothering me with this, but I'm not sure there's anything better we can do, is that if an attacker is bruteforcing a victim's 2FA, it means they already bypassed the password. Which means either the password leaked by other means, or the attacker has already pwned the victim's email, in which case locking the account and requiring a password reset will do no good. But no bad either.

And even worse. We could imagine that if we lock the account, upon unlocking, the user gets the rate limits reset. After all, the user proved it's them. But if we do this, it would actually lower the security and allow a much faster attack in the case of email compromise. So even if we lock the account, we must keep the rate limit counts.

spookylukey commented 1 year ago

I'm late to this discussion, but here's what we implemented in django-otp:

We have 2 DB fields on the 2FA device definition:

On successful second factor verification we reset these fields, on second factor failure we set/increment these fields. We then force a wait of 2^throttling_failure_count seconds before allowing another attempt i.e. a "soft locking" approach.

If there is an attempt in the locked time frame, we do not do the second factor check (or increase the failure count), and instead of a message like "your OTP code was incorrect", we visibly report a message like "Due to N successive failures, you will need to wait before attempting again".

It's important to note we only ever check the second factor if the first factor (password) is correct, so the attacker has to have the password to be able to trigger the soft locking.

This has large advantages over any rate-limiting approach:

With this exponential backoff method:

ewjoachim commented 1 year ago

If I'm not mistaken, it means that if an attacker manages to get hold of the password, they can't log in but they can keep the legitimate user from logging in, by spamming wrong 2FAs. They can run a cron that will try logging in every minute. Of course, most of the time, the script will not be able to submit anything, but when it does, it will lock the account for days, then weeks, then months.

The real user will know it wasn't them, and they will hopefully think to change their password. Or you could prompt them to do that.

I'm not sure I understand this.

That said: