simplesamlphp / simplesamlphp

SimpleSAMLphp is an application written in native PHP that deals with authentication.

https://simplesamlphp.org

GNU Lesser General Public License v2.1

1.07k stars 680 forks source link

Proposal: prevent password guessing with rate limiting #1259

Open pradtke opened 4 years ago

pradtke commented 4 years ago

Problem

Certain compliance situations require us to prevent brute force password guessing for logins. UserPassBase and implementing classes do not have a mechanism to prevent brute force password guessing.

Proposed Solution

We would like SSP to have built in support (which we are proposing to write) for rate limiting authentication attempts.

Proposed features:

Rate limit based on username
Rate limit based on IP -- Allow support for a white list of IPs not to limit
Rate limit based on password (prevent password spraying attacks) -- Allow configurable hashing algorithm and salt.
User Denial of Service prevention with device cookies
Separate limits per username, ip, and password -- Also allow not limiting by certain criteria (like password)

Non proposed features:

Email user about attacks
Show captchas

Storing rate limiting data

We would use SimpleSAML\Store for storing the data, since it must be persistent across servers and not be tried to a specific user session.. The Store interface does not support atomic incrementation of a counter, thus in a situation where multiple request are simultaneously editing the same tracking information for a user not all of those attempts will be tracked. However, at least one will be tracked. This will allow us to perform rate limiting, even if the actual rate an attacker can do is higher than the limit configured. While not desirable, it is better than no protection and we can document the caveats of the implementation.

Rate limiting algorithm

The two most common algorithms are either a leaky bucket approach or a fixed/sliding window.

In a leaky bucket approach, a given user or ip (or etc) would be given C authentication attempts and every minute could earn N more per minute (up to a maximum of C). This allows for bursty traffic patterns.

A window approach would allow N logins per time window.

I don't have a preference for algorithm. If SSP project does have a preference, please let me know. A fixed window is likely an easy algorithm

User Messaging

I've seen systems that alert the user to being locked out and ones that just always return "Invalid username/password" for locked out users. The former is more user friendly but would require template changes to display the new message (and require translation).

Initial proposal is just to say "Invalid username/password" while logging that the user/ip is locked out in the logs. If SSP project has a preference for better messaging, please let me know.

DoS protection with device cookies

The downside of any system that locks users or ips out after failed attempts is that they can be used to denial of service a user or IP. Device cookies help protect against this. When a user successfully authenticates a long lived device cookie is stored on the browser. The cookie indicates to the system that user XXX has successfully. If a request is made for a locked out user, or from a locked out IP and that request user matches the user associated with the device cookie then the request is allowed through. After N failed attempts with the device cookie then the cookie is also locked out

Describe alternatives you've considered

Finding an apache module to perform rate limiting.

Next Steps

I'm interested in opinions/suggestions/etc prior to us starting to work on this feature.

tvdijen commented 4 years ago

What's holding you back from creating an EnhancedUserPassBase?

A few thoughts on what I've just read:

There are classes inheriting from UserPassBase that would not be able to implement some of the proposed solutions, like the LDAP AuthSource class.. Configurable hashing alg+salt for example, is not applicable here.. Probably only usable for AuthCrypt, which I think already has this possibility..
IP based limits are easy too easy to bypass.. It's not gonna do us any good..
The downside of all these 'solutions' is that they all require a Store.. I think that always should be optional..

I think, we should be able to come up with some of these enhancements, perhaps as traits, that can be easily be reused in custom authsources..

pradtke commented 4 years ago

What's holding you back from creating an EnhancedUserPassBase

Nothing is holding us back from just creating our own. The goal was to make it available to classes that already extend UserPassBase. e.g. users of some subclass could now add some additional config options to the authsource to enable rate limiting.

There are classes inheriting from UserPassBase that would not be able to implement some of the proposed solutions, like the LDAP AuthSource class.. Configurable hashing alg+salt for example, is not applicable here.

I must not have explained it very well. Subclasses wouldn't have to make any changes to take advantage of rate limiting. The hash alg and salt is just for configuring rate limiting based on password. The underlying password store doesn't need to know anything about it. It also doesn't need to be enabled for those that, understandably, don't want to rate limit based on password.

IP based limits are easy too easy to bypass.. It's not gonna do us any good.

It will do us good :) The attacks we see are generally unsophisticated and just use a few IP addresses. Limiting on IP address is useful to us. For an attacker with a botnet with lots of IPs we'll rely on IP reputation and threat lists that get fed into our WAF. Again it optional. If you don't want to rate limit on IP don't enable it.

The downside of all these 'solutions' is that they all require a Store.. I think that always should be optional..

I agree on optional. I also agree that they all require someplace to store the data. I don't know of way to perform rate limiting without storing data.

I think, we should be able to come up with some of these enhancements, perhaps as traits, that can be easily be reused in custom authsources

That's an interesting idea.

hparadiz commented 4 years ago

Just some things to consider.

Typically in a corporate network you have thousands of users sharing a single IP address so rate limiting by IP would be a huge problem there.
Rate limiting by username defeats the purpose because you have to decrypt the assertion before you know what username is logging in and since decryption is so CPU heavy you've already spent most of the computing resources by the time you know you should block them.
With multi-server configurations you will need some kind of storage that can be persistent between requests and also shared among servers.
Stores are slow. The only storage option I can think of that would be good enough is memcached and even then it has a 20 ms connection time and another 30 ms to make a request and receive a response to a query. This would effectively slow down requests for your legitimate users just to see if they are blacklisted.

pradtke commented 4 years ago

Typically in a corporate network you have thousands of users sharing a single IP address so rate limiting by IP would be a huge problem there.

Yup, that's why turning on IP rate limiting is optional, or you can white list your corporate IP.

Rate limiting by username defeats the purpose because you have to decrypt the assertion before you know what username is logging in and since decryption is so CPU heavy you've already spent most of the computing resources by the time you know you should block them.

The rate limiting is for the "enter username and password" form on the IdP side, so there is no assertion to decrypt. I agree an attacker that wants to overload your CPU has other vectors they can use.

Stores are slow.

That is a good point that I hadn't consider. The more things (user, ip, password, etc) you track to rate limit, the slower it will be. However for users logging in as themselves from a browser they have previously logged in with will have a device cookie. If the device cookie is valid (which is one check) and the user enters the correct password then they will get through with one read from the store. IMHO, the vast majority of legitimate traffic will fit that pattern.

FWIW, all our memcached calls are instrumented to time them and the vast majority take less than 10 ms.

jaimeperez commented 4 years ago

I think this is a very good idea Patrick!

I agree that many users could benefit from this out of the box, if we implement it in the base class and you just need to add some configuration directives to it I see some difficult bits though:

Tim has a good point about basing this in the IP address, since it's trivial nowadays to get a botnet with hundreds of IPs. However, if this serves as a mitigation for attacks that are actually happening and you see as common, then this would have value anyway.
@hparadiz also has a good point about stores being slow. This one though is a bit less important in this case, since most of the time you will have a store anyway for your user accounts. Sure, it's an additional roundtrip, but there are options like memcache that would minimize delay.

We need to decide what do we want to protect from, since this could be used both to protect against (D)DoS or against brute forcing. The former needs mechanisms external to the software, in the servers or even the network. The latter is something we can tackle in the software. For that case, I see two approaches (not necessarily excluding each other) that could be interesting:

One is that evaluating if this is is a brute force attack is only relevant when authentication succeeds. This would get rid of the downsides of having an additional store. We need two operations: record that authentication has failed for a given user, and evaluate if the number of tries for this user has been exceeded. The latter is only relevant upon successful authentication because it's only then that the brute force attack has consequences. The goal would then be to return an authentication error to the attacker even if they managed to guess the password.

In this scenario, we can perform the write operation to the backend when the response has already been sent to the user, so that it doesn't affect the response time (analogously to how we save sessions).. On the other hand, the read from the backend operation is only necessary when authentication succeeded, and a few extra milliseconds per successful authentication should not be a problem. The only issue I see is that then we're opening to a side channel attack. The attacker can measure the response time and if it's consistent except for one response, assume that password was correct.

All things considered, it's probably ok to have the extra roundtrip for all authentications anyway, to be on the safe side.
The other approach would be to make this a bit more general, like a risk-based system to decide whether you authenticate the user or not, and how. I can imagine a system to plug evaluators that inspect the context and evaluate the risk for the current authentication, and callbacks that make decisions and take actions based on the evaluated risk. With such a system, you could implement this functionality with an evaluator (or a set of them) that assigns a high risk to a successful authentication when it has been preceded by multiple tries of completely different passwords, and a callback that would interpret that as a brute force attack and deny authentication.

The advantage with this is that you could also have much more fine-grained approaches. E.g. the action could be to require a second factor if available in the attributes and the risk is evaluated to be high. Or the evaluator could increase the risk if there's no such device cookie. We would have an incredibly flexible system that would allow you to perform authentication with different conditions depending on the context, and customize it with configuration rather than implementing it yourself.

Does that sound interesting?

pradtke commented 4 years ago

Thanks @jaimeperez, @hparadiz and @tvdijen for you feedback. This has been helpful to me.

One is that evaluating if this is is a brute force attack is only relevant when authentication succeeds.

That is interesting idea to do it at that point, but as you pointed out there is the worry about side channel attacks with the timing or behavior. I suppose if we're going to do the checks regardless of authentication failure or success, then it wouldn't matter if it was before or after. In that case I think doing it prior to the authentication check could have some benefits, such as not passing known "bad" requests to the underlying password store where it could trigger additional problems (like AD lockout policies designed for internal connections).

The other approach would be to make this a bit more general, like a risk-based system to decide whether you authenticate the user or not, and how. I can imagine a system to plug evaluators that inspect the context and evaluate the risk for the current authentication, and callbacks that make decisions and take actions based on the evaluated risk

I like that idea a lot. It reminds me of the PAM stack for unix logins. I think I understand what an evaluator would look like, but I'm having trouble picturing the callbacks, where they are defined and the parameters.

For example, if a deployer of SSP wanted to show a captcha after 5 failed logins, and block for 10 minutes after 15 failed logins, what are you thinking pseudo code wise? Is there a TooManyLoginAttemptsForUser evaluator that calls a callback after 5 attempts? And the callback gets arguments about the number of attempts so it can decide what to do?

Or is there a chain of evaluators that create/define some structure that is passed to the callback?

Where would a deployer define their callback? Can that be done in authsources.php (I thought I remember an issue defining functions there)? Or are there some common callback behaviors that we include, and the deployer can reference in authsources.php?

Or are you thinking something else?

pradtke commented 4 years ago

I've pushed an alpha version of rate limiting to https://github.com/cirrusidentity/simplesamlphp-module-ratelimit.

@tvdijen Since it is a standalone module, I followed your suggestion of using a different class. It can wrap another UserPassBase so installers can add rate limiting to the ldap, admin, etc authsources through configuration

@hparadiz I added a log line that logs the overhead time of running rate limiting for each authentication attempt, along with the time taken by the underlying authsource. I can report real wold timing results once we finish our internal testing and run this in production for awhile. For IP based restrictions, they are disabled by default, and if enabled it also supports white listings IPs and subnets

@jaimeperez I used your evaluator idea, though I called it Limiter. Installers can enable different limiters as they see fit. They can also write their own. I included 4 of them in the module, 2 of which are enabled by default.

Anyhow, we'll test and gather data, and then we can discuss next steps.

tvdijen commented 4 years ago

This is pretty cool @pradtke ! I have one remark on the 'delegate' setting though. I think it would be better to refer to the delegate authsource by name, instead of having all of the delegate authsource's config in your module's config. We do this in a similar manner for the negotiate-module, so we can keep authsource-config in one place.

Also, I see you're introducing an InMemoryStore for testing purposes.. Perhaps it's better to reuse Symfony's Array Cache adapter instead..

pradtke commented 4 years ago

@tvdijen Using \SimpleSAML\Auth\Source::getById($authId); is actually what I did in my first pass through the code. Then I realized that the authsource I wanted to protect could still be called directly by an attacker via the path module.php/core/authenticate.php?as=ldap, thus bypassing the rate limiting. As a result I did the (ugly) method of using reflection and embedding the other authsources config in mine , to ensure it can't be called directly. I'm certainly open to other ideas (since it is ugly)

For the Cache adapter I can take a look, though I don't understand the intent. Is your idea that I create an SSP Store to PSR Cache bridge? I wonder about the 'key' syntax compatibility - the PSR cache has restrictions on characters in the key, which the SSP Store does not.

tvdijen commented 4 years ago

For the Cache adapter I can take a look, though I don't understand the intent. Is your idea that I create an SSP Store to PSR Cache bridge? I wonder about the 'key' syntax compatibility - the PSR cache has restrictions on characters in the key, which the SSP Store does not.

My idea was to use the PSR array cache directly from your tests, so you don't need your own class.. Must admit I missed the string typehints there, so it may not be a good idea..

We'll discuss the other this week during our Madrid meetup

thijskh commented 4 years ago

I think it’s great to see there’s a module already. Simce it’s now in a separate module I would propose to remove this feature as a todolist item for 2.0. That doesn’t preclude working on it, but it does also not make it a blocker for 2.0.