Prevent denial of service attacks

joseph-reynolds commented 4 years ago

This issue is to enhance BMCWeb, netipmid, and SSH with rate-limiting features that prevent or mitigate volume-based denial of service (DoS) attacks. For background, see the DoS considerations doc review.

Ideas include:

Enhance network IPMI (netipmid) to delay a few seconds when authentication fails.
Enhance BMCWeb to delay a few seconds when authentication fails.
Defend against slow POST and similar HTTP-protocol-based attacks. For example, implement a connection timeout and review the maximum allowed header lengths.
Add a capability to blacklist (or whitelist) IP addresses. Better still, have BMCWeb adapt to troublesome IP addresses by adding delays or dropping their connections for a time.
Implement a maximum number of simultaneous connections from a single IP address.
Implement a maximum number of simultaneous connections from a single user.
Implement a maximum number of simultaneous connections (Redfish MaxConcurrentSessions) for sessions, IPKVM sessions, etc.
Implement protection against GET flooding. Prevent too many unauthenticated GET requests from causing DoS, possibly by degrading the service.
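The three connection caps in the list above (per IP address, per user, and overall) can share one bookkeeping object. Below is a minimal Python sketch, assuming a connection or session is counted when admitted and released when closed, with `user` set to `None` for connections that have not authenticated. The class name and limits are hypothetical, and BMCWeb itself is written in C++. It does not cover slow-POST timeouts or GET flooding.

```python
import threading


class ConnectionLimiter:
    """Caps simultaneous connections per IP address, per user, and overall."""

    def __init__(self, per_ip=4, per_user=4, total=16):
        # Hypothetical defaults; real limits would be configurable.
        self.per_ip = per_ip
        self.per_user = per_user
        self.total = total
        self._by_ip = {}
        self._by_user = {}
        self._count = 0
        self._lock = threading.Lock()

    def try_acquire(self, ip, user=None):
        """Admit a new connection only if no limit would be exceeded."""
        with self._lock:
            if self._count >= self.total:
                return False
            if self._by_ip.get(ip, 0) >= self.per_ip:
                return False
            if user is not None and self._by_user.get(user, 0) >= self.per_user:
                return False
            self._count += 1
            self._by_ip[ip] = self._by_ip.get(ip, 0) + 1
            if user is not None:
                self._by_user[user] = self._by_user.get(user, 0) + 1
            return True

    def release(self, ip, user=None):
        """Forget a connection that try_acquire() admitted earlier."""
        with self._lock:
            self._count -= 1
            self._by_ip[ip] -= 1
            if self._by_ip[ip] == 0:
                del self._by_ip[ip]
            if user is not None:
                self._by_user[user] -= 1
                if self._by_user[user] == 0:
                    del self._by_user[user]
```

Refusing at admission time keeps the check cheap; the overall `total` plays roughly the role of a MaxConcurrentSessions-style limit.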

amboar commented 4 years ago

I feel like this should be split into smaller issues that can be directly actioned, that way we know when to close them. Issues that collate many items like this one tend to stay open forever and eventually get ignored.

joseph-reynolds commented 4 years ago

Note that being able to disable an interface may also help prevent DoS attacks. See #612.

joseph-reynolds commented 4 years ago

Note that the defenses for CWE-307 "Improper Restriction of Excessive Authentication Attempts" can lead to either denial of service (such as account lockout) or to degraded service (such as delaying a few seconds when authentication fails), both as described above, depending on which defenses are used.

joseph-reynolds commented 4 years ago

Idea: For the BMCWeb server, rate-limit local-account authentication attempts.

Function pamAuthenticateUser() in https://github.com/openbmc/bmcweb/blob/master/include/pam_authenticate.hpp is used to authenticate new login sessions (via /login or /redfish/v1/SessionService) and for Basic Auth, for local users. The main idea is to enhance this function to rate limit excessive authentication attempts per user:

If the authentication attempt is for a valid username, remember how many consecutive auth failures were encountered. (This counter does not need to be persisted.)
If the counter is more than 3 or the username is unknown, delay a few seconds before continuing with the request.

This does not address LDAP users.

This design means we won't need account lockouts to prevent CWE-307. It addresses the second bullet above, and it seems easy enough to implement. (However, I hesitate to present these ideas, because a design that solves this problem probably already exists, and security designs are difficult to get right. For example, the delay should not give the caller a clue about whether the username exists. Nevertheless, here are the ideas.)
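The per-user idea for pamAuthenticateUser() can be sketched as follows. This is Python for brevity, not BMCWeb's C++; `PerUserFailureDelay`, the `check_password` callback (standing in for the PAM call), and the two constants are hypothetical. The counter lives only in memory, as the comment suggests.

```python
import time

# Hypothetical values: delay once a user has "more than 3" consecutive
# failures, and use 3 seconds as "a few seconds".
FREE_FAILURES = 3
DELAY_SECONDS = 3


class PerUserFailureDelay:
    """Delays logins for usernames with repeated failures (sketch only)."""

    def __init__(self, known_users, sleep=time.sleep):
        self.known_users = set(known_users)
        self.failures = {}  # username -> consecutive failures, memory only
        self.sleep = sleep

    def delay_for(self, username):
        """Seconds to pause before checking the password for this user."""
        if username not in self.known_users:
            return DELAY_SECONDS
        if self.failures.get(username, 0) > FREE_FAILURES:
            return DELAY_SECONDS
        return 0

    def authenticate(self, username, password, check_password):
        """check_password stands in for the PAM call; it is hypothetical."""
        self.sleep(self.delay_for(username))
        ok = check_password(username, password)
        if username in self.known_users:
            if ok:
                self.failures.pop(username, None)  # success resets the count
            else:
                self.failures[username] = self.failures.get(username, 0) + 1
        return ok
```

An event-driven server would arm a timer instead of blocking in `sleep()`. Note also that the unknown-username branch is exactly the kind of difference the caveat above warns about: a first attempt is delayed for an unknown name but not for a known one, which hints at which usernames exist.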

joseph-reynolds commented 4 years ago

Another idea: In addition to the idea above, and as an independent mechanism, have an overall rate-limit for failed authentication requests. For example, when BMCWeb sees "too many" failed authentication requests, it can immediately return HTTP status code 429 (Too Many Requests) with HTTP response header "Retry-After: 3".

Low-level design idea: enhance the bmcweb function pamAuthenticateUser() to remember authentication requests (such as a count of failed attempts and their timestamps) and issue 429 as needed instead of authenticating.
I do not see the need to persist this data. That is, whenever bmcweb is restarted it will start over with counting authentication failures.
TODO: What do other webservers do?
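A minimal sketch of the overall failed-authentication limit, assuming a sliding window: once `max_failures` failures have been seen within `window` seconds, new attempts get 429 with `Retry-After: 3` without reaching PAM. Python is used for brevity (BMCWeb is C++); the class, thresholds, `check_password` callback, and injectable clock are hypothetical and, as in the comment, nothing is persisted.

```python
import time
from collections import deque


class GlobalAuthThrottle:
    """Counts failed authentications across all users and hosts."""

    def __init__(self, max_failures=10, window=60.0, retry_after=3,
                 clock=time.monotonic):
        self.max_failures = max_failures
        self.window = window
        self.retry_after = retry_after
        self.clock = clock
        self._failures = deque()  # timestamps of recent failures, memory only

    def _prune(self, now):
        # Drop failures that have aged out of the sliding window.
        while self._failures and now - self._failures[0] >= self.window:
            self._failures.popleft()

    def handle_login(self, username, password, check_password):
        """Return (HTTP status, headers) for one login attempt."""
        now = self.clock()
        self._prune(now)
        if len(self._failures) >= self.max_failures:
            # Too many recent failures: answer 429 without calling PAM at all.
            return 429, {"Retry-After": str(self.retry_after)}
        if check_password(username, password):
            return 200, {}
        self._failures.append(now)
        return 401, {}
```

Because the window is shared by all clients, one attacker can push everyone into 429 responses. That is the degraded-service side of the CWE-307 trade-off mentioned earlier, rather than a lockout.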

joseph-reynolds commented 4 years ago

Should we also continue to have a feature to lock an account after too many unsuccessful authentication attempts (in addition to rate-limiting)?

alext-w commented 4 years ago

IMO we should drop that (if it already exists). As we've discussed during Security WG meetings, account lockout is an effective way for an adversary to cause DoS for legitimate users.

joseph-reynolds commented 4 years ago

The account lockout mechanism would be to satisfy specific higher-security requirements for systems installed in higher-security environments. The lockout mechanism would be disabled by default, and the BMC admin could enable it. I believe we still have these requirements and want to confirm.

joseph-reynolds commented 4 years ago

Searching for "how to implement rate limiting" gives useful background information such as:

joseph-reynolds commented 4 years ago

Per "rate-limit local-account authentication attempts" via BMCWeb, there is an experimental prototype in review here: https://gerrit.openbmc-project.xyz/c/openbmc/bmcweb/+/31841

joseph-reynolds commented 4 years ago

NIST guidelines suggest rate-limiting is effective. NIST Special Publication SP800-63B ("Digital Identity Guidelines / Authentication and Lifecycle Management") says (paraphrased here) that "memorized secrets" [sections under 5.1], which include passwords, SHALL be protected against guessing, such as by rate-limiting authentication attempts [section 5.2.2]. It also suggests locking the account only after 100 incorrect authentication attempts. Appendix A.5 suggests appropriate rate-limiting is an effective defense.

How much do we need to slow down password guessing? Limiting authentication attempts to one every 10 seconds gives an attacker roughly 260 thousand guesses per 30-day month. During such an attack, legitimate users would experience rate-limiting for most of their login attempts, and the BMC would generate thousands of security audit log entries showing that rate-limiting was engaged.
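For reference, the guessing budget at one attempt per 10 seconds over a 30-day month works out as:

```latex
\frac{30\ \text{days} \times 86\,400\ \text{s/day}}{10\ \text{s/guess}}
  = \frac{2\,592\,000\ \text{s}}{10\ \text{s/guess}}
  = 259\,200\ \text{guesses} \approx 2.6 \times 10^{5}
```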

joseph-reynolds commented 4 years ago

I asked the Linux-PAM project what PAM error code PAM modules (like pam_abl) return when authentication is not allowed because rate-limiting is being applied: https://github.com/linux-pam/linux-pam/issues/216

joseph-reynolds commented 4 years ago

I presented my authentication rate-limiting prototype (link above) to the OpenBMC community and to Linux-PAM and got some curiosity but no traction. Basic questions included: What's wrong with pam_abl? How are account lockouts different from rate limiting? So, what's the right way to proceed?

pam_abl (the auto-blacklisting module) is close to what I want. I would consider enhancing it to perform a rate-limiting function instead of timed account locking. The source repo seems to be https://github.com/deksai/pam_abl, licensed under GPL-3.0. Specifically, I like its per-USER and per-HOST capabilities. If there were an option to rate-limit instead of locking the account, that would be close to what I could use.

I can think of two possible problems with enhancing pam_abl: (1) The module would have to record all usernames being attacked, not just ones that exist on the system. If it behaved differently with respect to rate-limiting for existing and nonexistent usernames, attackers could use that difference to enumerate accounts. This vulnerability only appears when we move away from locking toward rate-limiting. (2) Same scenario as (1), but the attacker uses differences in timing to enumerate accounts.
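To avoid problem (1), a rate limiter can key its table on whatever username was submitted and never consult the account database, so existing and nonexistent names follow exactly the same rules. This is a hypothetical Python sketch (names and limits are placeholders, not pam_abl code). The table is bounded so a flood of random usernames cannot exhaust memory. It does not address problem (2): the password check itself may still take a different amount of time for unknown users.

```python
from collections import OrderedDict


class UsernameThrottle:
    """Per-username failure counts that never look at the user database."""

    def __init__(self, free_failures=3, delay=3, max_tracked=64):
        self.free_failures = free_failures
        self.delay = delay
        self.max_tracked = max_tracked
        self._counts = OrderedDict()  # username -> consecutive failures

    def delay_for(self, username):
        # Depends only on the counter, so existing and nonexistent
        # usernames are treated identically.
        if self._counts.get(username, 0) > self.free_failures:
            return self.delay
        return 0

    def record(self, username, success):
        if success:
            self._counts.pop(username, None)
            return
        self._counts[username] = self._counts.get(username, 0) + 1
        self._counts.move_to_end(username)
        if len(self._counts) > self.max_tracked:
            # Evict the name whose most recent failure is oldest.
            self._counts.popitem(last=False)
```

The bound has a cost: an attacker spraying many usernames can evict a targeted name's counter, which is one argument for switching to a global overload behavior when the table fills.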

Alternatively, I can enhance my prototype to add per-USER and per-HOST capabilities. Here is my idea:

In my little world, a BMC is only contacted by several remote HOSTs at a time, certainly fewer than a dozen all failing to authenticate at the same time. So it would only have to remember that many HOSTs before going into overload mode.
Similarly, a BMC only has a few users logged in at a time, and does not need to gracefully handle over a dozen users all failing to authenticate at the same time. So it would only have to remember that many USERs before going into overload mode.
If either of these limits is reached, the auth module can go into overload mode which means it rate limits all authentication attempts for a period of time (which gives the same behavior as the original prototype).
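The per-HOST and per-USER design with an overload mode might look roughly like this hypothetical Python sketch. Table sizes, the failure threshold, and the overload timings are placeholders. In overload mode it admits one attempt per `overload_interval` from anyone, which is one possible reading of "rate limits all authentication attempts".

```python
import time


class AuthRateLimiter:
    """Per-HOST and per-USER failure tracking with a global overload mode."""

    def __init__(self, max_hosts=12, max_users=12, free_failures=3,
                 overload_seconds=60.0, overload_interval=10.0,
                 clock=time.monotonic):
        # All sizes and durations are hypothetical, not tuned values.
        self.max_hosts = max_hosts
        self.max_users = max_users
        self.free_failures = free_failures
        self.overload_seconds = overload_seconds
        self.overload_interval = overload_interval
        self.clock = clock
        self.hosts = {}  # remote IP -> consecutive failures
        self.users = {}  # username -> consecutive failures
        self.overload_until = float("-inf")
        self._last_allowed = float("-inf")

    def allow(self, host, user):
        """Return True if an authentication attempt may proceed now."""
        now = self.clock()
        if now < self.overload_until:
            # Overload mode: one attempt per interval, whoever is asking.
            if now - self._last_allowed >= self.overload_interval:
                self._last_allowed = now
                return True
            return False
        return (self.hosts.get(host, 0) <= self.free_failures
                and self.users.get(user, 0) <= self.free_failures)

    def record_failure(self, host, user):
        now = self.clock()
        if now < self.overload_until:
            return  # individuals are not tracked while overloaded
        for table, key, limit in ((self.hosts, host, self.max_hosts),
                                  (self.users, user, self.max_users)):
            if key not in table and len(table) >= limit:
                # Too many distinct HOSTs or USERs failing at once:
                # stop tracking individuals and rate-limit everyone.
                self.overload_until = now + self.overload_seconds
                self.hosts.clear()
                self.users.clear()
                return
            table[key] = table.get(key, 0) + 1

    def record_success(self, host, user):
        self.hosts.pop(host, None)
        self.users.pop(user, None)
```

Bounding both tables keeps memory fixed no matter how many addresses or usernames an attacker cycles through.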

manojkiraneda commented 4 years ago

Hi @joseph-reynolds, when I was reading about rate-limiting, I came across pam_shield, which claims to use iptables to block users.

https://github.com/jtniehof/pam_shield

Did you check it? I am not sure whether it would reduce your work, but I feel it's worth a look.

joseph-reynolds commented 4 years ago

Thank you. I had not investigated pam_shield. According to the project README, pam_shield configures iptables to drop packets (I presume) from blocked hosts, which makes the BMC unresponsive to those hosts. That is not the behavior I want. I want to be very polite to the network client and return something like HTTP status 429. So I don't want to use pam_shield.

joseph-reynolds commented 4 years ago

Manoj, my comment above was too narrowly scoped. I was thinking only about rate-limiting and not about overall DoS protection. I apologize for that. Dropping packets can be part of a good solution to protect against DoS attacks.

joseph-reynolds commented 4 years ago

There is a specific friendly-fire use case I want to address along with the general CWE-307 considerations.

Various management agents or tools repeatedly authenticate to the BMC. Authentication can fail if the BMC does any of:

factory resets (either unexpectedly or as part of an update) and the password is expired,
the account password is changed for any reason, or
the account is locked for any reason.

In any of these cases, the agent or tool can quickly fail to authenticate too many times. I want the following behaviors:

The agent or tool should be told the correct reason it cannot authenticate. Currently it is told authentication failed even in cases when it supplied the correct username and password; that is confusing and frustrating. It should be told when authentication failed due to authentication rate-limiting. In particular, it must not be told authentication failed when it has supplied correct credentials.
It would be very good if an admin could correct the problem immediately, perhaps from another system, without having to wait for the timeout period. Then the agent or tool can continue.

My idea for (2) above is: The rate-limiting mechanism must be sensitive to the remote IP address. That is, the BMC applies auth rate-limiting to requests from IP address X, but the admin can try again from IP address Y which is not (yet) rate-limited, and correct the problem.
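Behaviors (1) and (2) can be sketched together: the per-IP check runs before the password is examined, so a client that is being rate-limited hears "rate limited" rather than "bad credentials", and an admin connecting from a different IP address is unaffected. This is a hypothetical Python sketch; `PerIpAuthGate`, `AuthResult`, and the `check` callback (standing in for the PAM calls) are invented names, and the thresholds are placeholders. In real PAM terms, `pam_acct_mgmt()` returning `PAM_NEW_AUTHTOK_REQD` is one way an expired password shows up.

```python
import time
from enum import Enum


class AuthResult(Enum):
    OK = "ok"
    BAD_CREDENTIALS = "bad credentials"
    PASSWORD_EXPIRED = "password expired"
    RATE_LIMITED = "rate limited"


class PerIpAuthGate:
    """Rate-limits failed authentication per remote IP address."""

    def __init__(self, check, free_failures=3, penalty_seconds=10.0,
                 clock=time.monotonic):
        self.check = check  # hypothetical: returns any AuthResult except RATE_LIMITED
        self.free_failures = free_failures
        self.penalty_seconds = penalty_seconds
        self.clock = clock
        self._failures = {}       # ip -> consecutive bad-credential attempts
        self._blocked_until = {}  # ip -> time when limiting ends

    def authenticate(self, ip, username, password):
        now = self.clock()
        if now < self._blocked_until.get(ip, float("-inf")):
            # Checked before the password, so a client with correct
            # credentials is told "rate limited", never "bad credentials".
            return AuthResult.RATE_LIMITED
        result = self.check(username, password)
        if result is AuthResult.BAD_CREDENTIALS:
            count = self._failures.get(ip, 0) + 1
            self._failures[ip] = count
            if count > self.free_failures:
                self._blocked_until[ip] = now + self.penalty_seconds
        else:
            # Correct credentials (even with an expired password) reset the count.
            self._failures.pop(ip, None)
        return result
```

Mapping `RATE_LIMITED` to HTTP 429 and `BAD_CREDENTIALS` to 401 would match the 429 suggestion made earlier in this thread.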

manojkiraneda commented 4 years ago

Yeah, makes sense, @joseph-reynolds. I got your point. I feel this solution is good enough for the case you described.

Do you think pam_shield works better in case of an intentional DoS attack, where the attacker just floods port 443 with large enough packets without even waiting for the response? With the solution mentioned, we are still going to return a reason for not being authenticated in every reply, right? Won't that affect the web server?

My only point was that we need to pick a PAM library that helps with both cases: intentional and unintentional authentication failures. If pam_abl can do that, then I think we are good. If not, we might have to explore pam_shield as well and see which one offers the best features for the least work.

ratagupt commented 4 years ago

Hi Manoj,

On 5/20/20 10:40 AM, ManojKiran Eda wrote:

> Do you think that pam_shield works better in case of an intentional DoS attack where the attacker just floods the 443 port with large enough packets without even waiting for the response? But with the solution mentioned, we are still going to return some reason for not being authenticated for every reply, right? Which will affect the webserver?

I would suggest: why not rate-limit by IP address through iptables? That way the solution would not be tied to the user-authentication scenario only.
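One way to picture per-IP rate limiting that is not tied to authentication is a token bucket per source address, applied to every request. The Python sketch below is a user-space illustration only; the class name, rates, and injectable clock are hypothetical. In the kernel, the iptables `hashlimit` match offers per-source-address rate limiting, which would also discard a flood before it reaches the web server.

```python
import time


class PerIpTokenBucket:
    """Token bucket per source IP: `rate` requests/second, bursts up to `burst`."""

    def __init__(self, rate=5.0, burst=10.0, clock=time.monotonic):
        self.rate = rate    # tokens added per second
        self.burst = burst  # bucket capacity
        self.clock = clock
        self._buckets = {}  # ip -> (tokens, time of last refill)

    def allow(self, ip):
        now = self.clock()
        tokens, last = self._buckets.get(ip, (self.burst, now))
        # Refill for the time elapsed since this IP was last seen.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self._buckets[ip] = (tokens - 1.0, now)
            return True
        self._buckets[ip] = (tokens, now)
        return False
```

As written, the table grows with every new address; a real implementation would need to bound or expire entries.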

joseph-reynolds commented 1 year ago

This has become a discussion, not a specific issue to resolve. Closing.

ibm-openbmc / dev

Prevent denial of service attacks #1544