jsocol / django-ratelimit

Cache-based rate-limiting for Django
https://django-ratelimit.readthedocs.io/en/latest/

dos possible by clashing cache keys #179

Closed devkral closed 5 years ago

devkral commented 5 years ago

Ratelimit uses MD5 (a broken hash function) to build its cache keys. This allows an adversary to forge IP addresses or usernames and trigger a rate limit for another user or IP.

That way the targeted user or IP can no longer access the server: a denial of service (for this user).

Easy fix: switch to SHA-256 instead of MD5.
Other (less effective) fix: hash the "group" separately from the other parts.
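To make the proposal concrete, here is a hypothetical, simplified sketch of the kind of cache-key construction at issue (not django-ratelimit's actual code): the group name and the rate-limited value are mixed into one digest, and the proposed fix is simply swapping the digest function.

```python
import hashlib

def cache_key(group, value, algo=hashlib.md5):
    # Illustrative ratelimit cache key: group name and the rate-limited
    # value (IP, username, ...) mixed into a single digest.
    return "rl:" + algo((group + value).encode()).hexdigest()

weak = cache_key("login", "192.0.2.1")                    # MD5: 32 hex chars
strong = cache_key("login", "192.0.2.1", hashlib.sha256)  # SHA-256: 64 hex chars
print(len(weak), len(strong))  # 35 67
```

Note the trade-off that comes up later in this thread: the SHA-256 key is twice as long, which matters for backends with key-length limits.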

jsocol commented 5 years ago

If you can spoof the values, why do you need a hash collision?

I’m fine moving to a better hash, but this vector continues to exist: I can try to log in with your username 10 times with random passwords to lock you out. The recommended remediation is not to hard-block certain endpoints but to activate some kind of captcha or second factor, like an email.

devkral commented 5 years ago

The problem is mainly open source projects that use django-ratelimit incorrectly. An attacker can easily execute the attack by knowing which parameters are used, and in which order. Effect: you may be able to prevent an admin from logging in.

jsocol commented 5 years ago

Can you detail how you think this attack would work? I'm afraid I don't understand

  1. How it's different—other than being more difficult—than using the username or IP of the target? (The easiest way to generate a collision is to put in the exact same parameters.)
  2. How hashing the group separately would mitigate anything?

Like I said, I'm fine discussing ways to upgrade the hash function, but I want to understand what you're trying to mitigate here. If there's really a distinct attack here that involves specifically constructing a hash collision, for instance, we may also want to mix in some salt-esque data, possibly from SECRET_KEY or another setting/source.
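A hedged sketch of the "salt-esque data" idea mentioned above: keying the hash with Django's `SECRET_KEY` (via HMAC) so an attacker cannot compute cache keys offline even if they know the group names. The function and setting names here are illustrative, not django-ratelimit's actual API.

```python
import hashlib
import hmac

SECRET_KEY = "example-secret"  # stands in for settings.SECRET_KEY

def salted_cache_key(group, value):
    # HMAC-SHA256 keyed with the server secret: without the key, an
    # attacker cannot precompute which inputs collide in the keyspace.
    msg = ("%s:%s" % (group, value)).encode()
    return "rl:" + hmac.new(SECRET_KEY.encode(), msg, hashlib.sha256).hexdigest()

print(salted_cache_key("login", "192.0.2.1"))
```

The key stays deterministic per server (so counters still accumulate correctly) while becoming unpredictable to outsiders.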

adamchainz commented 5 years ago

@devkral I don't follow. I side with @jsocol in thinking MD5 is sufficient here, since as he says, if the attacker can spoof a value, hash collisions aren't really a problem. Can you give a more concrete example of the attack you think is possible?

adamchainz commented 5 years ago

(oh wow, double post at the same time)

Another concern I have is that SHA-256 is slower, so it makes a different kind of DoS more feasible: exhausting CPU resources.
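The CPU-cost concern is easy to measure. A quick micro-benchmark over short inputs (the typical size of a ratelimit key) gives a rough idea; note that on CPUs with SHA extensions, SHA-256 can even be faster than MD5, so the result is hardware-dependent.

```python
import hashlib
import timeit

data = b"login:198.51.100.23"  # a typical short ratelimit key input

# Time 100k digests of each algorithm over the same short input.
t_md5 = timeit.timeit(lambda: hashlib.md5(data).hexdigest(), number=100_000)
t_sha = timeit.timeit(lambda: hashlib.sha256(data).hexdigest(), number=100_000)
print(f"md5: {t_md5:.3f}s  sha256: {t_sha:.3f}s  ratio: {t_sha / t_md5:.2f}x")
```

Either way, the per-request cost is a fraction of a microsecond, which is negligible next to the cache round-trip the key is used for.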

jsocol commented 5 years ago

Also, DoS via ratelimit is noted in the docs, in the Security chapter, as something to consider.

devkral commented 5 years ago

The crossover between multiple groups is not noted, however.

devkral commented 5 years ago

Or if you, for example, use ratelimit to limit the requests per domain a user can generate, you can get a clash again. I think I have a better proposal: just add a configuration option so everyone can select the hash algorithm they need (and maybe split the hash between the hash of the group and the hash of the other parts).
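A minimal sketch of that proposal, assuming a hypothetical setting name (`RATELIMIT_HASH_ALGORITHM` is not a real django-ratelimit setting): the algorithm is chosen via `hashlib.new()`, and the group is hashed separately from the attacker-influenced parts so the two can never bleed into each other.

```python
import hashlib

RATELIMIT_HASH_ALGORITHM = "sha256"  # hypothetical setting, user-selectable

def make_cache_key(prefix, group, parts, algorithm=RATELIMIT_HASH_ALGORITHM):
    # Hash the group and the request-derived parts separately, then join:
    # a collision in one component cannot impersonate the other.
    grouphash = hashlib.new(algorithm, group.encode()).hexdigest()
    parthash = hashlib.new(algorithm, parts.encode()).hexdigest()
    return "%s:%s:%s" % (prefix, grouphash, parthash)

key = make_cache_key("rl", "login", "203.0.113.7")
print(len(key))  # 132 with sha256
```

The obvious cost, raised elsewhere in this thread, is key length: two SHA-256 hex digests plus separators already approach the key-size limits of some cache backends.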

devkral commented 5 years ago

md5 is fine for the group part

devkral commented 5 years ago

> md5 is fine for the group part

Nope. If you want something really lightweight, use the hash algorithm used in Python dictionaries. MD5 is only more efficient because it is an old (and broken) hash algorithm.
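One caveat with that suggestion: Python's built-in `hash()` for strings is SipHash keyed with a per-process random seed (`PYTHONHASHSEED`), so it is not stable across processes. A cache key derived from it would never match between two Django workers, as this small demonstration shows:

```python
import os
import subprocess
import sys

# Force hash randomization on, then compute hash() of the same string in
# two fresh interpreter processes: the results (almost certainly) differ.
env = {**os.environ, "PYTHONHASHSEED": "random"}
cmd = [sys.executable, "-c", "print(hash('login:192.0.2.1'))"]
h1 = subprocess.run(cmd, capture_output=True, text=True, env=env).stdout.strip()
h2 = subprocess.run(cmd, capture_output=True, text=True, env=env).stdout.strip()
print(h1 == h2)  # almost always False
```

So the dict hash is fine for in-process tables but unusable for a shared cache keyspace; any replacement would need a stable, keyed hash instead.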

jsocol commented 5 years ago

@devkral I think we need to see more detail about how hash collisions are actually a problem here beyond a statement that they are. Can you describe in detail how an attacker would exploit that? Keep in mind that a number of parts of the preimage of the hash are not attacker controlled or particularly predictable and change rapidly.

devkral commented 5 years ago

I am not a cryptographer, so I cannot give you a mathematical explanation. I work from the specs, and the spill-over contradicts the group idea.

Anyway: I created a pull request implementing my proposal.

jsocol commented 5 years ago

Hey @devkral, I really want to understand the issue here but we cannot accept new code without understanding the risk. There’s no need to construct an actual colliding hash, but we will need to come to an understanding of how an attacker-controlled value that leads to a collision is actually a distinct attack we can mitigate.

We’re not going to accept the proposal without understanding the attack because we don’t know if the proposal actually mitigates the attack or not.

devkral commented 5 years ago

The attack:

Because everything is put into one bucket, there is no real distinction between the group and the adversary-controlled parameters. This is very bad when multiple groups are used, and maybe for sensitive actions like email verifications, payments, creating a post, .... It is like a permanent ban on a user doing something, so I see it as an issue.

The concrete attack scenario is:

A blogging platform uses group A for limiting the number of posts per minute (keyed on user id) and group B for login tries (keyed on IP), and is reachable over IPv6. The attacker can carefully craft IPv6 suffixes to collide with the victim's rate limit. Therefore he can block the victim from creating new posts.

Second example, ticket system (reason for using sha256):

A ticket system uses ratelimit for blocking IPs which access the system too often, and is reachable over IPv6. Because the attacker's IP and the victim's IP are in the same bucket, he can use MD5 hash collisions to block the victim from drawing a ticket. The victim can be a whole organisation (NAT and IPv4).
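To illustrate the shared-keyspace claim behind both scenarios (an assumed simplification, not the library's exact key format): both groups' counters live in one flat MD5 keyspace, so the attack requires crafting an input in group B whose digest equals the victim's digest in group A.

```python
import hashlib

def key(group, value):
    # Both groups feed the same digest function and the same cache,
    # so nothing but the digest itself separates their buckets.
    return hashlib.md5((group + value).encode()).hexdigest()

post_key = key("posts-per-minute", "user:42")         # group A, victim
login_key = key("login-tries", "2001:db8::bad:c0de")  # group B, attacker-chosen
# The attack requires crafting an IPv6 suffix so these digests are equal.
# Known MD5 collision attacks need control of BOTH inputs (chosen-prefix),
# not just one, which is the crux of jsocol's skepticism here.
print(post_key == login_key)
```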

Hopefully this is enough to convince you to accept this minor change. Have you measured the performance of my patch? It should be nearly on par with the original algorithm.

devkral commented 5 years ago

Theoretical performance analysis (of my patch):

  * speed penalty: md5 executed twice, one more access to settings
  * performance increase: use of the % operator to concatenate

adamchainz commented 5 years ago

@devkral Doesn't this require the group names being public, which they aren't, unless the application is open source?

devkral commented 5 years ago

yes, that is right (see above).

devkral commented 5 years ago

Actually, the whole code has performance problems (double access to the db, hash contexts not copied, ...), and the potential performance problems of my code could easily be fixed (use of lru_cache).

You treat me like your subordinate and waste my time. I don't appreciate this behaviour (as I invest my spare time), so I re-implemented a ratelimit system from scratch. If you want to know what can be improved in your code-base, have a look: django-fast-ratelimit. (Disclaimer: I am not sure if my epoch-based bucket algorithm is better.) Anyway, I think this issue can be closed.

jsocol commented 5 years ago

@devkral I’m sorry you feel that way but please remember we are all volunteering our spare time.

All code has a cost, so issues need to be demonstrated before it makes sense to add code. With a potential security risk in particular, it’s important to understand the vector so that we can assure we actually fix it.

In this case, I am still interested to know if there are potential threats, but we don’t yet have a clear, demonstrably exploitable attack vector. (And a much simpler one exists, which is what’s documented.)

(I’m not sure I understand your performance concerns either, since there is no DB access in this library and I believe hashes are computed once.)

One other thing to consider: I appreciate that you opened an issue before writing code; that was great. I’m sorry if you feel the effort to write the code was wasted, but we never got proof that an issue exists, let alone alignment on an approach. Starting with a test (I know we discussed not knowing exactly how to construct a collision, but there are tools that can help) instead of an implementation could have helped demonstrate the problem and let us arrive at a solution.

I don’t think adding a setting was a great way to start: if there’s a security problem, we should try to do the right thing, not give users an additional thing to think about or a footgun. We also would need to assess the implications of a longer cache key, and other potential stumbling blocks or solutions, like mixing in salt data or shifting algorithms, and the risks associated with them.

Given your new library I assume you’re done with this ticket, but if anyone else thinks they understand the risk and wants to help demonstrate the attack then I hope this is a useful starting point.

devkral commented 5 years ago

I analysed your code: there is no caching of the result of the hash function. At the meta level, it is cleverer to cache the more expensive byproducts (like the hash) than just the final cache key (except in a volatile Django execution, which is reinitialized every request). This eliminates even the rest of the overhead of my solution.

The impact of a longer cache key is that the <256-character restriction of some backends (e.g. the db backend) can be violated if:

  * grouphash and parthash are long (e.g. sha512), or
  * the prefix is long

I tested and fixed this by using base85 (instead of hexdigest) for the keys. Anyway, my project is so differently structured that I see no way I can ever merge my changes.
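The base85 point is easy to quantify: for the same SHA-512 digest, base85 encoding is 37.5% shorter than hex, which is what keeps long digests under tight backend key-length limits.

```python
import base64
import hashlib

digest = hashlib.sha512(b"group:part").digest()  # 64 raw bytes
hex_key = digest.hex()                           # 2 chars per byte
b85_key = base64.b85encode(digest).decode()      # 5 chars per 4 bytes
print(len(hex_key), len(b85_key))  # 128 80
```

One caveat with base85 in cache keys: the alphabet includes characters (e.g. `!`, `#`) that some backends, notably memcached, restrict, so the encoding choice depends on the backend.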