openstreetmap / operations

OSMF Operations Working Group issue tracking
https://operations.osmfoundation.org/

Many mappers behind a single IP address may trigger max connection rate DDOS protection #1085

Closed Firefishy closed 3 months ago

Firefishy commented 6 months ago

Mapping events run by HOT and others, where a large number of users (20+) sit behind a single IP address, can sometimes trigger our DDOS protections. The participants are then blocked for a few hours.

We are using mod_evasive, which is unfortunately a very blunt tool and does not give us much flexibility over what types of requests are blocked.

Firefishy commented 6 months ago

I have asked the team that was affected for more details.

mmd-osm commented 6 months ago

In case this is related to mod_evasive, CGImap used to have similar issues years ago before the authenticated user information was taken into account as well (see https://github.com/openstreetmap/operations/issues/36). I suppose mod_evasive doesn't support this at the moment and would treat all users as a single user only.

(I'm also linking to https://github.com/facebook/Rapid/issues/1424 for visibility)

tomhughes commented 6 months ago

The two cases are completely different though - the cgimap limit is about the number of bytes downloaded and mod_evasive is about the number of requests being made in a very short window.

mmd-osm commented 6 months ago

I don't think it's completely different. My point was about identifying single users behind a single NAT IP. This was added to CGImap (as well as several editing apps) at some point, and appears to be absent for mod_evasive as far as I'm aware.

tomhughes commented 6 months ago

Well yes of course it's absent from a generic tool in a web server that knows nothing about the application.

The point is that unless they have a huge number of users they really shouldn't be able to trigger mod_evasive - it's looking for massive numbers of more or less simultaneous requests. The cgimap limiting looks at average behaviour over relatively long windows.
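
To illustrate the distinction above, here is a minimal Python sketch of a short-window, per-IP request counter. It is not mod_evasive's actual algorithm: the `BurstDetector` class, the threshold of 50 requests, the 1-second window, and the example address are all assumptions made for illustration. It only shows why a counter keyed on the client address cannot tell one bursty client from many users behind the same NAT.

```python
from collections import defaultdict, deque


class BurstDetector:
    """Toy per-IP burst detector; NOT mod_evasive's actual algorithm."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests      # hypothetical threshold
        self.window_seconds = window_seconds  # hypothetical window length
        self._hits = defaultdict(deque)       # ip -> timestamps still inside the window

    def allow(self, ip: str, now: float) -> bool:
        """Record one request and report whether the IP is still under the limit."""
        hits = self._hits[ip]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        hits.append(now)
        return len(hits) <= self.max_requests


def count_rejected(users: int, requests_per_user: int, detector: BurstDetector) -> int:
    """All users share one NAT address and send their requests 1 ms apart."""
    rejected = 0
    for i in range(users * requests_per_user):
        now = i * 0.001
        if not detector.allow("203.0.113.7", now):  # documentation-range address
            rejected += 1
    return rejected
```

With these made-up numbers, a single user sending 30 requests stays under the limit, while 20 users sending 30 requests each from the same address are rejected after the first 50.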

mmd-osm commented 6 months ago

You’d be surprised how relatively easy it is to trigger this situation even as a single Rapid user. Due to the way mod_evasive breaks "authenticated" CORS preflight requests, users are left with incomplete map data and no error messages in the app.

We’re discussing the issue over at https://github.com/facebook/Rapid/issues/1424 and are still collecting some more details.

bhousel commented 6 months ago

Yeah a screenful of tiles could kick off dozens of tile requests almost simultaneously. Especially if the user has a large display, and the code starts requesting off screen tiles to tell where roads connect. I could definitely introduce a queue, or slow down the requests a bit if it would help.
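
A minimal sketch of the "slow down the requests a bit" idea, written in Python for illustration rather than in the editor's own code. The `pace` function and the 0.25-second gap are assumptions, not anything taken from Rapid or iD. Given the times at which requests become ready, it computes send times so that consecutive requests are at least `min_gap` seconds apart.

```python
def pace(ready_times: list[float], min_gap: float) -> list[float]:
    """Return send times so that consecutive requests are at least min_gap apart."""
    send_times: list[float] = []
    next_free = float("-inf")  # earliest moment the next request may go out
    for ready in sorted(ready_times):
        send_at = max(ready, next_free)
        send_times.append(send_at)
        next_free = send_at + min_gap
    return send_times
```

For example, four requests that all become ready at time 0 would go out at 0, 0.25, 0.5 and 0.75 seconds with a 0.25-second gap, while requests that are already spaced further apart are not delayed.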

tomhughes commented 6 months ago

What do tile requests have to do with anything? We're talking about API calls.

bhousel commented 6 months ago

> What do tile requests have to do with anything? We're talking about API calls.

Sorry for the confusion, our app calls lots of things "tiles". In this case each "tile" is a GET /map?bbox call (and I guess each one needs a preflight OPTIONS call too).

tomhughes commented 6 months ago

Well yes, if you deliberately break the data load up into multiple chunks, that is going to pessimise lots of things, and more so if you fire them all off in parallel. That will look like an attack because it pretty much is one in many senses.

mmd-osm commented 6 months ago

We're likely seeing a combination of different issues here. One of them is related to mod_evasive error responses:

Since every request sends an Authorization: header, Rapid is triggering a CORS preflight request each time. At some point, mod_evasive would start rejecting these OPTIONS requests. Unfortunately, the response lacks an Access-Control-Allow-Origin: * header. Rapid cannot evaluate the error response and bails out. The user would not see any error message in that case.
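
The browser-side behaviour described above can be modelled roughly as follows. This is a simplified Python sketch of the CORS check, not a full implementation of the Fetch standard: it ignores credentialed requests and the Access-Control-Allow-Headers and Access-Control-Allow-Methods checks, and the function names and the 403 status in the usage note are assumptions for illustration. As far as I understand the standard, a preflight must also return a 2xx status to succeed.

```python
def passes_cors_check(headers: dict[str, str], origin: str) -> bool:
    """Simplified CORS check for a request without browser-managed credentials."""
    lowered = {name.lower(): value for name, value in headers.items()}
    allowed = lowered.get("access-control-allow-origin")
    return allowed == "*" or allowed == origin


def preflight_succeeds(status: int, headers: dict[str, str], origin: str) -> bool:
    """A preflight has to pass the CORS check and also return a 2xx status."""
    return 200 <= status <= 299 and passes_cors_check(headers, origin)
```

In this model, an error response with no Access-Control-Allow-Origin header fails the check, so the script sees only a generic network error. A rejected OPTIONS request (say, a 403) fails the preflight whether or not the header is present.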

tsmock commented 6 months ago

> Well yes, if you deliberately break the data load up into multiple chunks, that is going to pessimise lots of things, and more so if you fire them all off in parallel. That will look like an attack because it pretty much is one in many senses.

I think both iD and Rapid do this (make requests in tiles). Have we seen any bug reports w.r.t. iD? I would assume that iD also triggers that code path, especially when deployed off of OSM.org (i.e. embedded in the HOT Tasking Manager).

Technically iD and Rapid could switch to a lower tile level by default, and then iterate every time a tile returns a response that the request was too big. I don't remember what z level they use by default, but I think it was z15. They probably already have code to do the iteration, so it would "just" be a matter of changing the default z level.
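
A rough Python sketch of that "start coarse, split only when the server says it is too big" idea. Nothing here is taken from iD or Rapid: the `fetch` callback, the `RequestTooLarge` exception, and the depth limit are hypothetical, and how the server actually signals an oversized request is left to the callback.

```python
from typing import Callable

BBox = tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)


class RequestTooLarge(Exception):
    """Raised by the hypothetical fetch callback when the server refuses a bbox as too big."""


def split_bbox(bbox: BBox) -> list[BBox]:
    """Split a bbox into four equal quarters."""
    min_lon, min_lat, max_lon, max_lat = bbox
    mid_lon = (min_lon + max_lon) / 2
    mid_lat = (min_lat + max_lat) / 2
    return [
        (min_lon, min_lat, mid_lon, mid_lat),
        (mid_lon, min_lat, max_lon, mid_lat),
        (min_lon, mid_lat, mid_lon, max_lat),
        (mid_lon, mid_lat, max_lon, max_lat),
    ]


def fetch_adaptively(
    bbox: BBox, fetch: Callable[[BBox], object], max_depth: int = 4
) -> list[object]:
    """Try the coarse bbox first; recurse into quarters only on RequestTooLarge."""
    try:
        return [fetch(bbox)]
    except RequestTooLarge:
        if max_depth == 0:
            raise  # give up rather than splitting forever
        results: list[object] = []
        for quarter in split_bbox(bbox):
            results.extend(fetch_adaptively(quarter, fetch, max_depth - 1))
        return results
```

In sparse areas this makes one request where a fixed fine grid would make many. In dense areas it costs one extra failed request per split.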

bhousel commented 6 months ago

> I think both iD and Rapid do this (make requests in tiles). Have we seen any bug reports w.r.t. iD? I would assume that iD also triggers that code path, especially when deployed off of OSM.org (i.e. embedded in the HOT Tasking Manager).

The code in iD is pretty different but they use the same approach of tiled bbox requests. I'm ok with just delaying the requests in Rapid a little bit. Also, iD might have more built-in delay just by being slower in other parts of the code - like if it takes a second to draw the scene, it won't have time to send off so many bbox requests.

> Technically iD and Rapid could switch to a lower tile level by default, and then iterate every time a tile returns a response that the request was too big. I don't remember what z level they use by default, but I think it was z15. They probably already have code to do the iteration, so it would "just" be a matter of changing the default z level.

Nah there isn't anything like that in the code, but we've thought about it for 10 years. Dynamically changing the zoom level that we fetch OSM data at is https://github.com/openstreetmap/iD/issues/1520

bhousel commented 6 months ago

> Since every request sends an Authorization: header, Rapid is triggering a CORS preflight request each time. At some point, mod_evasive would start rejecting these OPTIONS requests. Unfortunately, the response lacks an Access-Control-Allow-Origin: * header. Rapid cannot evaluate the error response and bails out. The user would not see any error message in that case.

Yep, I was just able to get it into this state, though I'm not really sure how to trigger it in a repeatable way. I was able to get these errors by scrolling around the ocean - these requests are empty and fast so it's easy to fire off a lot of them 😆 https://rapideditor.org/canary#map=15.01/37.3533/-123.0540&background=Bing

[Two screenshots, taken 2024-05-29 at 9:26:46 AM and 9:26:52 AM]

pnorman commented 6 months ago

> Unfortunately, the response lacks an Access-Control-Allow-Origin: * header.

Would adding this header to all error responses help? It won't fix any blocking problems, but it would allow meaningful error responses.

tomhughes commented 6 months ago

You have skipped question one - would adding this header to all error responses be possible? Whether it would help or not is irrelevant if it's not possible and I'm not aware of any way offhand to do it.

tomhughes commented 6 months ago

I think you probably could do it with something like:

Header always set Access-Control-Allow-Origin * "expr=%{REQUEST_STATUS} >= 400"

Firefishy commented 6 months ago

The corporate mapping event was being blocked by timeouts triggering fail2ban to block their gateway addresses.

This change https://github.com/openstreetmap/chef/commit/647d76065793488df0ce9ed1840e5b8743f779ab has now been deployed which sets more lenient timeouts and is reducing the number of hosts being blocked due to frequent timeouts. I will continue to monitor and see if further changes are needed.

I will re-word this ticket to match the cases others have observed.

mmd-osm commented 6 months ago

> We are using mod_evasive, which is unfortunately a very blunt tool and does not give us much flexibility over what types of requests are blocked.

Some blog posts mentioned that mod_evasive is context aware in terms of Apache directives, and could be fine-tuned depending on the type of request. As an example, /map requests with an Authorization header could be configured with a different set of parameters.

For reference: https://docs.apiscp.com/admin/Evasive/#filtering-individual-resources

A word of caution: I haven't tried this out myself and cannot comment if this works as advertised.

bhousel commented 6 months ago

> Some blog posts mentioned that mod_evasive is context aware in terms of Apache directives, and could be fine-tuned depending on the type of request. As an example, /map requests with an Authorization header could be configured with a different set of parameters.

I thought about suggesting this, but then scrapers would probably just add a fake Authorization header. I think mod_evasive wouldn’t be able to tell whether the header is legitimate, only that it’s present.

mmd-osm commented 6 months ago

mod_evasive has no concept of an HTTP header field; that’s part of the Apache rules defining the context. So we have a few more options in that case: you can check for a proper Bearer token structure (length and allowed chars), and leave the rest to fail2ban and CGImap. Checking tokens in CGImap isn't super expensive either.

Posting lots of requests with invalid credentials would quickly block the IP address. Excessive /map downloads can also be identified and blocked in CGImap. That's pretty much the same as what we had before mod_evasive was enabled.
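
As a rough illustration of the structural check mentioned above, here is a Python sketch. The allowed characters follow the b64token syntax in RFC 6750, section 2.1. The length bounds (20 to 200 characters) and the function name are assumptions, not values used by OSM. The check only tests the shape of the header value; whether the token is valid would still be left to fail2ban and CGImap as described.

```python
import re

# Character set per the b64token rule in RFC 6750, section 2.1.
# The 20..200 length bounds are hypothetical and would need tuning.
_BEARER = re.compile(r"Bearer +([A-Za-z0-9\-._~+/]{20,200}=*)")


def looks_like_bearer(header_value: str) -> bool:
    """Return True if the Authorization value has a plausible Bearer token shape."""
    return _BEARER.fullmatch(header_value) is not None
```

A scraper could still send a well-formed fake token, but it would then be making authenticated-looking requests with invalid credentials, which is the case the comment above expects to be blocked quickly.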

Firefishy commented 3 months ago

We are no longer using mod_evasive. If we return to it, then likely this ticket should be revisited.