tilezen / tilezen-tasks

Tilezen x-repo tasks

Add tarpit server to throttle API abuse #5

Open nvkelso opened 7 years ago

nvkelso commented 7 years ago

While at Mapzen we occasionally saw spikes of 429 responses (Too Many Requests, returned when a client exceeded the free limits for its API key or, historically, for its IP address). The clients behind those spikes sent so much traffic our way that our Fastly request costs were impacted.

Fastly pricing [1] (in NA) is $0.0075 per 10,000 requests. We're getting about 30 million requests per day.

Our monthly cost of 429 requests: $0.0075 / 10,000 × 30,000,000 × 30 = $675.00 for 1 month
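
For anyone who wants to re-run the estimate with different traffic numbers, here is a minimal sketch of the same arithmetic. The figures are the ones quoted above; the function name is made up for illustration, and Fastly's pricing may have changed since.

```python
# Figures quoted in the comment above; treat them as a snapshot, not current pricing.
PRICE_PER_10K_REQUESTS = 0.0075   # USD per 10,000 requests (North America)
REQUESTS_PER_DAY = 30_000_000     # approximate daily request volume
DAYS_PER_MONTH = 30


def monthly_request_cost(price_per_10k, requests_per_day, days):
    """Return the request cost in USD over the given number of days."""
    return price_per_10k / 10_000 * requests_per_day * days


cost = monthly_request_cost(PRICE_PER_10K_REQUESTS, REQUESTS_PER_DAY, DAYS_PER_MONTH)
print(f"${cost:.2f} per month")
```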

[1] https://www.fastly.com/pricing

zerebubuth commented 7 years ago

It's worth noting that the tarpit will only work if we have a lot of sequential requests. Something like a browser downloading more than 4 tiles at once and queueing them, or a curl script downloading a bunch of tiles one after the other.

Before embarking on implementation, we should quickly check that we're getting a large number of requests (say, 8 or more) per IP for these 429 cases. If that's not the case, we'll need to look for a different remedy.

nvkelso commented 7 years ago

@zerebubuth now that we don't allow keyless requests, can we just test for the same API key in the time window?

zerebubuth commented 7 years ago

I think we might be talking about different things. I'm saying that the tarpit is only effective against certain types of request pattern where a single client is going to wait for the response to one request before making another. We use the fact that they're waiting for a response to slow them down by delaying the response deliberately.

For example, if a single client is requesting things in serial then it looks something like this:

client1: request tile 1
client1: wait for response
client1: do something with the response
client1: request tile 2
etc...

In this case we can slow them down. However, it is less effective if they are performing the requests asynchronously, or if a large number of different clients are all making a small number of requests, e.g.:

client1: request tile 1
client2: request tile 2
client1: wait for response
client2: wait for response
client1: do something with the response
client2: do something with the response

In this case, we can still slow client1 and client2 down, but it won't reduce the number of requests hitting us. This is because the request rate then depends on the number of clients (e.g. each making one request), rather than on how quickly each client gets its response.
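
To put rough numbers on the two patterns, here is a back-of-the-envelope model. It is a sketch under simplifying assumptions: the function names, the 50 ms service time and the 5 s tarpit delay are all invented for illustration, and network latency and client think time are ignored.

```python
def serial_client_requests(window_ms, service_ms, tarpit_delay_ms):
    """Requests one strictly serial client can complete in the window.

    The client waits for each response before sending the next request,
    so every request costs service time plus any deliberate delay.
    """
    return window_ms // (service_ms + tarpit_delay_ms)


def one_shot_clients_requests(num_clients, tarpit_delay_ms):
    """Requests received when each client sends a single request.

    The delay makes every client wait longer, but nobody is waiting on a
    previous response, so the number of requests does not change.
    """
    return num_clients


WINDOW_MS = 60_000  # one minute

# A serial client: the tarpit cuts its request count dramatically.
print(serial_client_requests(WINDOW_MS, service_ms=50, tarpit_delay_ms=0))
print(serial_client_requests(WINDOW_MS, service_ms=50, tarpit_delay_ms=5_000))

# Many one-shot clients: the tarpit makes no difference to the count.
print(one_shot_clients_requests(1_000, tarpit_delay_ms=0))
print(one_shot_clients_requests(1_000, tarpit_delay_ms=5_000))
```

A client with N requests in flight at once sits somewhere in between: roughly N times the serial figure, which the delay still reduces.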

It seems unlikely that the clients would be completely asynchronous or requesting a very small number of tiles. However, it's an easy check to see what the average count per IP is for 429s, although it needs to be done against the raw logs, since we don't have that information in the analytics Redshift.
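
The per-IP check could look something like the sketch below. The log format here is purely hypothetical (whitespace-separated `ip api_key status path`); the real raw logs will have a different layout, so the field parsing would need adjusting.

```python
from collections import Counter
from statistics import mean, median


def count_429s_per_ip(lines):
    """Count 429 responses per client IP.

    Assumes a hypothetical log line layout: "<ip> <api_key> <status> <path>".
    Lines with fewer than three fields are skipped.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue
        ip, _api_key, status = fields[:3]
        if status == "429":
            counts[ip] += 1
    return counts


# Made-up sample data, only to show the shape of the result.
sample_log = [
    "10.0.0.1 key-a 429 /tiles/1",
    "10.0.0.1 key-a 429 /tiles/2",
    "10.0.0.1 key-a 429 /tiles/3",
    "10.0.0.2 key-b 200 /tiles/1",
    "10.0.0.3 key-c 429 /tiles/9",
]

per_ip = count_429s_per_ip(sample_log)
print(per_ip.most_common(10))
print("mean:", mean(per_ip.values()), "median:", median(per_ip.values()))
```

If the mean and median come out well above the "8 or more" mentioned earlier, the tarpit looks worthwhile; a long tail of IPs with one or two 429s each would point the other way.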

For the implementation of this, we wouldn't necessarily have to look at the IP address; we could instead keep a per-API-key 429 counter which trips the tarpit protection. I'm just saying we should check to see if the fix will be effective (it seems likely) before spending the time to implement it.
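
As a rough illustration of the per-API-key counter idea, here is one possible shape for it: a sliding-window count of recent 429s per key, with a fixed delay once the count passes a threshold. The class name, threshold, window and delay are all assumptions picked for the example, not a proposed design.

```python
import time
from collections import defaultdict, deque


class TarpitTracker:
    """Track recent 429s per API key and decide how long to stall a response.

    Hypothetical sketch: the default values are arbitrary placeholders.
    """

    def __init__(self, threshold=8, window_s=60.0, delay_s=5.0):
        self.threshold = threshold
        self.window_s = window_s
        self.delay_s = delay_s
        self._events = defaultdict(deque)  # api_key -> timestamps of recent 429s

    def _expire(self, events, now):
        # Drop timestamps that have fallen out of the sliding window.
        while events and now - events[0] > self.window_s:
            events.popleft()

    def record_429(self, api_key, now=None):
        """Note that a 429 was just returned for this key."""
        now = time.monotonic() if now is None else now
        events = self._events[api_key]
        events.append(now)
        self._expire(events, now)

    def delay_for(self, api_key, now=None):
        """Seconds to hold the next response for this key (0.0 if not tripped)."""
        now = time.monotonic() if now is None else now
        events = self._events.get(api_key)
        if not events:
            return 0.0
        self._expire(events, now)
        return self.delay_s if len(events) >= self.threshold else 0.0


tracker = TarpitTracker()
for t in range(8):
    tracker.record_429("key-a", now=float(t))
print(tracker.delay_for("key-a", now=8.0))    # tripped: 8 recent 429s
print(tracker.delay_for("key-a", now=100.0))  # window has passed
print(tracker.delay_for("key-b", now=8.0))    # never seen a 429
```

In a real deployment the state would have to be shared across servers (or live at the edge), which this in-memory sketch does not attempt.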