lobsters / lobsters-ansible

Ansible playbook for lobste.rs
ISC License

add nginx rate-limit #35

Closed: pushcx closed this issue 3 years ago

pushcx commented 6 years ago

https://www.nginx.com/blog/rate-limiting-nginx/

An unknown IP started scraping pages fast enough that @alanpost was paged for the site being down. (It didn't crash; it just became unresponsive.) We should pick limits for the app, plus more relaxed limits for assets (JS, CSS, avatars).

jstoja commented 5 years ago

Do you have a rough idea of how many unique IPs visit Lobsters per day? I need it to size the in-memory buffer that holds the IPs; it's probably not more than 100k/day. We should also define a request rate for pages and one for assets. I propose something like 5 req/s with a burst of about 10 for the app (I don't think there are many concurrent calls there), and something like 10-15 req/s with a burst of about 25 for the assets.

What do you think? (To be honest, I have no experience with this. I would take the logs and write a script to approximate these numbers; maybe some tools already exist for that.)
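To make the rate/burst proposal concrete, here is a minimal Ruby sketch of a per-IP leaky bucket, modeled loosely on how nginx's limit_req uses `rate` and `burst`. It is an approximation for reasoning about the numbers, not nginx's code: it models only the accept/reject decision and ignores that, by default, nginx delays requests that fall within the burst instead of serving them immediately. The class `LeakyBucket` and its methods are hypothetical.

```ruby
# Hypothetical per-key leaky bucket; approximates limit_req's accept/reject
# decision, ignoring the delay nginx applies to queued burst requests.
class LeakyBucket
  State = Struct.new(:excess, :last)

  def initialize(rate:, burst:)
    @rate = rate.to_f # requests drained per second
    @burst = burst    # requests tolerated above the steady rate
    @states = {}      # key (e.g. client IP) => State
  end

  # Returns true if a request from +key+ at time +t+ (in seconds) is admitted.
  def allow?(key, t)
    state = @states[key]
    if state.nil?
      # The first request from a key starts with an empty bucket.
      @states[key] = State.new(0.0, t)
      return true
    end

    # Drain for the elapsed time, then count this request.
    excess = state.excess - @rate * (t - state.last) + 1
    excess = 0.0 if excess.negative?
    return false if excess > @burst # rejected: state is left unchanged

    state.excess = excess
    state.last = t
    true
  end
end

app = LeakyBucket.new(rate: 5, burst: 10)
results = Array.new(12) { app.allow?("203.0.113.7", 0.0) }
p results.count(true) # prints 11: the first 11 pass, the 12th is rejected
```

Under this model, a client firing 12 requests at once with rate 5 and burst 10 gets 11 through, then regains about 5 requests of headroom per second, so a normal reader opening a few tabs never notices while a scraper is throttled to the steady rate.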

pushcx commented 5 years ago

Last we checked it was around 15k unique IPs per weekday, but there are substantial spikes (2-3x? I'm less certain here) when we get linked from YC News or a comment goes viral on Twitter.

EDIT: Please do write a script for analysis! We'll run it on prod logs and add it to the repo for anyone else to pick up.
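For sizing, the ngx_http_limit_req_module documentation states that each stored state occupies 64 bytes on 32-bit platforms and 128 bytes on 64-bit platforms (about 8 thousand 128-byte states per megabyte). Below is a rough Ruby sketch that turns the figures above into a zone size. It assumes a 64-bit server and that every unique daily IP needs a state at the same time, which is likely conservative since nginx removes least-recently-used states when the zone fills. `zone_megabytes` and `spike_factor` are hypothetical names.

```ruby
# Assumption: 64-bit nginx, so each limit_req state takes 128 bytes.
BYTES_PER_STATE = 128

# Megabytes needed to hold one state per unique IP, rounded up.
def zone_megabytes(unique_ips, spike_factor: 1)
  bytes = unique_ips * spike_factor * BYTES_PER_STATE
  (bytes / (1024.0 * 1024)).ceil
end

puts zone_megabytes(15_000)                  # 2  (a normal weekday)
puts zone_megabytes(15_000, spike_factor: 3) # 6  (a 3x spike)
puts zone_megabytes(100_000)                 # 13 (the 100k/day upper guess)
```

Under these assumptions even the 100k/day estimate fits in roughly 13 MB, e.g. `limit_req_zone $binary_remote_addr zone=app:13m rate=5r/s;`, where the zone name `app` is only illustrative.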

jstoja commented 5 years ago

I wrote a little script in Ruby (probably unmaintainable at this stage), partly because I assume it'll be easier to integrate later, and partly because you're probably more used to it than Go or Python. I ran some tests locally; here is the current output:

$ ruby nginx_log_stats.rb -f access.log
{:number_of_ips=>3,
 :max_reqps=>16,
 :min_reqps=>1,
 :num_reqs=>137,
 :avg_reqps=>2,
 :avg_reqps_app=>2.8055555555555554,
 :avg_reqps_assets=>0.9166666666666666}

The useful bits for this issue are avg_reqps_app and avg_reqps_assets. I didn't handle log files that span several days, because I assume a daily logrotate is in place.

If you have ideas for making this better match what we need, please let me know! If you want to test it, just be sure not to run it on the production server, as it may saturate the CPU and disk I/O while processing.
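For anyone picking up the analysis, here is a hypothetical Ruby sketch in the same spirit. It is not jstoja's nginx_log_stats.rb, whose source isn't shown in this thread. Because limit_req is keyed per client IP, it reports each IP's peak requests within any single second, split into app and asset traffic, rather than global averages. It assumes nginx's default combined log format, and the `ASSET_PREFIXES` list is a guess at which paths count as assets.

```ruby
require "time"

# Assumption: these path prefixes are assets; adjust to the real routes.
ASSET_PREFIXES = %w[/assets/ /avatars/].freeze

# Matches the start of a "combined" log line: ip, time_local, request path.
LINE_RE = %r{\A(\S+) \S+ \S+ \[([^\]]+)\] "\S+ (\S+)[^"]*"}

# Returns { app: { ip => peak req/s }, assets: { ip => peak req/s } }.
def peak_reqps(lines)
  counts = Hash.new(0) # [category, ip, epoch second] => request count
  lines.each do |line|
    m = LINE_RE.match(line) or next # skip lines that don't parse
    ip, stamp, path = m.captures
    second = Time.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z").to_i
    asset = ASSET_PREFIXES.any? { |prefix| path.start_with?(prefix) }
    counts[[asset ? :assets : :app, ip, second]] += 1
  end

  # Keep only the busiest second seen for each IP in each category.
  peaks = { app: Hash.new(0), assets: Hash.new(0) }
  counts.each do |(category, ip, _second), n|
    peaks[category][ip] = n if n > peaks[category][ip]
  end
  peaks
end
```

Calling `peak_reqps(File.foreach("access.log"))` on a copy of the logs would give per-IP peaks; the high percentiles of those values might be a reasonable starting point for the `rate` and `burst` settings.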

hmadison commented 3 years ago

@pushcx Can we close this now that we've merged lobsters/b8d91ca?