Rate Limit API calls

dakotabenjamin commented 2 years ago

Is your feature request related to a problem? Please describe.

There is currently nothing stopping someone from unintentionally performing a DoS attack by writing a script that GETs big projects (MBs of data, in some cases) in a loop and overloads the database connections. We would not know which user is making the requests (even if they are using an authentication token!), nor would we have a way to stop them. This is a risk to the reliability of the website.

Describe the solution you'd like

We would need to implement a rate-limiting feature, at least for the weightier requests. In addition, we could begin attaching user ID info to requests in the logging system.
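As a rough illustration of attaching user ID info to request logs, here is a minimal sketch using only the Python standard library. It is not how tasking-manager logs today: the logger name `api.requests`, the `current_user_id` context variable, and the example paths are all hypothetical, and a real app would set the user ID from the decoded authentication token in a request hook.

```python
import contextvars
import io
import logging

# Hypothetical: holds the ID of the user behind the current request.
current_user_id = contextvars.ContextVar("current_user_id", default="anonymous")


class UserIdFilter(logging.Filter):
    """Copy the current user ID onto every log record."""

    def filter(self, record):
        record.user_id = current_user_id.get()
        return True  # never drop the record, only annotate it


def build_logger(stream):
    """Return a logger whose output lines include the user ID."""
    logger = logging.getLogger("api.requests")  # hypothetical logger name
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(levelname)s user=%(user_id)s %(message)s")
    )
    handler.addFilter(UserIdFilter())
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


stream = io.StringIO()
logger = build_logger(stream)

# Simulate an authenticated request, then an anonymous one.
token = current_user_id.set("1234")
logger.info("GET /projects/1")
current_user_id.reset(token)
logger.info("GET /projects/2")

print(stream.getvalue())
```

With the user ID on every line, a burst of heavy requests in the logs can be traced back to one account rather than only to an IP address.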

Describe alternatives you've considered

I am wondering whether a package like https://flask-limiter.readthedocs.io/en/stable/ would work.

petya-kangalova commented 2 years ago

@dakotabenjamin is it possible to share current API usage, based on Tasking Manager usage during large mapathons?

Aadesh-Baral commented 2 years ago

Findings after exploration this week: As tasking-manager doesn’t have pricing-based limits on API usage, and the sole purpose of rate-limiting API calls is to stop DoS attacks, I think we can move ahead with soft limits on the APIs for now.

Side effects

- The extra computation done by the limiter functions can slow down the API.
- It can leave a large memory footprint, since it has to store a record of user requests, which might lead to performance issues. To overcome this, we have to choose a memory-efficient rate-limiting strategy.

Library to use:

Flask-Limiter (used by popular Flask applications such as Redash and FlaskBB)

Rate Limit strategy: Flask-Limiter comes with three different rate-limiting strategies built in. I suggest using the default fixed window strategy, as it is the most memory-efficient one. It does, however, have a drawback: it allows bursts at window boundaries, which lets an ‘attacker’ exceed the intended rate. For example, if you specify a 100/minute rate limit on a route, this strategy will allow 100 hits in the last second of one window and 100 more in the first second of the next window. To keep such bursts in check, you could add a second rate limit of 2/second on the same route.
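To make the boundary burst concrete, here is a small simulation in plain Python. It is a toy fixed-window counter written for illustration, not Flask-Limiter's implementation; the class and function names are made up, and old windows are never expired as a real limiter would do.

```python
class FixedWindowLimiter:
    """Toy fixed-window counter: one integer per (key, window) pair."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window_seconds = window_seconds
        self.counters = {}

    def _bucket(self, key, now):
        # All requests whose timestamp falls in the same window share a bucket.
        return (key, int(now // self.window_seconds))

    def has_capacity(self, key, now):
        return self.counters.get(self._bucket(key, now), 0) < self.limit

    def hit(self, key, now):
        bucket = self._bucket(key, now)
        self.counters[bucket] = self.counters.get(bucket, 0) + 1


def count_allowed(limiters, key, timestamps):
    """Count requests that pass every limiter; rejected requests use no quota."""
    allowed = 0
    for now in timestamps:
        if all(lim.has_capacity(key, now) for lim in limiters):
            for lim in limiters:
                lim.hit(key, now)
            allowed += 1
    return allowed


# 100 requests in the last second of minute 0, then 100 in the first second of minute 1.
burst = [59 + i / 100 for i in range(100)] + [60 + i / 100 for i in range(100)]

minute_only = count_allowed([FixedWindowLimiter(100, 60)], "client-a", burst)
with_per_second = count_allowed(
    [FixedWindowLimiter(100, 60), FixedWindowLimiter(2, 1)], "client-a", burst
)

print(minute_only)      # 200: the whole burst passes the 100/minute limit
print(with_per_second)  # 4: the extra 2/second limit caps each second at 2
```

The limiter keeps a single counter per client and window, which is why this strategy is cheap on memory compared with approaches that store a timestamp for every request.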

To Discuss:

- Threshold of rate limiting.
- APIs to rate limit. Suggestion:
  - Provide a default rate limit that applies to all endpoints, and use a decorator to override it on endpoints that require special limits.
  - Identify high-priority endpoints based on the following criteria:
    - Endpoints that users with any role can access.
    - Endpoints that involve a connection to external APIs such as the OpenStreetMap API.
    - Endpoints that create or update resources in the database.
    - Endpoints that return large amounts of data (e.g. an API returning big project data) or involve heavy computation.
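The "default limit plus decorator override" idea can be sketched in a few lines of plain Python. This is only an illustration of the pattern: the `limit` decorator, the endpoint names, and the thresholds below are hypothetical. In Flask-Limiter itself, the equivalent pieces are the `default_limits` argument to `Limiter` and the `@limiter.limit(...)` decorator.

```python
# Hypothetical default; the actual threshold is still to be discussed.
DEFAULT_LIMIT = "200/hour"

# Maps endpoint name to its special limit, filled in by the decorator below.
_OVERRIDES = {}


def limit(rule):
    """Mark an endpoint as needing a limit other than the default."""

    def decorator(view):
        _OVERRIDES[view.__name__] = rule
        return view  # the view itself is left unchanged

    return decorator


def effective_limit(view):
    """Return the special limit if one was declared, else the default."""
    return _OVERRIDES.get(view.__name__, DEFAULT_LIMIT)


# A light endpoint: no decorator, so it gets the default limit.
def list_countries():
    return ["..."]


# A heavy endpoint (returns big project data): tighter, hypothetical limit.
@limit("10/minute")
def get_project(project_id):
    return {"id": project_id}


print(effective_limit(list_countries))  # 200/hour
print(effective_limit(get_project))     # 10/minute
```

Only the endpoints identified as high priority need a decorator; everything else is covered by the default without further changes.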

References:
- https://flask-limiter.readthedocs.io/en/stable/
- https://nordicapis.com/everything-you-need-to-know-about-api-rate-limiting/
- https://docs.github.com/en/rest/overview/resources-in-the-rest-api#rate-limiting
- https://docs.microsoft.com/en-us/linkedin/shared/api-guide/concepts/rate-limits?context=linkedin%2Fcontext
- https://dev.bitly.com/docs/getting-started/rate-limits/
- https://www.figma.com/blog/an-alternative-approach-to-rate-limiting/#sliding-window-counters

Open source Flask applications with flask_limiter:
- https://github.com/getredash/redash
- https://github.com/flaskbb/flaskbb

dakotabenjamin commented 2 years ago

@petya-kangalova if we know a date/time period to check, we have monitoring for this.

@Aadesh-Baral agree on all points. Let's discuss at next meeting to start analysing endpoints for usage and computational "cost". Much of this can be determined through New Relic or internal database insights.

hotosm / tasking-manager

Rate Limit API calls #5122