lfaucon commented 2 years ago

@GresilleSiffle may recommend using Fail2Ban: https://www.fail2ban.org/wiki/index.php/Main_Page

lfaucon commented 2 years ago

We could limit each client to 1000 requests per minute.

100 requests per minute already seems to correspond to intense user behaviour (note that submitting a comparison generates about 10 requests).
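
To make the orders of magnitude concrete, here is a small sketch that converts a request limit into comparisons per minute. It assumes the rough figure of 10 requests per comparison quoted above; the function name is only illustrative.

```python
# Approximate number of API requests generated by one comparison,
# taken from the estimate in the comment above.
REQUESTS_PER_COMPARISON = 10


def comparisons_per_minute(requests_per_minute: int) -> float:
    """Return how many comparisons fit in a given per-minute request budget."""
    return requests_per_minute / REQUESTS_PER_COMPARISON


for limit in (100, 1000):
    print(f"{limit} requests/min ~ {comparisons_per_minute(limit):g} comparisons/min")
```

Under that assumption, a limit of 1000 requests per minute leaves room for about ten times what is described as intense usage.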

amatissart commented 2 years ago

Fail2Ban may be useful to protect the servers against attacks outside of the application itself (e.g. on SSH).
To define limits on specific requests or actions in the API, I think it would be much easier to use the throttling mechanisms provided by Django Rest Framework: https://www.django-rest-framework.org/api-guide/throttling

#193 also describes a similar need.

GresilleSiffle commented 2 years ago

That's true, DRF Throttling and fail2ban are two different things.

DRF throttling limits the number of requests a client can make. If I understood correctly, once the limit is reached, clients get an HTTP error (DRF responds with 429 Too Many Requests), not a cached version of the requested page. It works at the application level, and reduces the system load without banning anyone. For me, it's the first level of load smoothing we should use in the application.
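
To illustrate the idea, here is a minimal, framework-free sketch of a sliding-window throttle. It is a simplified stand-in written for this discussion, not DRF's actual code: the class name, the in-memory storage and the `handle` helper are assumptions (DRF keeps its request history in the Django cache).

```python
from collections import defaultdict, deque


class SlidingWindowThrottle:
    """Allow at most `num_requests` per client within `duration` seconds."""

    def __init__(self, num_requests, duration):
        self.num_requests = num_requests
        self.duration = duration
        # client id -> timestamps of that client's accepted requests
        self.history = defaultdict(deque)

    def allow_request(self, client_id, now):
        timestamps = self.history[client_id]
        # Forget requests that have left the time window.
        while timestamps and timestamps[0] <= now - self.duration:
            timestamps.popleft()
        if len(timestamps) >= self.num_requests:
            return False
        timestamps.append(now)
        return True


def handle(throttle, client_id, now):
    """Return the HTTP status a throttled endpoint would answer with."""
    return 200 if throttle.allow_request(client_id, now) else 429
```

Each client is tracked separately, so one client exceeding its limit receives errors without anyone else being affected, and the client is served again as soon as old requests leave the window.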

fail2ban is able to scan any log file (SSH, auth, or even HTTP) and determines, according to a configured threshold, whether successive failures must be transformed into a temporary IP ban. It works at the system level. Its goal is to limit intrusions and DDoS attacks by banning IP addresses for a specific duration. In my opinion, fail2ban is also a first level of protection we should use in the system.

We could create two different tasks to address those two issues, and for instance keep this one for the implementation of the DRF mechanism.

Last note: I think with fail2ban we can create specific rules (jails) that scan for successive login attempts through the login form, and ban the IP addresses that don't act like humans. We can suppose a human would have used the "Forgot your password?" page after a certain number of attempts, and would not have filled in a hidden form field in the HTML.
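
As a rough illustration of the mechanism (this is not fail2ban itself, nor its configuration syntax), the sketch below scans log lines for failed logins and decides which IPs to ban. The log line format is hypothetical; the parameter names `maxretry`, `findtime` and `bantime` are borrowed from fail2ban's jail options, with arbitrary default values.

```python
import re
from collections import defaultdict

# Hypothetical log format: "<unix timestamp> LOGIN_FAILED ip=<address>"
FAILED_LOGIN = re.compile(r"^(?P<ts>\d+) LOGIN_FAILED ip=(?P<ip>\S+)$")


def find_bans(log_lines, maxretry=5, findtime=600, bantime=3600):
    """Return {ip: ban expiry timestamp} for IPs that failed too often.

    An IP is banned for `bantime` seconds once it accumulates `maxretry`
    failures within a window of `findtime` seconds.
    """
    failures = defaultdict(list)
    bans = {}
    for line in log_lines:
        match = FAILED_LOGIN.match(line.strip())
        if not match:
            continue  # not a failed login, ignore the line
        ts, ip = int(match["ts"]), match["ip"]
        if bans.get(ip, 0) > ts:
            continue  # this IP is already banned at that time
        # Keep only the failures that are still inside the window.
        recent = [t for t in failures[ip] if t > ts - findtime]
        recent.append(ts)
        failures[ip] = recent
        if len(recent) >= maxretry:
            bans[ip] = ts + bantime
            failures[ip] = []
    return bans
```

A real jail would pair a filter (the regular expression) with an action such as a firewall rule; here the ban is only recorded in a dictionary.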

GresilleSiffle commented 2 years ago

About DRF Throttling.

If we have different endpoints with different "costs", DRF allows us to have different throttling configurations.

For instance, downloading the public dataset is a heavy operation, and could have its own throttling scope. Endpoints making calls to the YouTube API could have their own scope too. Other endpoints could use a default global scope.
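
On the view side, `ScopedRateThrottle` only applies to views that declare a `throttle_scope`; the rate for each scope is then looked up in `DEFAULT_THROTTLE_RATES`. A minimal sketch, meant to live inside the Django project, where the view names and response bodies are hypothetical placeholders:

```python
from rest_framework.response import Response
from rest_framework.views import APIView


class PublicDatasetView(APIView):  # hypothetical view name
    # Requests to this view count against the 'public_dataset' rate.
    throttle_scope = "public_dataset"

    def get(self, request):
        return Response({"detail": "dataset placeholder"})


class VideoMetadataView(APIView):  # hypothetical view name
    # Views calling the YouTube API share the 'youtube_api' rate.
    throttle_scope = "youtube_api"

    def get(self, request):
        return Response({"detail": "metadata placeholder"})
```

Views without a `throttle_scope` are not affected by `ScopedRateThrottle` and fall back to the anonymous and user throttles.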

Example (values are arbitrary):

REST_FRAMEWORK = {
    'DEFAULT_THROTTLE_CLASSES': [
        'rest_framework.throttling.ScopedRateThrottle',
        'rest_framework.throttling.AnonRateThrottle',
        'rest_framework.throttling.UserRateThrottle',
    ],
    'DEFAULT_THROTTLE_RATES': {
        # anonymous and authenticated users are allowed to make up to 1800 requests per hour
        # i.e. 1 per 2 seconds per user on average
        'anon': '1800/hour',
        'user': '1800/hour',

        # requests to the YouTube API have a cost, so they are more limited (half the default rate)
        # i.e. 1 per 4 seconds per user on average
        'youtube_api': '900/hour',

        # building the public dataset has a high cost, so it is even more limited
        # i.e. 1 per 6 seconds on average
        'public_dataset': '10/min',
    },
}
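
To sanity-check rates like these, the sketch below converts a rate string into an average interval between requests. It assumes, as DRF does to my knowledge, that only the first letter of the period matters (`s`, `m`, `h` or `d`); the helper names are illustrative and not part of DRF.

```python
def parse_rate(rate):
    """Split 'N/period' into (N, window length in seconds)."""
    num, period = rate.split("/")
    seconds = {"s": 1, "m": 60, "h": 3600, "d": 86400}[period[0]]
    return int(num), seconds


def seconds_per_request(rate):
    """Average time between two requests when the rate is fully used."""
    num_requests, duration = parse_rate(rate)
    return duration / num_requests


RATES = {
    "anon": "1800/hour",
    "user": "1800/hour",
    "youtube_api": "900/hour",
    "public_dataset": "10/min",
}

for scope, rate in RATES.items():
    print(f"{scope}: one request every {seconds_per_request(rate):g} s on average")
```

These are averages only: a client may send its whole quota in a burst at the start of the window and is then refused until earlier requests expire.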

tournesol-app / tournesol

Throttling on sensitive requests #211
