reanahub / reana-server

REANA API server
http://reana-server.readthedocs.io/
MIT License

rate limit for run-on-reana launch endpoint #443

Closed tiborsimko closed 2 years ago

tiborsimko commented 2 years ago

Currently, REANA cluster administrators can set API endpoint rate limiting via RATELIMIT_GUEST_USER and REANA_RATELIMIT_GUEST_USER environment variables. The default value is "20 per second".

This is generally OK for "fast" endpoints, but it may be too much for "slow" endpoints.

For the Run-on-REANA sprint, we shall have a run?from=... like endpoint which will gather the workflow specification from external sources. If the specification lives inside a big tarball, it may take several seconds to fetch the sources and start a workflow.

With many such requests arriving at the same time, this could bring our cluster to its knees.

It is therefore good to investigate solutions to prevent this:

We should investigate the best options and implement either an in-app solution or an external-service solution to prevent cluster overload when many hundreds of users click on the "Run-on-REANA" badge at the same time.

VMois commented 2 years ago

After investigation, I found that you can define custom limits for each endpoint in invenio-app using RATELIMIT_PER_ENDPOINT (details).
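For illustration, a per-endpoint override could look roughly like the fragment below. This is a sketch that assumes the setting is a mapping from Flask endpoint names to limit strings; the endpoint name `launch.launch` and the limit value are hypothetical and not taken from the reana-server code.

```python
# Hypothetical invenio-app configuration fragment.
# Assumption: keys are Flask endpoint names ("<blueprint>.<view function>"),
# values are limit strings in the same style as "20 per second".
# "launch.launch" and "5 per minute" are made up for illustration.
RATELIMIT_PER_ENDPOINT = {
    "launch.launch": "5 per minute",
}
```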

If I understood correctly, our main goal is to prevent cluster overload. In this case, the above method will not help. invenio-app stores rate information per key, and its default key is based on IP address + User-Agent (details). This means that RATELIMIT_PER_ENDPOINT will prevent the same user from hitting the launch/ endpoint many times, but will not prevent many different users from doing the same.
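The difference between the two keying strategies can be shown with a toy simulation. This is a standard-library sketch, not invenio-app's implementation: the `FixedWindowLimiter` class, the `simulate` helper, and the request dictionaries are all invented for illustration, and window expiry is left out.

```python
from collections import defaultdict


class FixedWindowLimiter:
    """Toy limiter: at most `limit` hits per key (single window, no expiry)."""

    def __init__(self, limit):
        self.limit = limit
        self.hits = defaultdict(int)

    def allow(self, key):
        # Reject once this key has used up its quota.
        if self.hits[key] >= self.limit:
            return False
        self.hits[key] += 1
        return True


def simulate(key_for_request, n_clients=100, limit=2):
    """Send one launch request from each distinct client; count accepted ones."""
    limiter = FixedWindowLimiter(limit)
    accepted = 0
    for i in range(n_clients):
        request = {
            "ip": f"10.0.0.{i}",
            "user_agent": "browser",
            "endpoint": "launch",
        }
        if limiter.allow(key_for_request(request)):
            accepted += 1
    return accepted


# Keyed per client (IP + User-Agent): every client has its own counter.
per_client = simulate(lambda r: (r["ip"], r["user_agent"]))

# Keyed per endpoint: all clients share a single counter.
per_endpoint = simulate(lambda r: r["endpoint"])
```

With 100 distinct clients and a limit of 2, the per-client key accepts all 100 requests, while the per-endpoint key accepts only 2.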

One possible solution: we can create a new Flask-Limiter instance and configure it to use the endpoint name as the key.

Will continue the investigation.

VMois commented 2 years ago

@tiborsimko should I add a configurable rate limit for the launch/ endpoint? Something like REANA_LAUNCH_RATE_LIMIT in the Helm chart?
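On the server side, such a setting could be read from the environment roughly as follows. This is only a sketch: the default value and the helper name `get_launch_rate_limit` are hypothetical, since the thread does not specify them.

```python
import os

# Hypothetical default; the discussion above does not propose a concrete value.
DEFAULT_LAUNCH_RATE_LIMIT = "2 per second"


def get_launch_rate_limit(environ=os.environ):
    """Return the launch endpoint limit string.

    Falls back to the default when the variable is unset or empty.
    """
    return environ.get("REANA_LAUNCH_RATE_LIMIT") or DEFAULT_LAUNCH_RATE_LIMIT
```

The Helm chart would then only need to pass the value through as an environment variable of the server container.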