Open alpe opened 10 months ago
I guess if you have large requests and provide Lingo as a public service this would be a real concern. Let's assume each Lingo instance can have 60k open connections max and each request is 1 MB; then you would need 60 GB of memory to hold those requests. Someone who runs a large public Lingo instance might have other DDoS protections in place on top of Lingo, and in that case wouldn't need this feature (e.g. an API gateway or other software that includes such protection).
My vote would be to postpone this until we have a user that runs Lingo on a public endpoint. I am not against including this though. @nstogner your thoughts?
If you are implementing this, I would want a default of unlimited, or a number so large that a user with plenty of memory and no malicious actors (e.g. an internal Lingo deployment) wouldn't encounter an error.
Incoming requests are queued in memory until capacity on a serving backend becomes available. This can become critical under peak load or in a DoS scenario. Instead of leaving this unbounded, we should fail fast and reject new requests with StatusServiceUnavailable (503). The total queue limit could be a dynamic and/or fixed value (due to memory limitations). For dynamic calculation:
factor * total_number_of_replicas * concurrent_requests_per_replica
. The factor should be defined in the context of the time required to scale up instances. I think I saw 10x somewhere in a similar project, but I cannot find the number now. It would be a good starting parameter to customize for different environments.
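To make the idea concrete, here is a minimal sketch of the fail-fast behavior as HTTP middleware, using a buffered channel as the queue counter. All names (`queueLimiter`, `tryAcquire`, etc.) and the example numbers are hypothetical, not Lingo's actual implementation:

```go
package main

import (
	"fmt"
	"net/http"
)

// queueLimiter bounds the number of requests held in memory,
// rejecting new ones instead of buffering without limit.
type queueLimiter struct {
	slots chan struct{} // capacity = max queued requests
}

func newQueueLimiter(limit int) *queueLimiter {
	return &queueLimiter{slots: make(chan struct{}, limit)}
}

// tryAcquire returns false when the queue is already at capacity.
func (q *queueLimiter) tryAcquire() bool {
	select {
	case q.slots <- struct{}{}:
		return true
	default:
		return false
	}
}

func (q *queueLimiter) release() { <-q.slots }

// middleware fails fast with 503 once the queue limit is reached.
func (q *queueLimiter) middleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !q.tryAcquire() {
			http.Error(w, "request queue full", http.StatusServiceUnavailable)
			return
		}
		defer q.release()
		next.ServeHTTP(w, r)
	})
}

func main() {
	// Hypothetical dynamic limit per the formula above:
	// factor * total_number_of_replicas * concurrent_requests_per_replica
	factor, replicas, perReplica := 10, 3, 4
	limit := factor * replicas * perReplica
	fmt.Println(limit) // 120

	q := newQueueLimiter(2)
	fmt.Println(q.tryAcquire(), q.tryAcquire(), q.tryAcquire()) // true true false
}
```

The limit would be recomputed whenever the replica count changes, so the queue grows with the backend's ability to drain it.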