helidon-io / helidon

Java libraries for writing microservices
https://helidon.io
Apache License 2.0

Provide configurable option to queue requests when concurrency is limited with "max-concurrent-requests" #9229

Open vasanth-bhat opened 2 weeks ago

vasanth-bhat commented 2 weeks ago

Environment Details

Helidon 4.x, with a WebServer built on Loom-based virtual threads, uses the new thread-per-request model. By design there is no longer a server thread pool or any associated queue where requests get queued.

By default there is no limit on concurrency, which can lead to issues when resources such as DB connections, external system integrations, and other downstream resources are limited. This can cause performance degradation, as well as errors when requests time out waiting for such resources.

To address this, Helidon provides the "max-concurrent-requests" parameter on the Listener configuration. While it helps to limit concurrency, services run into issues when trying to use this parameter for that purpose.

When the "max-concurrent-requests" parameter is set, any surge of requests beyond the limit is rejected and fails with a 503. Occasional surges can push concurrency past the configured limit, and in such cases the requests error out. This behaviour is not consistent with earlier versions of Helidon, where under the same conditions the requests would be queued in the queue associated with Helidon's server thread pool.

It would be good to have an additional configurable option in Helidon 4 where one can enable queueing of requests when a limit is configured for "max-concurrent-requests". Something like below:

server:
  max-concurrent-requests: 40
  request-queue:
    enable: true
    max: 100
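The semantics being requested (40 requests running, up to 100 more waiting, rejection beyond that) can be modeled outside Helidon with a plain `java.util.concurrent.Semaphore`. This is a hypothetical sketch of the proposed behavior, not an existing Helidon API; the class and method names are illustrative:

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of the proposed config: up to `limit` requests run
// concurrently, up to `queueMax` more wait for a permit, and the rest are
// rejected (which a server filter would translate into a 503).
class QueuedLimiter {
    private final Semaphore permits;
    private final AtomicInteger inFlight = new AtomicInteger();
    private final int limit;
    private final int queueMax;

    QueuedLimiter(int limit, int queueMax) {
        this.permits = new Semaphore(limit, true); // fair: waiters are FIFO
        this.limit = limit;
        this.queueMax = queueMax;
    }

    /** Admits the request, possibly after queueing; false means "reject with 503". */
    boolean tryEnter() {
        if (inFlight.incrementAndGet() > limit + queueMax) {
            inFlight.decrementAndGet();
            return false;                     // running + queued already at capacity
        }
        permits.acquireUninterruptibly();     // cheap to block on a virtual thread
        return true;
    }

    void exit() {
        permits.release();
        inFlight.decrementAndGet();
    }
}
```

With the values from the config above this would be `new QueuedLimiter(40, 100)`, admitting at most limit + queueMax (here 140) requests in total.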

tomas-langer commented 1 week ago

This is no different from setting max-concurrent-requests to 140. Virtual threads do not consume resources while waiting on a lock. As long as the data source supports queuing of requests, this will work as intended; only if you get 140 requests that access the same data sources and have to queue would the server be overloaded (as it would be with queuing on the server enabled). Can you explain what the advantage of queuing at the server level is?
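Under this argument, the proposed queueing config is equivalent to simply raising the hard cap, since the extra admitted requests park on the downstream resource's own queue at negligible cost on virtual threads:

```yaml
server:
  max-concurrent-requests: 140   # 40 actively served + up to 100 blocked on the data source
```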

barchetta commented 1 week ago

The Fault Tolerance Bulkhead feature (SE, MP) provides a mechanism for rate-limiting access to specific tasks. You control both parallelism and wait-queue length.

See the Helidon SE Rate Limiting example for examples of using a Bulkhead as well as a Java Semaphore for doing rate limiting.

I think of max-concurrent-requests as a hard cap to protect the integrity of the server. Then use Bulkheads or Semaphores for more fine-grained control of rate limiting on individual tasks.
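The Semaphore variant mentioned above reduces to a `tryAcquire` guard around the task. This sketch (names are illustrative, not taken from the linked example) rejects immediately when all permits are held instead of queueing:

```java
import java.util.concurrent.Semaphore;

// Hard cap without queueing: tryAcquire rejects immediately when all
// permits are taken, which a filter would map to an HTTP 503.
class SemaphoreGuard {
    private final Semaphore capacity;

    SemaphoreGuard(int maxConcurrent) {
        this.capacity = new Semaphore(maxConcurrent);
    }

    /** Runs the task if a permit is free; false means "over capacity, reject". */
    boolean tryRun(Runnable task) {
        if (!capacity.tryAcquire()) {
            return false;          // no permit available: reject without blocking
        }
        try {
            task.run();
            return true;
        } finally {
            capacity.release();    // always free the permit
        }
    }
}
```

Swapping `tryAcquire()` for a blocking `acquire()` turns the same guard into an unbounded wait queue, which is the trade-off this thread is discussing.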

scottoaks17 commented 1 week ago

The bulkhead feature requires programmatic changes, whereas providing the queue via max-concurrent-requests would be just a config change that makes old code behave the same way.

romain-grecourt commented 1 week ago

You can set up a bulkhead for all requests with a filter:

int rateLimit = Config.global().get("ratelimit").asInt().orElse(20);
Bulkhead bulkhead = Bulkhead.builder()
        .limit(rateLimit)
        .queueLength(rateLimit * 2)
        .build();
routing
        .addFilter((chain, req, res) -> {
            try {
                bulkhead.invoke(() -> {
                    chain.proceed();
                    return null;
                });
            } catch (BulkheadException ex) {
                res.status(Status.SERVICE_UNAVAILABLE_503).send();
            }
        });
vasanth-bhat commented 6 days ago

Yes. This is not the same as having the capability at the Helidon level: the behavior is not consistent with Helidon 3, and individual services have to make code changes to implement this.