vasanth-bhat opened 2 weeks ago
This is no different from setting max-concurrent-requests to 140.
Virtual threads do not consume resources while waiting on a lock. As long as the data source supports queuing of requests, this will work as intended; only if you get 140 requests that all access the same data source and have to queue would the server be overloaded (as it would be with queuing on the server enabled).
Can you explain what the advantage of queuing at the server level is?
The Fault Tolerance Bulkhead feature (SE, MP) provides a mechanism for rate-limiting access to specific tasks. You control both parallelism and wait-queue length.
See the Helidon SE Rate Limiting example, which demonstrates rate limiting with both a Bulkhead and a Java Semaphore.
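As a standalone illustration of the Semaphore approach (the class name and permit count below are illustrative, not taken from the Helidon example), a plain `java.util.concurrent.Semaphore` gives the same hard-cap behavior:

```java
import java.util.concurrent.Semaphore;

// Illustrative sketch, not Helidon API: cap concurrent work with a
// fixed number of permits and reject anything over the cap with 503.
public class SemaphoreRateLimiter {
    private final Semaphore permits;

    public SemaphoreRateLimiter(int limit) {
        this.permits = new Semaphore(limit);
    }

    /** Runs the work if a permit is free; returns the resulting HTTP status code. */
    public int handle(Runnable work) {
        if (!permits.tryAcquire()) {
            return 503; // over capacity, reject immediately
        }
        try {
            work.run();
            return 200;
        } finally {
            permits.release();
        }
    }
}
```

Using `tryAcquire(timeout, unit)` instead of the non-blocking `tryAcquire()` would let a surge wait briefly for a permit, which approximates queuing behavior.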
I think of max-concurrent-requests as a hard cap to protect the integrity of the server. Then use Bulkheads or Semaphores for more fine-grained rate limiting of individual tasks.
The bulkhead feature requires programmatic changes, whereas providing the queue via max-concurrent-requests would be just a config change that lets old code keep behaving the same way.
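For reference, the existing cap is a per-listener setting; a minimal sketch of how it is configured today (the port and limit values here are illustrative):

```yaml
server:
  port: 8080
  max-concurrent-requests: 140
```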
You can set up a bulkhead for all requests with a filter:

    // imports assume the Helidon 4 package layout
    import io.helidon.config.Config;
    import io.helidon.faulttolerance.Bulkhead;
    import io.helidon.faulttolerance.BulkheadException;
    import io.helidon.http.Status;

    int rateLimit = Config.global().get("ratelimit").asInt().orElse(20);
    Bulkhead bulkhead = Bulkhead.builder()
            .limit(rateLimit)            // max concurrent invocations
            .queueLength(rateLimit * 2)  // waiting invocations beyond the limit
            .build();

    routing.addFilter((chain, req, res) -> {
        try {
            bulkhead.invoke(() -> {
                chain.proceed();
                return null;
            });
        } catch (BulkheadException ex) {
            // queue is full: reject the request
            res.status(Status.SERVICE_UNAVAILABLE_503).send();
        }
    });
Yes, but this is not the same as having that ability at the Helidon level. The behavior is not consistent with Helidon 3, and individual services have to make code changes to implement this.
Environment Details
Helidon 4.x, with a WebServer that supports Loom-based virtual threads, uses the new thread-per-request model. So by design there is no longer a server thread pool or any associated queues where requests get queued.
By default there is no limit on concurrency, and this can lead to issues when resources such as DB connections, external system integrations, and other downstream resources are limited. This can cause performance degradation and also errors when requests time out waiting for such resources.
To address this, Helidon provides the "max-concurrent-requests" parameter on the listener configuration. While it helps to limit concurrency, services are running into issues when trying to use this parameter for that purpose.
When the "max-concurrent-requests" parameter is set, any surge of requests beyond the limit is rejected and fails with a 503. Occasional surges can push concurrency beyond the configured limit, and in such cases the requests error out. This behavior is not consistent with earlier versions of Helidon, where in this situation the requests would be queued in the queue associated with Helidon's server thread pool.
It would be good to have an additional configurable option in Helidon 4 where one can enable queuing of requests when a limit is configured for "max-concurrent-requests". Something like the below:
server:
  max-concurrent-requests: 40
  request-queue:
    enable: true
    max: 100