Jetty reactive client bottleneck for Spring reactive flatMap operator

patpatpat123 commented 1 year ago

What I am trying to achieve:

Send as many HTTP requests as possible, in parallel, to a very reliable third-party service from a fast-emitting Flux.

Background:

The third party service is very reliable and can sustain a very high number of requests. So far, I am the only client of this third party service. I am invited to hit the third party server as hard as possible. The third party service, which I have no control over, does not offer any bulk/list API. I can only send requests one by one. On their side, each request takes a constant one second to process.

This third-party API has keep-alive enabled, but it does not support gRPC, HTTP/2, or sockets. There is no rate limit at any point in the end-to-end flow.

What did I try:

Here is the configuration of my http client, as well as the logic to send the http requests (hopefully as many requests, as fast as possible)

client

    public WebClient webClient(final WebClient.Builder builder) {
        final var clientConnector = new ClientConnector();
        final var httpClient = new org.eclipse.jetty.client.HttpClient(new HttpClientTransportDynamic(clientConnector));
        // Allow a large request queue and many connections per destination.
        httpClient.setMaxRequestsQueuedPerDestination(9999);
        httpClient.setMaxConnectionsPerDestination(9999);
        // hostAndPort is defined elsewhere (not shown here).
        return builder.baseUrl(hostAndPort)
                .defaultHeader(HttpHeaders.CONTENT_TYPE, MediaType.APPLICATION_JSON_VALUE)
                .clientConnector(new JettyClientHttpConnector(httpClient))
                .build();
    }

the flux:

    // Some very fast Flux; this line is just an example.
    Flux<String> inputFlux = Flux.interval(Duration.ofMillis(1))
            .map(i -> "transform to request payload number " + i);
    // Send as many requests as fast as possible; note the 4096.
    Flux<String> resultFlux = inputFlux.flatMap(
            oneInput -> webClient.post().bodyValue(oneInput).retrieve().bodyToMono(String.class),
            4096);
    // Do something with the result.
    return resultFlux.map(oneResult -> doSomething(oneResult));

With this in place, I asked the third-party service to measure my traffic, and they gave me a number N: the number of requests per second they receive from me.

First observation: the flatMap concurrency here is 4096. Since the third party takes one second to process each request, I would have expected a rate N of about 4096 requests per second.

However, I am nowhere close. The third party service told me I am at 16ish requests per second.
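As a rough sanity check on these numbers, assuming steady state and a constant one second per request, Little's law relates the average number of in-flight requests L, the throughput λ (the N above), and the time per request W. The symbols are introduced here only for illustration.

```latex
\[
L = \lambda W
\quad\Longrightarrow\quad
\lambda = \frac{L}{W}
\]
\[
\lambda_{\text{expected}} = \frac{4096}{1\,\mathrm{s}} = 4096\ \text{requests/s},
\qquad
L_{\text{observed}} = \lambda_{\text{observed}}\, W \approx 16\ \text{requests/s} \times 1\,\mathrm{s} = 16
\]
```

Read this way, a measured rate of about 16 requests per second corresponds to roughly 16 requests in flight at any moment, not 4096. The arithmetic does not say which layer imposes that cap. Separately, the example source emits one item per millisecond, so with that exact source the input alone would cap the rate near 1000 requests per second.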

Issue:

I believe the underlying limitation comes from jetty http client. I replaced the WebClient call (which uses Jetty) with a dummy operation, and could see much higher throughput.

I believe the issue here is that the scheduling policy of the jetty-reactive-httpclient library is limiting the throughput. Which parameters (the number of concurrent connections, possibly I/O threads, keep-alive) should I tune in order to "unleash" the Jetty reactive HTTP client?

sbordet commented 1 year ago

I believe the underlying limitation comes from jetty http client.

I don't think so.

Jetty's HttpClient can easily do 10,000 to 100,000 requests/s.

I would start by using it directly rather than using Flux. Also, explicitly configure the thread pool and the number of selectors on ClientConnector, depending on the hardware spec your client is running on.

Once you achieve the desired number of requests/s you can reintroduce reactive and/or Flux and see how it goes.

rstoyanchev commented 1 year ago

I doubt it has anything to do with the Flux used to fire the requests, but you can also try replacing it with an Executor to keep it simple and transparent. Other suggestions: try other ClientHttpConnector implementations, and try Jetty's HttpClient directly.
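A minimal sketch of the Executor-based experiment, using only the JDK. It is not the WebClient or Jetty API: the class `ExecutorLoadSketch`, the `Stats` holder, and `fakeRemoteCall` are hypothetical names, and the remote call is simulated with `Thread.sleep` rather than real HTTP. The point is only to make the relationship between pool size, in-flight calls, and elapsed time visible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: fire blocking calls from a fixed thread pool and record what happened. */
final class ExecutorLoadSketch {

    /** Outcome of one run. */
    static final class Stats {
        final int completed;
        final int peakInFlight;
        final long elapsedMillis;

        Stats(int completed, int peakInFlight, long elapsedMillis) {
            this.completed = completed;
            this.peakInFlight = peakInFlight;
            this.elapsedMillis = elapsedMillis;
        }
    }

    /** Stand-in for the real blocking HTTP call (assumption: it only sleeps, no network I/O). */
    static String fakeRemoteCall(String payload, long latencyMillis) {
        try {
            Thread.sleep(latencyMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while simulating a request", e);
        }
        return "ok:" + payload;
    }

    /** Runs totalRequests simulated calls on a pool with the given number of threads. */
    static Stats run(int totalRequests, int concurrency, long latencyMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(concurrency);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        AtomicInteger completed = new AtomicInteger();
        List<Callable<String>> tasks = new ArrayList<>();
        for (int i = 0; i < totalRequests; i++) {
            String payload = "request " + i;
            tasks.add(() -> {
                // Track how many calls overlap so the effective concurrency is visible.
                peak.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
                try {
                    return fakeRemoteCall(payload, latencyMillis);
                } finally {
                    inFlight.decrementAndGet();
                    completed.incrementAndGet();
                }
            });
        }
        long start = System.nanoTime();
        try {
            pool.invokeAll(tasks); // blocks until every task has finished
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for requests", e);
        } finally {
            pool.shutdown();
        }
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        return new Stats(completed.get(), peak.get(), elapsedMillis);
    }
}
```

In this blocking model the number of overlapping calls can never exceed the pool size, so throughput should approach the pool size divided by the per-call latency. Swapping `fakeRemoteCall` for a real blocking HTTP call and comparing `peakInFlight` against the configured concurrency is one way to see where requests stop overlapping.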

patpatpat123 commented 1 year ago

Thanks for your comments.

Just adding some more facts here:

I tried this (note the concurrency number):

    // Some very fast Flux; this line is just an example.
    Flux<String> inputFlux = Flux.interval(Duration.ofMillis(1))
            .map(i -> "transform to request payload number " + i);
    // Send as many requests as fast as possible; note the small number 4.
    Flux<String> resultFlux = inputFlux.flatMap(
            oneInput -> webClient.post().bodyValue(oneInput).retrieve().bodyToMono(String.class),
            4);
    // Do something with the result.
    return resultFlux.map(oneResult -> doSomething(oneResult));

And I could see threads like these in the logs:

[HttpClient@336f49a1-48]
[HttpClient@336f49a1-50]
[HttpClient@336f49a1-47]
[HttpClient@336f49a1-49]

That is, with the flatMap concurrency set to 4, there seem to be 4 threads from the Jetty HttpClient (please correct me if I am wrong).

Now, if I increase the concurrency to 8 (Flux<String> resultFlux = inputFlux.flatMap(oneInput -> webClient.post().bodyValue(oneInput).retrieve().bodyToMono(String.class), 8);), I do see 8 different [HttpClient@abc-N] threads.

Therefore, I think there is some kind of correlation between the flatMap concurrency number and the number of [HttpClient@abc-N] threads.

However, as I scale up to 16, 32, 64, 128 [...], at some point I no longer see the same number of [HttpClient@abc-N] threads. As a concrete example, if I set the flatMap concurrency to 4096, I would have expected about 4096 different threads, from [HttpClient@abc-1] to [HttpClient@abc-4096].

However, not even a hundred could be observed.

May I ask if this is expected?

sbordet commented 1 year ago

I'm not sure what that parameter does exactly, so I cannot comment.

If you're trying to perform some kind of load testing, I feel you are taking the wrong way. Have you tried HttpClient alone to keep things simpler?

patpatpat123 commented 1 year ago

No problem at all, and again, thank you for all the responses.

I am not doing load testing. This is a real production-level business use case: "hit the third party server as hard as possible".

I believe I have found further clues.

I am now proceeding by trial and error (mostly error; the parameters are a bit overwhelming).

Adding: (note the custom QueuedThreadPool)

    @Bean
    public HttpClient getHttpClient(final MeterRegistry registry) {
        final var threadPool = new QueuedThreadPool(1000);
        threadPool.setName("client-thread");

        final var clientConnector = new ClientConnector();
        clientConnector.setExecutor(threadPool);
        clientConnector.setReuseAddress(true);
        clientConnector.addEventListener(new JettyConnectionMetrics(registry));
        return new HttpClient(new HttpClientTransportDynamic(clientConnector));
    }

Issue 1: with this QueuedThreadPool set to 1000, I would have expected to see [client-thread-1] to [client-thread-1000] doing the work of sending requests.

However, I am only seeing:

client-thread-106
client-thread-47
client-thread-68
client-thread-76
client-thread-77
client-thread-85
client-thread-92
client-thread-93
client-thread-96
client-thread-97

Is the maximum number of threads not being picked up?

Also, may I ask which thread pool (QueuedThreadPool, ExecutorThreadPool, or another one) is the best fit for this production use case (not load testing), which is to spawn as many threads as possible in order to send as many requests as possible?

sbordet commented 1 year ago

I think you have wrong expectations.

this production use case (not load testing), which is to spawn as many threads as possible in order to send as many requests as possible

You are basically trying to max out 2 systems, so it is load testing. Using as many threads as possible is rarely the best solution.

I think HttpClient is using only the 10 threads you are seeing because it is able to cope with the load with only those threads. You likely have filled up all your connections and adding more threads won't help.

You need to carefully monitor CPU, network, JVM and application to understand what's going on.

Please read: https://github.com/jetty-project/jetty-load-generator/blob/2.1.x/README.md

patpatpat123 commented 1 year ago

Understood @sbordet

This is very unfortunate. With this reactive paradigm, we use very few resources:

kubectl -n=production top pod application-65847cb578-dqnb9
NAME                          CPU(cores)   MEMORY(bytes)
application-65847cb578-dqnb9   39m          271Mi

We are using very little CPU and memory with Jetty to send some 16 requests per second, and the server confirms that it receives about 16 requests per second (while it should be able to handle 8000/s). Meanwhile, the incoming flux of data is piling up and Jetty is not able to send it.

May I ask if there is documentation on the executor, thread pool, and selectors?

sbordet commented 1 year ago

Look, it's not Jetty. There is something else wrong.

The documentation is here: https://www.eclipse.org/jetty/documentation/jetty-10/programming-guide/index.html

Look at the Jetty architecture for details on threads and selectors: https://www.eclipse.org/jetty/documentation/jetty-10/programming-guide/index.html#pg-arch

Make sure you have a large maxConnectionsPerDestination and maxRequestsQueuedPerDestination.

Then make sure you actually send the requests without waiting for the responses.
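To illustrate what "send without waiting for the responses" means in practice, here is a small JDK-only sketch. It does not use the Jetty API: `SendWithoutWaitingSketch` and `fakeAsyncSend` are hypothetical names, and the one-second server is replaced by a timer that completes a `CompletableFuture` after a configurable delay. It contrasts waiting on each response before sending the next request with sending everything first and waiting once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: waiting per request versus sending all requests before waiting. */
final class SendWithoutWaitingSketch {

    /** Stand-in for an asynchronous send (assumption: a timer plays the role of the remote server). */
    static CompletableFuture<String> fakeAsyncSend(ScheduledExecutorService timer,
                                                   String payload, long latencyMillis) {
        CompletableFuture<String> response = new CompletableFuture<>();
        timer.schedule(() -> {
            response.complete("ok:" + payload);
        }, latencyMillis, TimeUnit.MILLISECONDS);
        return response;
    }

    /** Waits for each response before sending the next request; returns elapsed milliseconds. */
    static long sendAndWaitEach(int total, long latencyMillis) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        long start = System.nanoTime();
        try {
            for (int i = 0; i < total; i++) {
                // join() blocks here, so only one request is ever in flight.
                fakeAsyncSend(timer, "request " + i, latencyMillis).join();
            }
        } finally {
            timer.shutdown();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    /** Sends every request first, then waits once for all responses; returns elapsed milliseconds. */
    static long sendAllThenWait(int total, long latencyMillis) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        long start = System.nanoTime();
        try {
            List<CompletableFuture<String>> pending = new ArrayList<>();
            for (int i = 0; i < total; i++) {
                pending.add(fakeAsyncSend(timer, "request " + i, latencyMillis));
            }
            // A single wait covers all outstanding responses.
            CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
        } finally {
            timer.shutdown();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }
}
```

Both methods use a single timer thread, so any difference in elapsed time comes from how many requests are outstanding at once, not from the number of threads. If a real client behaves like `sendAndWaitEach` somewhere in the pipeline, adding threads or connections would not be expected to raise throughput.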

As I said multiple times now, start with Jetty HttpClient alone, no reactive. Make sure you can hit the desired numbers with it. Then introduce reactive.

jetty-project / jetty-reactive-httpclient

Jetty reactive client bottleneck for Spring reactive flatMap operator #228