jetty / jetty.project

Eclipse Jetty® - Web Container & Clients - supports HTTP/2, HTTP/1.1, HTTP/1.0, websocket, servlets, and more
https://eclipse.dev/jetty

Configuring Virtual Thread executor in Jetty Http client makes my Spring Boot app freeze #12256

Closed martinvisser closed 2 days ago

martinvisser commented 1 week ago

Jetty version(s) 12.0.12

Jetty Environment n/a

Java version/vendor (use: java -version)

openjdk version "21.0.4" 2024-07-16 LTS
OpenJDK Runtime Environment Temurin-21.0.4+7 (build 21.0.4+7-LTS)
OpenJDK 64-Bit Server VM Temurin-21.0.4+7 (build 21.0.4+7-LTS, mixed mode, sharing)

OS type/version macOS 14.6.1 & Ubuntu 22.04

Description: We have a Spring Boot 3.3 application that uses GraphQL. That service is backed by several "hubs", each configured via a RestClient. RestClient is just an interface and can be used with several types of HTTP client. The (near) default is Apache HttpClient, but we noticed virtual thread pinning, so we are looking for alternatives. Spring has support, in order (see org.springframework.boot.web.client.ClientHttpRequestFactories), for Apache, Jetty, OkHttp 3 (deprecated), and Simple, so it made sense to look into Jetty. By using the defaults (only using JettyClientHttpRequestFactory()) the application works fine and keeps working fine, but the Jetty client uses a thread pool (using platform threads?) even though we would like to use virtual threads for the interaction with the hubs. By working fine I mean: the app returns a 200 OK on calling http://localhost:8080/actuator/health.

To use virtual threads, we configure the executor on the HttpClient:

        val httpClient = HttpClient()
        httpClient.executor = Executors.newVirtualThreadPerTaskExecutor()
        val requestFactory = JettyClientHttpRequestFactory(httpClient)

When starting the application after this, nothing special happens, but when making one call to the RestClient a bunch of ForkJoinPool worker threads (no. 1-6 in the screenshot) are created. After keeping the application idle, the workers created by the application (no. 7 and 8 in the screenshot) close, but the ones created by Jetty stay, and the app gets "stuck". By stuck I mean: no response is returned on calling the actuator.

The different ForkJoinPools show different stack traces (see the attached screenshot).

How to reproduce? I created a demo repo to show this behavior. The readme contains steps on how to run the app.

In short: after the app is started the RestClient bean is loaded and the ForkJoinPool worker threads are created. Depending on the number of cores this starts a bunch of workers for Jetty and 2 for the app (due to -Djdk.virtualThreadScheduler.parallelism=2). After keeping the app idle for about 60s (2 * the idle timeout of 30s) those "app" worker threads close down, while the others stay active. After those 60s the app is completely unresponsive.

This breaks:

    @Bean
    fun restClient(
        restClientBuilder: RestClient.Builder,
    ) = RestClient.builder()
        .requestFactory(createRequestFactory())
        .baseUrl("http://localhost:8080")
        .build()

    private fun createRequestFactory(): JettyClientHttpRequestFactory {
        val httpClient = HttpClient()
        httpClient.executor = Executors.newVirtualThreadPerTaskExecutor()
        return JettyClientHttpRequestFactory(httpClient)
    }

This works, but doesn't use virtual threads for the HTTP client:

    @Bean
    fun restClient(
        restClientBuilder: RestClient.Builder,
    ) = RestClient.builder()
        .requestFactory(JettyClientHttpRequestFactory())
        .baseUrl("http://localhost:8080")
        .build()

This is of course with spring.threads.virtual.enabled set to true, to make the whole application run on virtual threads.

sbordet commented 1 week ago

we would like to use virtual threads for the interaction with the hubs.

Why, just out of curiosity?


Please read the documentation about the virtual thread support: https://jetty.org/docs/jetty/12/programming-guide/arch/threads.html#thread-pool-virtual-threads and also: https://jetty.org/docs/jetty/12/operations-guide/modules/standard.html#threadpool-all-virtual

Long story short, the call to select() or in your case above KQueue.poll() pins the carrier, so you probably run out of carriers.

Setting -Djdk.virtualThreadScheduler.parallelism=2 probably exacerbates the problem (not sure exactly I get the difference between "Jetty" and "app" in your report).

I suggest you either don't use virtual threads (there should be no need, as the Jetty HttpClient is fully asynchronous).

If you really want to use virtual threads, please detail exactly where -Djdk.virtualThreadScheduler.parallelism=2 is set and what is affected by it, since you have a proxy that has both a client and a server.

Also, I recommend not setting the client executor to newVirtualThreadPerTaskExecutor(), but instead looking into Jetty thread pools and their support for virtual threads.

martinvisser commented 1 week ago

Why, just out of curiosity?

Our app has quite a high throughput and a lot of I/O with the hubs, so we do not want one request to block another. We came from a fully reactive app, but recent performance degradation made us decide to go for Java 21 with virtual threads, and no more Monos and Fluxes.


not sure exactly I get the difference between "Jetty" and "app" in your report

"Jetty" is the http client, the one that created ForkJoinPool workers that get pinned. "app" is the spring boot app, which only created two additional ForkJoinPool worker threads, both shutdown after idling.

I'll definitely have a look at those docs, but do they apply to the (standalone) HTTP client as well? We are running the default server in Spring Boot, so that would be Tomcat. I just tried with Jetty as the server, and the same issue occurs when setting the parallelism. I don't see this freezing when using the default parallelism with either server, so perhaps my way of reproducing the issue we faced is not the correct one. Our original issue was that the app froze after about 60s when idle. Our app is deployed on Cloud Foundry, where we set no parallelism, and has 8 cores (not sure if that's relevant or not, but I created the demo app on a 16 core machine).

I suggest you either don't use virtual threads (there should be no need, as the Jetty HttpClient is fully asynchronous).

But asynchronous is not necessarily non-blocking. Of course a virtual thread could still be "blocked", but that does not require new platform threads, which is what does happen under high load with a platform thread pool.

Btw, the only reason I added the parallelism is to reduce the waiting time needed to show the issue; we do not do this in production. However, without setting it I do not see the freeze. I thought I had found a neat way of reproducing it, but maybe this is actually what causes it in the demo app.


So maybe my way of reproducing it is wrong, but I'm surprised that configuring the executor on the HttpClient can freeze the Spring Boot application so it stops handling any requests. Perhaps an interesting side note: this freeze also occurred when we configured Netty as the HTTP client and explicitly configured it to run on the virtual thread executor. I'll see if I can create more heap and/or thread dumps to get more information.

sbordet commented 1 week ago

Our app has quite a high throughput and a lot of I/O with the hubs, so we do not want one request to block another. We came from a fully reactive app, but recent performance degradation made us decide to go for Java 21 with virtual threads, and no more Monos and Fluxes.

Interesting, thanks for sharing.

I'll try the demo repo tomorrow.

If you can capture a JVM thread dump, including the virtual threads, that would be great.
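(On JDK 21 you can do that with jcmd <pid> Thread.dump_to_file -format=plain <file>, which includes virtual threads; a plain jstack does not show them.)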

martinvisser commented 1 week ago

I changed the executor to use the QueuedThreadPool, but with a virtual thread factory. This seems to work well (see the attached screenshot).

sbordet commented 1 week ago

I tried your repo, but it seems to only have half of the application (the proxy).

Of course the virtual thread factory works: it pools the virtual threads it creates 😄

I believe your problem is the system property limiting the carriers, combined with the select() operation pinning the carriers.

If you configure Jetty's HttpClient with 1 selector only, you should be good.

If you can make your repository a self-contained reproducer that we can try (with instructions), I'll gladly try to reproduce it, but so far it seems like just a misconfiguration.

Why do you want to restrict the carriers to only 2?

martinvisser commented 1 week ago

I don't want to restrict it; it is just what let me reproduce the issue we noticed when we deployed our app to Cloud Foundry. But as said, maybe this actually creates an issue in my demo.

sbordet commented 1 week ago

Ok, so the basic thing to know here is that select() pins carriers.

By default, new HttpClient() or new HttpClient(new HttpClientTransportOverHTTP()) allocates cores/2 selectors. However, new HttpClient(new HttpClientTransportOverHTTP(new ClientConnector())) allocates only 1 selector.
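To make this concrete with the numbers from this report: on the 16-core demo machine a default HttpClient allocates 16/2 = 8 selectors, while -Djdk.virtualThreadScheduler.parallelism=2 leaves only 2 carriers, so two selectors alone can pin both carriers and starve every other virtual thread. A minimal sketch of explicitly forcing a single selector per client (package names are the Jetty 12 ones):

    import org.eclipse.jetty.client.HttpClient;
    import org.eclipse.jetty.client.transport.HttpClientTransportOverHTTP;
    import org.eclipse.jetty.io.ClientConnector;

    // One selector per HttpClient: at most one carrier can be pinned by its select() call.
    ClientConnector connector = new ClientConnector();
    connector.setSelectors(1); // the default ClientConnector already uses 1 selector; set it explicitly for clarity
    HttpClient httpClient = new HttpClient(new HttpClientTransportOverHTTP(connector));
    httpClient.start();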

We definitely need to align the number of selectors.

Having said that, the number of selectors and the number of carriers must be carefully chosen.

I'm not sure why you have a problem deploying to Cloud Foundry, unless you explicitly set the number of selectors, and they are more than the number of cores, or you explicitly set the number of carriers, and they are less than the selectors.

I would recommend using just 1 selector, unless you have a large number of concurrent connections (say larger than 5k), and that should work as long as you have more than 1 core.

If you have only 1 core, then you must configure jdk.virtualThreadScheduler.parallelism=p with p > 1.

martinvisser commented 1 week ago

On Cloud Foundry we have 8 cores, and have not configured anything for selectors. Per client we will potentially have hundreds of concurrent connections, but certainly not thousands of concurrent connections.

sbordet commented 1 week ago

@martinvisser do you have more than 1 HttpClient in the same JVM?

I ask because having two of them would result in one having 4 selectors by default, and the other having another 4, and this would pin all the carriers (with 8 cores there are 8 carriers by default).

This could explain why you have issues deploying on Cloud Foundry: maybe you deploy 2 different applications, but both use Jetty's HttpClient.

If you have 100s of HTTP/1.1 connections, I suggest you explicitly configure the number of selectors to 1 in each HttpClient instance.

martinvisser commented 1 week ago

We have 9 different RestClients, each with its own HttpClient. If each of them pins even 1 thread, we would soon run out of carrier threads. There's no way around the pinning part?

sbordet commented 1 week ago

There's no way around the pinning part?

No, unfortunately that is how virtual threads work.

Can you reuse the same HttpClient instance across the RestClients?

martinvisser commented 1 week ago

Theoretically yes, but each has its own max pool size, timeouts, etc. Some use SSL, some don't.

sbordet commented 1 week ago

I am afraid that if you need different configurations for each HttpClient you have to manually set jdk.virtualThreadScheduler.parallelism=N, with N carefully chosen.

Note that the "max pool size" should be irrelevant in your examples (you chose to be unlimited by using directly newVirtualThreadPerTaskExecutor()) -- unless you refer to some other pool.

The TLS configuration needs to be separated only if you use client certificates, and they must be isolated from each other. If you don't use client certificates, the TLS configuration can be the same, and if you perform clear-text http requests (not https), it won't be used.

Perhaps you can consolidate the timeouts into a single value?

If all of that is possible, then perhaps you can use a single HttpClient with 1 selector and all will be good.
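For instance, a rough sketch of what that could look like on the Spring side (class names from Spring Framework 6.1 and Jetty 12; the hub URLs and the timeout are made-up placeholders): one shared HttpClient with a single selector, and per-hub RestClients that differ only in base URL and request-factory settings.

    import java.time.Duration;
    import org.eclipse.jetty.client.HttpClient;
    import org.eclipse.jetty.client.transport.HttpClientTransportOverHTTP;
    import org.springframework.http.client.JettyClientHttpRequestFactory;
    import org.springframework.web.client.RestClient;

    // One HttpClient (and therefore one selector) shared by all hubs.
    HttpClient sharedClient = new HttpClient(new HttpClientTransportOverHTTP(1));
    sharedClient.start();

    // Per-hub request factories can still differ, e.g. in read timeout.
    JettyClientHttpRequestFactory hubAFactory = new JettyClientHttpRequestFactory(sharedClient);
    hubAFactory.setReadTimeout(Duration.ofSeconds(5));

    RestClient hubA = RestClient.builder()
            .requestFactory(hubAFactory)
            .baseUrl("https://hub-a.example.com")
            .build();

    RestClient hubB = RestClient.builder()
            .requestFactory(new JettyClientHttpRequestFactory(sharedClient))
            .baseUrl("https://hub-b.example.com")
            .build();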

martinvisser commented 1 week ago

The backends (hubs, as I called them) are so different that consolidating them all into one HttpClient is not suitable for our needs. Anyway, thanks for the insights. I think it does explain the issue we faced. Maybe the Jetty HTTP client (with virtual threads, that is) will be an option for us in the future, but for now it unfortunately is not.

sbordet commented 1 week ago

Maybe the Jetty HTTP client (with virtual threads, that is) will be an option for us in the future, but for now it unfortunately is not.

Wait, this is true only for the "all virtual threads" solution you are trying.

If you use a Jetty QueuedThreadPool with a virtual thread Executor you won't have any of these problems: the selectors will run on platform threads, and your client application will be invoked in virtual threads. Best of both worlds.

You should totally be able to use Jetty's HttpClient in your scenario, with the right configuration.

    // Platform threads run the selectors and other internal Jetty tasks.
    var qtp = new QueuedThreadPool();
    // Configure qtp.

    // Virtual threads are used to invoke the application callbacks.
    var vtp = new VirtualThreadPool();
    // Configure vtp.

    qtp.setVirtualThreadsExecutor(vtp);

    // One selector only.
    var httpClient = new HttpClient(new HttpClientTransportOverHTTP(1));
    httpClient.setExecutor(qtp);

    httpClient.start();

martinvisser commented 1 week ago

Okay, I'll give that a try (tomorrow).

gregw commented 6 days ago

Out of interest, have you tried just blocking code (no Monos and Fluxes) with native threads?

How parallel does your app need to be? You can often have many 10s of thousands of native threads without a problem.

martinvisser commented 6 days ago

@gregw Our app (GraphQL) consumes information from multiple hubs in parallel. This information is combined to build the response. Due to the I/O we have a lot of waiting, and having platform threads doing that is pretty wasteful.

sbordet commented 6 days ago

@martinvisser Jetty's HttpClient would wait for I/O threadlessly, so it would be totally equivalent to a virtual thread in that sense.

Not sure whether wrappers of Jetty's HttpClient would block, but Jetty's HttpClient does not.

Assuming the wrappers provide a similar asynchronous non-blocking API, there should be no problems using platform threads.

We have multiple customers that deployed a proxy-like solution in this way, with great scalability exactly because Jetty's HttpClient waits for I/O threadlessly.

Did you manage to give QueuedThreadPool + VirtualThreadPool (as shown above) a go? Does it work for you?

martinvisser commented 5 days ago

@sbordet I tried this and yes, it works, but not really as I expected. I expected to see, and indeed saw, one scheduler/selector per HTTP client, but what I didn't expect was the number of QueuedThreadPool platform threads growing with the load. What I expected was that there would be no need for it to grow, since the virtual thread executor handles all the tasks. But perhaps my assumption is wrong.

sbordet commented 5 days ago

@martinvisser we need details regarding your last comment.

How much did they grow?

The scheduler and the selectors are two different components that are not related to each other.

Can you take an HttpClient.dump() and report it here?

Enabling JMX would be the simpler way to take a HttpClient.dump(), see https://jetty.org/docs/jetty/12/programming-guide/arch/jmx.html

Alternatively, you can use a special request path (temporarily) that produces the dump() string as response.

We are trying to be as friendly as possible towards virtual threads, so we're keen to collaborate with different use cases, because it seems that virtual threads bring to the table one unexpected surprise after another.

martinvisser commented 5 days ago

Per backend it starts out with 8 or so (not many at least) threads. At some point in our test we reach a breaking point, which is kind of the goal of the test, but when that happens there is a steep increase in QueuedThreadPool threads.

This picture includes the increase for all 9 HTTP clients, so the steepness is a bit unfair: it is inflated by a factor of 9 (see the attached graph).

The problem with this is that the application basically runs out of memory and Cloud Foundry kills the app. This means the container is gone and I can no longer generate a dump. I also have no option to increase the memory due to constraints.

sbordet commented 5 days ago

@martinvisser unfortunately that graph is not enough, as we don't know what is actually creating those threads; it could be some other component, since as I understand it this is the thread report for the whole JVM, not specifically for Jetty components.

Please set QueuedThreadPool.setDetailedDump(true), and take both a Server.dump() and an HttpClient.dump(); then we will know what the threads are doing and likely why they are increasing.
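A minimal sketch of grabbing those dumps programmatically, for example from a temporary debug endpoint (queuedThreadPool, httpClient and server below stand for your existing instances; the server dump only applies if you run Jetty as the server):

    // Include per-thread detail in the dump output.
    queuedThreadPool.setDetailedDump(true);

    // Jetty components are Dumpable; dump() returns the component tree as a String.
    String clientDump = httpClient.dump();
    String serverDump = server.dump(); // only when the servlet container is Jetty (the demo defaults to Tomcat)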

gregw commented 2 days ago

Getting a dump during that cascade failure would be good, if only because we like to fail gracefully: if this is Jetty's fault, we'd like to see what can be done to stop it. Can you put a maxThreads limit on the QTP that is below the out-of-memory limit, so that you hit a plateau (probably non-functional) long enough to get a dump? Alternatively, how is Cloud Foundry killing the app? If it is done cleanly, you can set a dump on stop.