I took the liberty of renaming this issue so that it can be used as an umbrella for experiments about Jetty and Loom.
I will be looking at Loom and Jetty in the following days, so I will be able to be more precise about the answer.
/cc @gregw @lorban
I'm working on Project Loom. If you run into any questions or issues with the early access builds, then you are welcome to bring them to the OpenJDK loom-dev mailing list.
As it happens, I did create a demo that embeds Jetty and it was very easy to get started. I created an org.eclipse.jetty.util.thread.ThreadPool with execute implemented to run each task in a virtual thread. Most things "just worked". I was able to create services that aggregated the results from other services, essentially fan-out using the JAX-RS client API (javax.ws.rs), where it didn't matter if the service spent most of its time blocked waiting for other services.
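The whole adapter can be roughly this small (a sketch with a hypothetical class name, not the actual demo code; the methods beyond execute are given trivial demo values that a real pool would want to reconsider):

import org.eclipse.jetty.util.thread.ThreadPool;

// Runs every task submitted by Jetty in a fresh virtual thread.
public class VirtualThreadPool implements ThreadPool
{
    @Override
    public void execute(Runnable task)
    {
        Thread.startVirtualThread(task); // one virtual thread per task
    }

    @Override
    public void join() throws InterruptedException
    {
        // Demo shortcut: virtual threads are not tracked here, so there is nothing to join.
    }

    @Override
    public int getThreads() { return 0; }

    @Override
    public int getIdleThreads() { return 0; }

    @Override
    public boolean isLowOnThreads() { return false; } // virtual threads are effectively unlimited
}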
For our experiments with Loom, these are the questions that I'd really like answered:
"State of Loom" provides a good overview/status of Project Loom. There is a section on pinning that provides an overview of the short-term limitations with respect to parking while holding a monitor. Fairness is not changed in the current prototype. Loom doesn't use "cooperative multithreading" (there are no explicit scheduling points).
@AlanBateman interesting read... But I'll have to go over it a few more times to fully digest.
Loom might not strictly be "cooperative multithreading", but as @sbordet is currently preparing a monster PR to replace many synchronized blocks with Locks so as to give Loom the opportunity to "preempt", it does kind of feel like we are making explicit scheduling points... or at least have to be aware of what the scheduling points are.
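For the record, the conversions in that PR are of this general shape (an illustrative fragment with a hypothetical doWork(), not actual Jetty code):

import java.util.concurrent.locks.ReentrantLock;

// Before: synchronized (this) { doWork(); }
// After: an explicit lock. A virtual thread that blocks while holding a
// ReentrantLock can unmount from its carrier thread, whereas in the current
// prototype blocking while holding a monitor pins the carrier.
private final ReentrantLock lock = new ReentrantLock();

void guardedWork()
{
    lock.lock();
    try
    {
        doWork(); // critical section; may block without pinning the carrier
    }
    finally
    {
        lock.unlock();
    }
}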
I am definitely concerned with pinning, as we are just not in control of what applications will do. Consider an HTTP2 server, where the flow control is done in user space. If an application writes to a response from within a synchronised block, then that thread could become pinned if the write blocks because the flow control window is entirely consumed. Then, if the frame that would open that flow control window is handled by a virtual thread, it may never get executed, because all the real cores are attached to pinned virtual threads. We currently optimise the scheduling of these situations by using reserved threads: if we know a thread is available to continue handling flow control, then the current thread can continue from parsing a frame to handling that frame... with a hot cache. Not sure how to handle this with Loom? Perhaps we would need to have a couple of real threads always doing the IO selection and handling of control frames, and then passing off to virtual threads for application handling??? But then we would always run the applications with cold CPU caches. Hmmmm
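A contrived fragment of that pinning hazard (hypothetical application code, assuming the current prototype's limitation that a virtual thread cannot unmount while holding a monitor):

import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

private final Object sessionLock = new Object();

// Imagined servlet code running on a virtual thread.
void writeResponse(OutputStream out, byte[] data)
{
    synchronized (sessionLock) // monitor held...
    {
        try
        {
            // ...across a write that can block on an exhausted HTTP2 flow
            // control window: the virtual thread cannot unmount, so its
            // carrier (real) thread stays pinned until the window reopens.
            out.write(data);
        }
        catch (IOException e)
        {
            throw new UncheckedIOException(e);
        }
    }
}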
So it will be interesting to get Jetty running in Loom and test it under load to see if we get such problems.
However, ultimately I doubt that a server that has been so specifically optimised for running async IO on OS threads is going to be the best usage of Loom. A more interesting approach would be to use the core infrastructure of Jetty to assemble a non-async server that uses/assumes Loom. I.e. if we have 10,000 HTTP connections, each with 100 streams, then we just allocate 1,000,000 virtual threads and don't bother with all the async complexities that we go on with. Hmmm, or would we allocate 10,000 threads, one for each connection, that would just run the HTTP2 protocol, and then 1,000,000 threads that each ran the application/session? Each connection-processing thread would then hand off work to one of 100 application/session threads.... and we'd have to be clever to try to get that executing on the same real thread so the cache would be hot...... and we'd still need to solve the pinning issue... but maybe 1 work-stealing real thread could cover that.
So yep, I think it will be interesting for us to replace our synchronized blocks with Locks, add a different Thread "pool" and see how it goes. However, ultimately I think we'd only really be fair on Loom if we wrote a new connector type that wasn't intrinsically async.... This would not be too hard to do, but we still have the issue that the input/output streams we give to the applications are implemented as async under the hood, so applications wouldn't really be using Loom preemption on IO. So removing the async assumption from HttpChannel/HttpInput/HttpOutput is a fair bit of work.... but ultimately, if we really want to know if the Loom approach really is scalable, then somebody needs to write a server that fully embraces the approach.
@gregw My understanding is that while locks and platform IO are considered "logical scheduling points", they aren't required for another virtual thread to preempt and they can be interrupted just like normal threads. I'm not entirely sure on that though and would have trouble proving it, since I can't come up with a case that would show it.
I think that is the purpose of the tryPreempt method on java.lang.Continuation though, so a scheduler is able to preempt without an explicit IO boundary.
My instinct with the concerns about reserved threads and how Jetty currently does scheduling is that if those concerns do end up being valid, a new scheduler roughly matching Jetty's current semantics could be written and used in place of ForkJoinPool.
Running with the system property jdk.tracePinnedThreads set on the command line will help identify cases where a thread parks while holding a monitor. The intention is to remove the limitation in time.
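For example (app.jar is a placeholder; the early access builds accept full or short as the property value to control how much of the stack is printed):

$ java -Djdk.tracePinnedThreads=full -jar app.jar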
@bowbahdoe Ignore the Continuation and tryPreempt for now. Yes, there is support at the lower level for forced preemption but this is not exposed to custom schedulers at this time.
More pondering on what we'd need to change to make best usage of Loom. I no longer think we need to change HttpChannel, HttpInput and HttpOutput, as the servlet API requires async behaviour and, unless we want to give up on that API, modelling blocking as async is the best way to go for that level of API.
However, we probably could experiment with writing a Loom-specific Connector that avoids the SelectorManager and all the async behaviour at that level. For HTTP1, the connector would just have a Loom virtual thread for every connection, blocked in a read and running the HttpParser in non-blocking mode, passing events to HttpChannel and calling handle() normally, which could eventually invoke the servlet.
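That HTTP1 shape could look roughly like the following fragment (hypothetical names throughout; parseAndHandle stands in for the HttpParser/HttpChannel plumbing):

import java.io.IOException;
import java.io.InputStream;
import java.net.ServerSocket;
import java.net.Socket;

void serve(ServerSocket server) throws IOException
{
    while (true)
    {
        Socket socket = server.accept();
        Thread.startVirtualThread(() ->
        {
            try (socket)
            {
                byte[] buffer = new byte[8192];
                InputStream in = socket.getInputStream();
                int filled;
                while ((filled = in.read(buffer)) != -1) // parks the virtual thread instead of using a selector
                {
                    parseAndHandle(socket, buffer, filled); // hypothetical: feed the parser, call handle()
                }
            }
            catch (IOException x)
            {
                // connection failed or was closed by the peer
            }
        });
    }
}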
For HTTP2, it would probably still be a Loom virtual thread per connection, but as there are multiple streams we would have to examine how that virtual thread executed tasks for each frame, so that it efficiently handed them over to another Loom virtual thread. Ideally we probably need to specialize the Loom schedulers and our executor so that, if possible, the same real thread with a hot cache would go on to run the frame task and call the servlet.... but we'd need to come up with a mechanism to avoid letting the last real thread be dispatched into the servlet container... where it could be pinned and we'd be screwed. But I think we already have all the info we need on our tasks regarding whether they can or will block, so we probably have the ability to write a Loom scheduler that actually implements Eat-What-You-Kill as its core strategy.
So replacing our synchronized blocks and thread pool should allow Loom to run OK, but I think we really need to consider next steps to really give it a fair go.
If it helps, here's the stack trace of a simple service that fetches a resource from another endpoint. It's running on a virtual thread, so the blocking operation, establishing the TCP connection to the remote service, just parks the virtual thread (releasing the underlying carrier thread to do other work).
at java.base/java.lang.VirtualThread.doPark(VirtualThread.java:453)
at java.base/java.lang.VirtualThread.tryPark(VirtualThread.java:445)
at java.base/java.lang.VirtualThread.park(VirtualThread.java:408)
at java.base/java.lang.System$2.parkVirtualThread(System.java:2321)
at java.base/jdk.internal.misc.VirtualThreads.park(VirtualThreads.java:56)
at java.base/sun.nio.ch.NioSocketImpl.park(NioSocketImpl.java:182)
at java.base/sun.nio.ch.NioSocketImpl.park(NioSocketImpl.java:211)
at java.base/sun.nio.ch.NioSocketImpl.connect(NioSocketImpl.java:603)
at java.base/java.net.Socket.connect(Socket.java:648)
at java.base/sun.net.NetworkClient.doConnect(NetworkClient.java:177)
at java.base/sun.net.www.http.HttpClient.openServer(HttpClient.java:514)
at java.base/sun.net.www.http.HttpClient.lockedOpenServer(HttpClient.java:626)
at java.base/sun.net.www.http.HttpClient.openServer(HttpClient.java:596)
at java.base/sun.net.www.http.HttpClient.<init>(HttpClient.java:256)
at java.base/sun.net.www.http.HttpClient.New(HttpClient.java:361)
at java.base/sun.net.www.http.HttpClient.New(HttpClient.java:382)
at java.base/sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1288)
at java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1221)
at java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1109)
at java.base/sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:1040)
at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1649)
at java.base/sun.net.www.protocol.http.HttpURLConnection.lockedGetInputStream(HttpURLConnection.java:1577)
at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1553)
at java.base/java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:527)
at org.glassfish.jersey.client.HttpUrlConnector._apply(HttpUrlConnector.java:321)
at org.glassfish.jersey.client.HttpUrlConnector.apply(HttpUrlConnector.java:227)
at org.glassfish.jersey.client.ClientRuntime.invoke(ClientRuntime.java:225)
at org.glassfish.jersey.client.JerseyInvocation$2.call(JerseyInvocation.java:671)
at org.glassfish.jersey.internal.Errors.process(Errors.java:315)
at org.glassfish.jersey.internal.Errors.process(Errors.java:297)
at org.glassfish.jersey.internal.Errors.process(Errors.java:228)
at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:424)
at org.glassfish.jersey.client.JerseyInvocation.invoke(JerseyInvocation.java:667)
at org.glassfish.jersey.client.JerseyInvocation$Builder.method(JerseyInvocation.java:396)
at org.glassfish.jersey.client.JerseyInvocation$Builder.get(JerseyInvocation.java:296)
at demo.AggregatorServices.query(AggregatorServices.java:93)
at demo.AggregatorServices.anyOf(AggregatorServices.java:44)
at java.base/jdk.internal.reflect.NewAccessorImplFactory$1.invoke(NewAccessorImplFactory.java:83)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:75)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:564)
at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$1.invoke(ResourceMethodInvocationHandlerFactory.java:81)
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:151)
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:171)
at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$TypeOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:195)
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:104)
at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:406)
at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:350)
at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:106)
at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:259)
at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271)
at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267)
at org.glassfish.jersey.internal.Errors.process(Errors.java:315)
at org.glassfish.jersey.internal.Errors.process(Errors.java:297)
at org.glassfish.jersey.internal.Errors.process(Errors.java:267)
at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:320)
at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:236)
at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1028)
at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:373)
at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:381)
at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:344)
at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:219)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:763)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:551)
at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1610)
at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1363)
at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:489)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1580)
at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1278)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
at org.eclipse.jetty.server.Server.handle(Server.java:500)
at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:383)
at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:547)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:375)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:273)
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
at java.base/java.lang.VirtualThread.lambda$new$0(VirtualThread.java:134)
at java.base/java.lang.Continuation.enter0(Continuation.java:394)
at java.base/java.lang.Continuation.enter(Continuation.java:387)
Disclaimer: These are not benchmarks. (but ...)
I was curious to see what a simple ThreadPool change would do. See https://github.com/jetty-project/jetty-loom/blob/master/src/main/java/org/eclipse/jetty/loom/LoomThreadPool.java
Code at https://github.com/jetty-project/jetty-loom
The results:
With Loom
$ ab -n 100000 -c 10000 http://localhost:8888/
This is ApacheBench, Version 2.3 <$Revision: 1807734 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking localhost (be patient)
Completed 10000 requests
Completed 20000 requests
Completed 30000 requests
Completed 40000 requests
Completed 50000 requests
Completed 60000 requests
Completed 70000 requests
Completed 80000 requests
Completed 90000 requests
Completed 100000 requests
Finished 100000 requests
Server Software: Jetty(Loom)-10.0.0-SNAPSHOT
Server Hostname: localhost
Server Port: 8888
Document Path: /
Document Length: 7 bytes
Concurrency Level: 10000
Time taken for tests: 5.701 seconds
Complete requests: 100000
Failed requests: 0
Total transferred: 15900000 bytes
HTML transferred: 700000 bytes
Requests per second: 17541.33 [#/sec] (mean)
Time per request: 570.082 [ms] (mean)
Time per request: 0.057 [ms] (mean, across all concurrent requests)
Transfer rate: 2723.70 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 81 283 339.9 210 3185
Processing: 54 250 103.3 244 3660
Waiting: 35 142 111.9 113 3640
Total: 260 532 369.4 483 4720
Percentage of the requests served within a certain time (ms)
50% 483
66% 529
75% 540
80% 553
90% 584
95% 1316
98% 1494
99% 2139
100% 4720 (longest request)
Without Loom (Using QTP)
$ ab -n 100000 -c 10000 http://localhost:8888/
This is ApacheBench, Version 2.3 <$Revision: 1807734 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking localhost (be patient)
Completed 10000 requests
Completed 20000 requests
Completed 30000 requests
Completed 40000 requests
Completed 50000 requests
Completed 60000 requests
Completed 70000 requests
Completed 80000 requests
Completed 90000 requests
Completed 100000 requests
Finished 100000 requests
Server Software: Jetty(10.0.0-SNAPSHOT)
Server Hostname: localhost
Server Port: 8888
Document Path: /
Document Length: 7 bytes
Concurrency Level: 10000
Time taken for tests: 5.869 seconds
Complete requests: 100000
Failed requests: 0
Total transferred: 15400000 bytes
HTML transferred: 700000 bytes
Requests per second: 17040.12 [#/sec] (mean)
Time per request: 586.850 [ms] (mean)
Time per request: 0.059 [ms] (mean, across all concurrent requests)
Transfer rate: 2562.67 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 309 621.1 155 3297
Processing: 8 153 72.9 145 364
Waiting: 5 107 59.9 102 342
Total: 31 462 638.7 378 3486
Percentage of the requests served within a certain time (ms)
50% 378
66% 407
75% 419
80% 430
90% 450
95% 1237
98% 3413
99% 3463
100% 3486 (longest request)
@joakime Can you run that test but with the 0s replaced by Integer.MAX_VALUE just to see how/if that affects things?
@bowbahdoe ~it becomes unstable~ (if I run the same ab command line, instead of an unstable command line, then it doesn't seem to show much change)
Commit https://github.com/jetty-project/jetty-loom/commit/4710f66ce46a22a9a691b311afa48f1d773d98a2
Results with Loom (edit: now using same ab command line as before)
$ ab -n 100000 -c 10000 http://localhost:8888/
This is ApacheBench, Version 2.3 <$Revision: 1807734 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking localhost (be patient)
Completed 10000 requests
Completed 20000 requests
Completed 30000 requests
Completed 40000 requests
Completed 50000 requests
Completed 60000 requests
Completed 70000 requests
Completed 80000 requests
Completed 90000 requests
Completed 100000 requests
Finished 100000 requests
Server Software: Jetty(Loom)-10.0.0-SNAPSHOT
Server Hostname: localhost
Server Port: 8888
Document Path: /
Document Length: 7 bytes
Concurrency Level: 10000
Time taken for tests: 28.976 seconds
Complete requests: 100000
Failed requests: 0
Total transferred: 15900000 bytes
HTML transferred: 700000 bytes
Requests per second: 3451.08 [#/sec] (mean)
Time per request: 2897.646 [ms] (mean)
Time per request: 0.290 [ms] (mean, across all concurrent requests)
Transfer rate: 535.86 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 280 587.0 96 3208
Processing: 3 159 728.4 105 27840
Waiting: 2 127 730.0 73 27840
Total: 3 439 999.6 207 28930
Percentage of the requests served within a certain time (ms)
50% 207
66% 260
75% 290
80% 301
90% 1194
95% 1359
98% 3383
99% 3403
100% 28930 (longest request)
I am also interested in this topic; I see above that some experiments with custom connectors were started. @sbordet @gregw Any results from this? Is it possible to solve the c10k problem with a fiber-based connector?
@lukago the c10k problem has been solved for a long time, see https://webtide.com/do-looms-claims-stack-up-part-1/ where an untuned laptop can do 32k threads.
We would love to hear what your use case is! Do you have a case where you need a single server to handle more than 10k concurrent threads? Or a case where you want to handle more than 10k connections with a thread-per-connection model? Thanks!
@sbordet What I mean is whether we can achieve with Jetty+Loom something like this example based on Netty: https://github.com/Jotschi/vertx-c10k-example
I ran similar tests (-c > 1000) for Jetty with a fiber-based thread pool, based on this example: https://github.com/tipsy/loomylin (javalin is based on jetty)
$ wrk -c1000 -d10s -t10 http://localhost:7002/computational
Running 10s test @ http://localhost:7002/computational
10 threads and 1000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 167.36ms 179.30ms 1.98s 95.49%
Req/Sec 26.35 19.54 150.00 72.75%
2082 requests in 10.10s, 266.35KB read
Socket errors: connect 759, read 687, write 0, timeout 0
Requests/sec: 206.23
Transfer/sec: 26.38KB
Despite using a fiber thread pool, there are 759 socket errors. I guess this happens because in this example the Jetty connector does not utilize fiber threads and declines new connections after reaching some critical point of ~250 connections.
I see you have done some tweaks in the connectors here: https://github.com/eclipse/jetty.project/compare/jetty-10.0.x-loom So my question is: does it solve the problem above? If not, what needs to be done to make it work?
Also I wonder if tuning the Jetty connectors for fibers will get its performance closer to what we see in the Netty example, where it handles 7k req/s for 10k concurrent connections.
@lukago I pushed Jetty+CometD to 400k connections a few years ago (I think it was Jetty 8), so c10k has not been a problem for many years. We have clients in production that have > 100k connections on a single server, running easily. Both use async APIs on the server.
I have not run wrk or javalin, but typically it's the client that can't cope with the load that it itself generates, so I would take with a grain of salt any result, positive or not, of benchmarks that fail almost immediately for an almost empty load like the one you report above.
I just ran the CometD benchmark with 10k connections at ~45k requests/s easily on my laptop:
========================================
Monitoring Started at Mon Jan 04 15:26:35 CET 2021
Operative System: Linux 5.8.0-33-generic amd64
JVM: AdoptOpenJDK OpenJDK 64-Bit Server VM 15.0.1+9 15.0.1+9
Processors: 12
System Memory: 77.612076% used of 31.164349 GiB
Used Heap Size: 543.07166 MiB
Max Heap Size: 12288.0 MiB
Young Generation Heap Size: 0.0 MiB
- - - - - - - - - - - - - - - - - - - -
Testing 10000 clients in 100 rooms, 10 rooms/client
Sending 1000 batches of 1x50 bytes messages every 10000 µs
[2021-01-04T15:26:39.504+0100][info][gc] GC(13) Pause Young (Normal) (G1 Evacuation Pause) 4095M->606M(12288M) 35,085ms
[2021-01-04T15:26:44.279+0100][info][gc] GC(14) Pause Young (Normal) (G1 Evacuation Pause) 4038M->621M(12288M) 38,708ms
- - - - - - - - - - - - - - - - - - - -
Monitoring Ended at Mon Jan 04 15:26:45 CET 2021
Elapsed Time: 10002 ms
Time in JIT Compilation: 118 ms
Time in Young GC: 74 ms (2 collections)
Time in Old GC: 0 ms (0 collections)
Garbage Generated in Eden Space: 7280.0 MiB
Garbage Generated in Survivor Space: 102.22278 MiB
Garbage Generated in Tenured Space: 0.0 MiB
Average CPU Load: 524.17084/1200
========================================
Waiting for messages to arrive 988726/1000384
All messages arrived 1000384/1000384
Messages - Success/Expected = 1000384/1000384
Outgoing: Elapsed = 10000 ms | Rate = 990 messages/s - 99 batches/s - 16.579 MiB/s
Incoming - Elapsed = 10239 ms | Rate = 97697 messages/s - 44526 batches/s(45.58%) - 28.792 MiB/s
@ _ 18,630 µs (137557, 13.75%)
@ _ 37,261 µs (82618, 8.26%)
@ _ 55,891 µs (80937, 8.09%)
@ _ 74,522 µs (82597, 8.26%)
@ _ 93,153 µs (80813, 8.08%)
@ _ 111,783 µs (82681, 8.26%) ^50%
@ _ 130,414 µs (82256, 8.22%)
@ _ 149,045 µs (81573, 8.15%)
@ _ 167,675 µs (79632, 7.96%)
@ _ 186,306 µs (76462, 7.64%) ^85%
@ _ 204,936 µs (51231, 5.12%)
@ _ 223,567 µs (45370, 4.54%) ^95%
@ _ 242,198 µs (20155, 2.01%)
@ _ 260,828 µs (5353, 0.54%)
@ _ 279,459 µs (4013, 0.40%) ^99%
@ _ 298,090 µs (2135, 0.21%)
@ _ 316,720 µs (1848, 0.18%)
@ _ 335,351 µs (1428, 0.14%)
@ _ 353,981 µs (1102, 0.11%) ^99.9%
@ _ 372,612 µs (623, 0.06%)
Messages - Latency: 1000384 samples | min/avg/50th%/99th%/max = 156/104,136/101,253/265,420/372,768 µs
Messages - Network Latency Min/Ave/Max = 0/103/370 ms
Slowest Message ID = 11490/bench/a time = 372 ms
Thread Pool:
threads: 219
tasks: 337244
max concurrent threads: 179
max queue size: 177
queue latency avg/max: 0/39 ms
task time avg/max: 3/20179 ms
-----
So, c10k is not a problem, provided you are async on the server.
@lukago I doubt we will ever use Loom within Jetty for connectors. Jetty is already fully async internally, and we have to do a lot of clever things to prevent head-of-line blocking, and even deadlocks, if important tasks like HTTP/2 flow control get deferred. Virtual threads can easily be deferred, so we just don't think they are suitable (I'll say "yet"... but I'm dubious they ever will be).
However, using Loom virtual threads to dispatch to an application that is written in blocking mode is something that we have already implemented in our test branch, and is something that is very likely to reach a main branch if Loom ever makes it into a released JVM. That will allow many thousands of virtual threads to block in the application. That could be 10s or 100s or 1000s of thousands, depending on how many other resources the application uses.
I'm not yet convinced this will give as good results as writing async applications, but it should be in the ballpark, and it will definitely be much easier to write and maintain.
@sbordet what do you mean by async APIs, do you have any example? I don't have any knowledge about CometD, but using async servlets seems not to be enough, as I still get only ~250 max concurrent connections. I am mostly interested in sync applications, but it would be good to know how to configure it for async apps too.
@gregw so if I run Jetty from your test branch with the config org.eclipse.jetty.io.use_loom=true, then these socket errors should not occur anymore, or is any additional config for the max number of concurrent connections needed?
Edit: Ok, now I get it, the problem with connections was indeed on the client side; I fixed it with the ulimit command.
Thanks! :)
@lukago Just a general note that might be useful for spectators to the discussion, even if it's not what you were asking exactly.
An "async api" is any api that will not deliver its result immediately, such as later calling a callback or by returning an object that callbacks can be attached to.
Async Apis:
// Async api 1
// Will call the callback "later" maybe on a different thread
void getInteger(Consumer<Integer> whenDone) { ... }
// Usage
getInteger(x -> System.out.println(x + 1));
// Async api 2
// Will return an object that results can be chained on
// (a plain Future has no chaining methods, so CompletableFuture is the idiomatic form)
CompletableFuture<Integer> getInteger() { ... }
// Usage
getInteger()
    .thenApply(x -> x + 1)
    .thenAccept(x -> System.out.println(x));
Synchronous Apis:
// Returns when result available
Integer getInteger() { ... }
// Usage
int x = getInteger();
x = x + 1;
System.out.println(x);
If most of your code is written using synchronous apis, there won't be much if any performance benefit to using async servlets, simply because there won't be "explicit yield points" that can be taken advantage of. The "seams" added by the callbacks or the Futures are what is used to "juggle" tasks between OS threads.
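A sketch of how Loom changes that calculus, reusing the synchronous getInteger() above and the unbounded virtual thread executor API from the EA builds (also mentioned at the end of this thread):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// The synchronous style, run on one virtual thread per task: each blocking
// call parks the virtual thread rather than an OS thread, so no explicit
// yield points are needed in the application code.
void example()
{
    ExecutorService executor = Executors.newUnboundedVirtualThreadExecutor();
    executor.submit(() ->
    {
        int x = getInteger(); // parks the virtual thread; the carrier is freed
        System.out.println(x + 1);
    });
}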
@gregw I'd like to ask one more thing about the Jetty I/O model. As I understand it, the current version of Jetty uses a model similar to Netty for handling I/O. So there is a separate thread pool where each thread (aka event loop) asks the kernel for new I/O events and then dispatches them to another thread pool for blocking servlets, or does everything on the I/O pool for async servlets. What if we change the I/O thread pool to be fiber-based as well? Will it be beneficial for overall performance?
Lukasz,
I'm not familiar with Netty's internals, but our scheduler is not like you describe.
Having a selector thread that always dispatches to a thread pool means that there is always extra latency and often you will end up with a cold CPU cache and get parallel slowdown (see https://webtide.com/avoiding-parallel-slowdown-in-jetty-9/)
If you never dispatch, then you end up with deadlock and/or head-of-line blocking.
So we have a more adaptive strategy called "Eat what you Kill". Details here: https://webtide.com/eat-what-you-kill/
Currently I cannot see how that can be implemented with virtual threads in a beneficial way, however the strategy is adaptive enough to use virtual threads when appropriate, as we have done with our Loom branch.
regards
Try now as it looks complete to me.
On Tue, 11 Apr 2023 at 14:12, xodiumluma wrote:
Hi @gregw, hope you are well. The "Eat what you kill" page https://webtide.com/eat-what-you-kill/ is chopped off at the bottom. Could your team please reinstate the missing content? It currently reads, "If the request is consumed by a different thread, then all the request data must be loaded into the new CPU c" <- there it ends. Thanks!
Thanks @gregw, it started working about twenty minutes after I posted, so I deleted the post.
@wendal yes it works... unless you are saying otherwise?
Jetty version: 10.0.0-SNAPSHOT
Java version: Project Loom pre-release JDK build
Question
I am experimenting with the Project Loom pre-release builds and I am trying to figure out how to properly configure Jetty to make use of virtual threads.
Quite a bit of the code seems centered around thread pooling and managing capacity, but that isn't quite as applicable to virtual threads. I figure I could change "max threads" up to a really high number, but there is still logic for checking the capacity of a thread pool, even if backed by an Executors.newUnboundedVirtualThreadExecutor(), which I am thinking would be wasteful in that context.
I guess this is partly a "Jetty Architecture" question more than anything else; I'm just looking for some pointers on where to start with the codebase to make an eventual upgrade work.