jetty / jetty.project

Eclipse Jetty® - Web Container & Clients - supports HTTP/2, HTTP/1.1, HTTP/1.0, websocket, servlets, and more
https://eclipse.dev/jetty

Jetty and Loom #5078

Closed. bowbahdoe closed this issue 3 years ago.

bowbahdoe commented 4 years ago

Jetty version: 10.0.0-SNAPSHOT
Java version: Project Loom pre-release JDK build

Question: I am experimenting with the Project Loom pre-release builds and I am trying to figure out how to properly configure Jetty to make use of virtual threads.

Quite a bit of the code seems centered around thread pooling and managing capacity, but that isn't quite as applicable to virtual threads. I figure I could change "max threads" up to a really high number, but there is still logic for checking the capacity of a thread pool - even one backed by an Executors.newUnboundedVirtualThreadExecutor() - which I am thinking would be wasteful in that context.

I guess this is partly a "Jetty Architecture" question more than anything else - I'm just looking for some pointers on where to start with the codebase to make an eventual upgrade work.

sbordet commented 4 years ago

I took the liberty of renaming this issue so that it can be used as an umbrella for experiments about Jetty and Loom.

I will be looking at Loom and Jetty in the following days, so I will be able to be more precise about the answer.

/cc @gregw @lorban

AlanBateman commented 4 years ago

I'm working on Project Loom. If you run into any questions or issues with the early-access builds, then you are welcome to bring them to the OpenJDK loom-dev mailing list.

As it happens, I did create a demo that embeds Jetty and it was very easy to get started. I created an org.eclipse.jetty.util.thread.ThreadPool with execute implemented to run each task in a virtual thread. Most things "just worked". I was able to create services that aggregated the results from other services, essentially fan-out using the JAX-RS client API (javax.ws.rs), where it didn't matter if the service spent most of its time blocked waiting for other services.
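
A minimal sketch of what such an adapter might look like (an illustration, not the demo's actual code; it assumes a Loom EA build, where the per-task factory method was Executors.newUnboundedVirtualThreadExecutor(), later renamed Executors.newVirtualThreadPerTaskExecutor()):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.eclipse.jetty.util.thread.ThreadPool;

public class VirtualThreadPool implements ThreadPool
{
    // EA-build API; later renamed Executors.newVirtualThreadPerTaskExecutor().
    private final ExecutorService executor = Executors.newUnboundedVirtualThreadExecutor();

    @Override
    public void execute(Runnable task)
    {
        // Every task gets a fresh virtual thread; there is no pool to size.
        executor.execute(task);
    }

    @Override
    public void join() throws InterruptedException
    {
        // Nothing to join: virtual threads are created per task, not pooled.
    }

    // Capacity reporting is meaningless for an unbounded supply of virtual
    // threads, so never report the pool as exhausted.
    @Override
    public int getThreads() { return 0; }

    @Override
    public int getIdleThreads() { return 0; }

    @Override
    public boolean isLowOnThreads() { return false; }
}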

gregw commented 4 years ago

For our experiments with Loom, these are the questions that I'd really like answered:

AlanBateman commented 4 years ago

State of Loom provides a good overview/status of Project Loom. There is a section on pinning that provides an overview of the short term limitations with respect to parking while holding a monitor. Fairness is not changed in the current prototype. Loom doesn't use "cooperative multithreading" (there are no explicit scheduling points).

gregw commented 4 years ago

@AlanBateman interesting read... but I'll have to go over it a few more times to fully digest it.
Loom might not strictly be "cooperative multi-threading", but as @sbordet is currently preparing a monster PR to replace many synchronized blocks with Locks, so as to give Loom the opportunity to "preempt", it does kind of feel like we are making explicit scheduling points... or at least have to be aware of what the scheduling points are.
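
To illustrate the kind of rewrite involved (an invented example, not code from the PR): in the current Loom builds a virtual thread that parks inside a synchronized block, e.g. in Object.wait(), pins its carrier thread, while a java.util.concurrent lock lets it unmount, so guarded sections that may block get converted along these lines:

import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

public class WindowGuard
{
    // Before: synchronized (this) { while (window < bytes) wait(); ... }
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition windowOpened = lock.newCondition();
    private long window;

    // Blocks until the flow-control window has room. Parked on a Condition,
    // a virtual thread can unmount and free its carrier thread.
    public void consume(long bytes) throws InterruptedException
    {
        lock.lock();
        try
        {
            while (window < bytes)
                windowOpened.await();
            window -= bytes;
        }
        finally
        {
            lock.unlock();
        }
    }

    public void expand(long bytes)
    {
        lock.lock();
        try
        {
            window += bytes;
            windowOpened.signalAll();
        }
        finally
        {
            lock.unlock();
        }
    }
}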

I am definitely concerned with pinning, as we are just not in control of what applications will do. Consider an HTTP/2 server, where the flow control is done in user space. If an application writes to a response from within a synchronised block, then that thread could become pinned if the write blocks because the flow-control window is entirely consumed. Then, if the frame that would open that flow-control window is handled by a virtual thread, it may never get executed, because all the real cores are attached to pinned virtual threads. We currently optimise the scheduling of these situations by using reserved threads: if we know a thread is available to continue handling flow control, then the current thread can continue from parsing a frame to handling that frame... with a hot cache. Not sure how to handle this with Loom? Perhaps we would need to have a couple of real threads always doing the IO selection and handling of control frames, and then passing off to virtual threads for application handling??? But then we will always run the applications with cold CPU caches. Hmmmm
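
As a concrete (invented) example of the hazard described above, here is the kind of application code the container cannot control:

import java.io.IOException;
import java.io.OutputStream;

import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class PinningServlet extends HttpServlet
{
    private final Object lock = new Object();

    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response) throws IOException
    {
        byte[] chunk = new byte[64 * 1024];
        OutputStream out = response.getOutputStream();
        synchronized (lock)
        {
            // If a write blocks because the HTTP/2 flow-control window is
            // exhausted, this virtual thread parks while holding the monitor
            // and is pinned to its carrier. If enough requests do the same,
            // no carrier remains to run the thread that would handle the
            // WINDOW_UPDATE frame and unblock them.
            for (int i = 0; i < 1024; i++)
                out.write(chunk);
        }
    }
}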

So it will be interesting to get Jetty running on Loom and test it under load in a way that will show whether we hit such problems.

However, ultimately I doubt that a server that has been so specifically optimised for running async IO on OS threads is going to be the best usage of Loom. A more interesting approach would be to use the core infrastructure of Jetty to assemble a non-async server that uses/assumes Loom. I.e. if we have 10,000 HTTP connections, each with 100 streams, then we just allocate 1,000,000 virtual threads and don't bother with all the async complexities that we go on with. Hmmm, or would we allocate 10,000 threads, one for each connection, that would just run the HTTP/2 protocol, and then 1,000,000 threads that each ran the application/session? Each connection-processing thread would then hand off work to one of 100 application/session threads... and we'd have to be clever to try to get that executing on the same real thread so the cache would be hot... and we'd still need to solve the pinning issue... but maybe 1 work-stealing real thread could cover that.

So yep, I think it will be interesting for us to replace our synchronized blocks with Locks, add a different Thread "pool" and see how it goes. However, ultimately I think we'd only really be fair on Loom if we wrote a new connector type that wasn't intrinsically async... this would not be too hard to do, but we still have the issue that the input/output streams we give to the applications are implemented as async under the hood, so applications wouldn't really be using Loom preemption on IO. So to remove the async assumption from HttpChannel/HttpInput/HttpOutput is a fair bit of work... but ultimately, if we really want to know whether the Loom approach is scalable, then somebody needs to write a server that fully embraces it.

bowbahdoe commented 4 years ago

@gregw My understanding is that while locks and platform IO are considered "logical scheduling points", they aren't required for another virtual thread to be scheduled, and virtual threads can be interrupted just like normal threads. I'm not entirely sure about that though, and would have trouble proving it, since I can't come up with a test case that would show it.

I think that is the purpose of the tryPreempt method on java.lang.Continuation though: so that a scheduler is able to preempt without an explicit IO boundary.

bowbahdoe commented 4 years ago

My instinct with the concerns about reserved threads and how Jetty currently does scheduling is that, if those concerns do end up being valid, a new scheduler roughly matching Jetty's current semantics could be written and used in place of the default ForkJoinPool.

AlanBateman commented 4 years ago

Running with the system property jdk.tracePinnedThreads set on the command line (e.g. -Djdk.tracePinnedThreads=full to print the full stack trace of a pinned thread) will help identify cases where a thread parks while holding a monitor. The intention is to remove the limitation in time.

@bowbahdoe Ignore the Continuation and tryPreempt for now. Yes, there is support at the lower level for forced preemption but this is not exposed to custom schedulers at this time.

gregw commented 4 years ago

More pondering on what we'd need to change to make the best usage of Loom. I no longer think we need to change HttpChannel, HttpInput and HttpOutput, as the servlet API requires async behaviour, and unless we want to give up on that API, modelling blocking as async is the best way to go for that level of API.

However, we probably could experiment with writing a Loom-specific Connector that avoids the SelectorManager and all the async behaviour at that level. For HTTP/1, the connector would just have a Loom virtual thread for every connection, blocked in a read and running the HttpParser in non-blocking mode, passing events to HttpChannel and calling handle normally, which could eventually invoke the servlet.
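
The shape of that HTTP/1 idea as a free-standing sketch, with plain sockets standing in for Jetty's connector machinery and the parser hand-off elided:

import java.io.IOException;
import java.io.InputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class BlockingConnectorSketch
{
    public static void main(String[] args) throws IOException
    {
        // One virtual thread per connection, blocked in read(): no selector,
        // no fill-interest callbacks, no async machinery at this level.
        ExecutorService executor = Executors.newUnboundedVirtualThreadExecutor(); // EA-build name
        try (ServerSocket server = new ServerSocket(8080))
        {
            while (true)
            {
                Socket socket = server.accept();
                executor.execute(() -> handle(socket));
            }
        }
    }

    private static void handle(Socket socket)
    {
        try (socket)
        {
            InputStream in = socket.getInputStream();
            byte[] buffer = new byte[8192];
            while (in.read(buffer) >= 0)
            {
                // Feed the buffered bytes to an HTTP parser in non-blocking
                // mode and dispatch the parse events to the channel/servlet.
            }
        }
        catch (IOException ignored)
        {
            // Connection failed or closed; the virtual thread simply exits.
        }
    }
}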

For HTTP/2, it would probably still be a Loom virtual thread per connection, but as there are multiple streams, we would have to examine how that virtual thread executed tasks for each frame, so that it efficiently handed them over to another Loom virtual thread. Ideally we probably need to specialise the Loom schedulers and our executor so that, if possible, the same real thread with a hot cache would go on to run the frame task and call the servlet... but we'd need to come up with a mechanism to avoid letting the last real thread be dispatched into the servlet container... where it could be pinned and we'd be screwed. But I think we already have all the info we need on our tasks regarding whether they can or will block, so we probably have the ability to write a Loom scheduler that actually implements Eat-What-You-Kill as its core strategy.

So replacing our synchronized blocks and thread pool should allow Loom to run OK, but I think we really need to consider next steps to really give it a fair go.

AlanBateman commented 4 years ago

If it helps, here's the stack trace of a simple service that fetches a resource from another endpoint. It's running on a virtual thread, so the blocking operation, establishing the TCP connection to the remote service, just parks the virtual thread (releasing the underlying carrier thread to do other work).

    at java.base/java.lang.VirtualThread.doPark(VirtualThread.java:453)
    at java.base/java.lang.VirtualThread.tryPark(VirtualThread.java:445)
    at java.base/java.lang.VirtualThread.park(VirtualThread.java:408)
    at java.base/java.lang.System$2.parkVirtualThread(System.java:2321)
    at java.base/jdk.internal.misc.VirtualThreads.park(VirtualThreads.java:56)
    at java.base/sun.nio.ch.NioSocketImpl.park(NioSocketImpl.java:182)
    at java.base/sun.nio.ch.NioSocketImpl.park(NioSocketImpl.java:211)
    at java.base/sun.nio.ch.NioSocketImpl.connect(NioSocketImpl.java:603)
    at java.base/java.net.Socket.connect(Socket.java:648)
    at java.base/sun.net.NetworkClient.doConnect(NetworkClient.java:177)
    at java.base/sun.net.www.http.HttpClient.openServer(HttpClient.java:514)
    at java.base/sun.net.www.http.HttpClient.lockedOpenServer(HttpClient.java:626)
    at java.base/sun.net.www.http.HttpClient.openServer(HttpClient.java:596)
    at java.base/sun.net.www.http.HttpClient.<init>(HttpClient.java:256)
    at java.base/sun.net.www.http.HttpClient.New(HttpClient.java:361)
    at java.base/sun.net.www.http.HttpClient.New(HttpClient.java:382)
    at java.base/sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1288)
    at java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1221)
    at java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1109)
    at java.base/sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:1040)
    at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1649)
    at java.base/sun.net.www.protocol.http.HttpURLConnection.lockedGetInputStream(HttpURLConnection.java:1577)
    at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1553)
    at java.base/java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:527)
    at org.glassfish.jersey.client.HttpUrlConnector._apply(HttpUrlConnector.java:321)
    at org.glassfish.jersey.client.HttpUrlConnector.apply(HttpUrlConnector.java:227)
    at org.glassfish.jersey.client.ClientRuntime.invoke(ClientRuntime.java:225)
    at org.glassfish.jersey.client.JerseyInvocation$2.call(JerseyInvocation.java:671)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:315)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:297)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:228)
    at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:424)
    at org.glassfish.jersey.client.JerseyInvocation.invoke(JerseyInvocation.java:667)
    at org.glassfish.jersey.client.JerseyInvocation$Builder.method(JerseyInvocation.java:396)
    at org.glassfish.jersey.client.JerseyInvocation$Builder.get(JerseyInvocation.java:296)
    at demo.AggregatorServices.query(AggregatorServices.java:93)
    at demo.AggregatorServices.anyOf(AggregatorServices.java:44)
    at java.base/jdk.internal.reflect.NewAccessorImplFactory$1.invoke(NewAccessorImplFactory.java:83)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:75)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:564)
    at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$1.invoke(ResourceMethodInvocationHandlerFactory.java:81)
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:151)
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:171)
    at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$TypeOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:195)
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:104)
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:406)
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:350)
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:106)
    at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:259)
    at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271)
    at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:315)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:297)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:267)
    at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:320)
    at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:236)
    at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1028)
    at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:373)
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:381)
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:344)
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:219)
    at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:763)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:551)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1610)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1363)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:489)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1580)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1278)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
    at org.eclipse.jetty.server.Server.handle(Server.java:500)
    at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:383)
    at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:547)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:375)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:273)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
    at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
    at java.base/java.lang.VirtualThread.lambda$new$0(VirtualThread.java:134)
    at java.base/java.lang.Continuation.enter0(Continuation.java:394)
    at java.base/java.lang.Continuation.enter(Continuation.java:387)

joakime commented 4 years ago

Disclaimer: These are not benchmarks. (but ...)

I was curious to see what a simple ThreadPool change would do. See https://github.com/jetty-project/jetty-loom/blob/master/src/main/java/org/eclipse/jetty/loom/LoomThreadPool.java

Code at https://github.com/jetty-project/jetty-loom

The results:

With Loom

$ ab -n 100000 -c 10000 http://localhost:8888/
This is ApacheBench, Version 2.3 <$Revision: 1807734 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 10000 requests
Completed 20000 requests
Completed 30000 requests
Completed 40000 requests
Completed 50000 requests
Completed 60000 requests
Completed 70000 requests
Completed 80000 requests
Completed 90000 requests
Completed 100000 requests
Finished 100000 requests

Server Software:        Jetty(Loom)-10.0.0-SNAPSHOT
Server Hostname:        localhost
Server Port:            8888

Document Path:          /
Document Length:        7 bytes

Concurrency Level:      10000
Time taken for tests:   5.701 seconds
Complete requests:      100000
Failed requests:        0
Total transferred:      15900000 bytes
HTML transferred:       700000 bytes
Requests per second:    17541.33 [#/sec] (mean)
Time per request:       570.082 [ms] (mean)
Time per request:       0.057 [ms] (mean, across all concurrent requests)
Transfer rate:          2723.70 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:       81  283 339.9    210    3185
Processing:    54  250 103.3    244    3660
Waiting:       35  142 111.9    113    3640
Total:        260  532 369.4    483    4720

Percentage of the requests served within a certain time (ms)
  50%    483
  66%    529
  75%    540
  80%    553
  90%    584
  95%   1316
  98%   1494
  99%   2139
 100%   4720 (longest request)

Without Loom (using Jetty's standard QueuedThreadPool)

$ ab -n 100000 -c 10000 http://localhost:8888/
This is ApacheBench, Version 2.3 <$Revision: 1807734 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 10000 requests
Completed 20000 requests
Completed 30000 requests
Completed 40000 requests
Completed 50000 requests
Completed 60000 requests
Completed 70000 requests
Completed 80000 requests
Completed 90000 requests
Completed 100000 requests
Finished 100000 requests

Server Software:        Jetty(10.0.0-SNAPSHOT)
Server Hostname:        localhost
Server Port:            8888

Document Path:          /
Document Length:        7 bytes

Concurrency Level:      10000
Time taken for tests:   5.869 seconds
Complete requests:      100000
Failed requests:        0
Total transferred:      15400000 bytes
HTML transferred:       700000 bytes
Requests per second:    17040.12 [#/sec] (mean)
Time per request:       586.850 [ms] (mean)
Time per request:       0.059 [ms] (mean, across all concurrent requests)
Transfer rate:          2562.67 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0  309 621.1    155    3297
Processing:     8  153  72.9    145     364
Waiting:        5  107  59.9    102     342
Total:         31  462 638.7    378    3486

Percentage of the requests served within a certain time (ms)
  50%    378
  66%    407
  75%    419
  80%    430
  90%    450
  95%   1237
  98%   3413
  99%   3463
 100%   3486 (longest request)
bowbahdoe commented 4 years ago

@joakime Can you run that test but with the 0s replaced by Integer.MAX_VALUE just to see how/if that affects things?

joakime commented 4 years ago

@bowbahdoe ~it becomes unstable~ (if I run the same ab command line, rather than an unstable one, then it doesn't seem to show much change)

Commit https://github.com/jetty-project/jetty-loom/commit/4710f66ce46a22a9a691b311afa48f1d773d98a2

Results with Loom (edit: now using same ab command line as before)

$ ab -n 100000 -c 10000 http://localhost:8888/
This is ApacheBench, Version 2.3 <$Revision: 1807734 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 10000 requests
Completed 20000 requests
Completed 30000 requests
Completed 40000 requests
Completed 50000 requests
Completed 60000 requests
Completed 70000 requests
Completed 80000 requests
Completed 90000 requests
Completed 100000 requests
Finished 100000 requests

Server Software:        Jetty(Loom)-10.0.0-SNAPSHOT
Server Hostname:        localhost
Server Port:            8888

Document Path:          /
Document Length:        7 bytes

Concurrency Level:      10000
Time taken for tests:   28.976 seconds
Complete requests:      100000
Failed requests:        0
Total transferred:      15900000 bytes
HTML transferred:       700000 bytes
Requests per second:    3451.08 [#/sec] (mean)
Time per request:       2897.646 [ms] (mean)
Time per request:       0.290 [ms] (mean, across all concurrent requests)
Transfer rate:          535.86 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0  280 587.0     96    3208
Processing:     3  159 728.4    105   27840
Waiting:        2  127 730.0     73   27840
Total:          3  439 999.6    207   28930

Percentage of the requests served within a certain time (ms)
  50%    207
  66%    260
  75%    290
  80%    301
  90%   1194
  95%   1359
  98%   3383
  99%   3403
 100%  28930 (longest request)
lukago commented 3 years ago

I am also interested in this topic; I see above that some experiments with custom connectors were started. @sbordet @gregw Any results from this? Is it possible to solve the c10k problem with a fiber-based connector?

sbordet commented 3 years ago

@lukago the c10k problem has been solved for a long time; see https://webtide.com/do-looms-claims-stack-up-part-1/ where an untuned laptop can run 32k threads.

We would love to hear what your use case is! Do you have a case where you need a single server to handle more than 10k concurrent threads? Or a case where you want to handle more than 10k connections with a thread-per-connection model? Thanks!

lukago commented 3 years ago

@sbordet What I mean is whether we can achieve with Jetty+Loom something like this example based on Netty: https://github.com/Jotschi/vertx-c10k-example

I ran similar tests (-c > 1000) for Jetty with a fiber-based thread pool, based on this example: https://github.com/tipsy/loomylin (Javalin is based on Jetty)

$ wrk -c1000 -d10s -t10 http://localhost:7002/computational

Running 10s test @ http://localhost:7002/computational
  10 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   167.36ms  179.30ms   1.98s    95.49%
    Req/Sec    26.35     19.54   150.00     72.75%
  2082 requests in 10.10s, 266.35KB read
  Socket errors: connect 759, read 687, write 0, timeout 0
Requests/sec:    206.23
Transfer/sec:     26.38KB

Despite using the fiber thread pool there are 759 socket errors. I guess this happens because in this example the Jetty connector does not utilize fiber threads and declines new connections after reaching some critical point of ~250 connections.

I see you have done some tweaks to the connectors here: https://github.com/eclipse/jetty.project/compare/jetty-10.0.x-loom So my question is: does it solve the problem above? If not, what needs to be done to make it work?

lukago commented 3 years ago

Also, I wonder if tuning the Jetty connectors for fibers will get their performance closer to what we see in the Netty example, where for 10k concurrent connections it handles 7k req/s.

sbordet commented 3 years ago

@lukago I pushed Jetty+CometD to 400k connections a few years ago (I think it was Jetty 8), so c10k has not been a problem for many years. We have clients in production that have > 100k connections on a single server, running easily. Both use async APIs on the server.

I have not run wrk or javalin, but typically it's the client that can't cope with the load that it itself generates, so I would take with a grain of salt any result, positive or not, of a benchmark that fails almost immediately under an almost empty load like the one you report above.

I just ran the CometD benchmark with 10k connections at ~45k requests/s easily on my laptop:

========================================
Monitoring Started at Mon Jan 04 15:26:35 CET 2021
Operative System: Linux 5.8.0-33-generic amd64
JVM: AdoptOpenJDK OpenJDK 64-Bit Server VM 15.0.1+9 15.0.1+9
Processors: 12
System Memory: 77.612076% used of 31.164349 GiB
Used Heap Size: 543.07166 MiB
Max Heap Size: 12288.0 MiB
Young Generation Heap Size: 0.0 MiB
- - - - - - - - - - - - - - - - - - - - 
Testing 10000 clients in 100 rooms, 10 rooms/client
Sending 1000 batches of 1x50 bytes messages every 10000 µs
[2021-01-04T15:26:39.504+0100][info][gc] GC(13) Pause Young (Normal) (G1 Evacuation Pause) 4095M->606M(12288M) 35,085ms
[2021-01-04T15:26:44.279+0100][info][gc] GC(14) Pause Young (Normal) (G1 Evacuation Pause) 4038M->621M(12288M) 38,708ms
- - - - - - - - - - - - - - - - - - - - 
Monitoring Ended at Mon Jan 04 15:26:45 CET 2021
Elapsed Time: 10002 ms
    Time in JIT Compilation: 118 ms
    Time in Young GC: 74 ms (2 collections)
    Time in Old GC: 0 ms (0 collections)
Garbage Generated in Eden Space: 7280.0 MiB
Garbage Generated in Survivor Space: 102.22278 MiB
Garbage Generated in Tenured Space: 0.0 MiB
Average CPU Load: 524.17084/1200
========================================
Waiting for messages to arrive 988726/1000384
All messages arrived 1000384/1000384
Messages - Success/Expected = 1000384/1000384
Outgoing: Elapsed = 10000 ms | Rate = 990 messages/s - 99 batches/s - 16.579 MiB/s
Incoming - Elapsed = 10239 ms | Rate = 97697 messages/s - 44526 batches/s(45.58%) - 28.792 MiB/s
                   @  _  18,630 µs (137557, 13.75%)
            @         _  37,261 µs (82618, 8.26%)
            @         _  55,891 µs (80937, 8.09%)
            @         _  74,522 µs (82597, 8.26%)
            @         _  93,153 µs (80813, 8.08%)
            @         _  111,783 µs (82681, 8.26%) ^50%
            @         _  130,414 µs (82256, 8.22%)
            @         _  149,045 µs (81573, 8.15%)
            @         _  167,675 µs (79632, 7.96%)
           @          _  186,306 µs (76462, 7.64%) ^85%
       @              _  204,936 µs (51231, 5.12%)
       @              _  223,567 µs (45370, 4.54%) ^95%
   @                  _  242,198 µs (20155, 2.01%)
 @                    _  260,828 µs (5353, 0.54%)
 @                    _  279,459 µs (4013, 0.40%) ^99%
@                     _  298,090 µs (2135, 0.21%)
@                     _  316,720 µs (1848, 0.18%)
@                     _  335,351 µs (1428, 0.14%)
@                     _  353,981 µs (1102, 0.11%) ^99.9%
@                     _  372,612 µs (623, 0.06%)
Messages - Latency: 1000384 samples | min/avg/50th%/99th%/max = 156/104,136/101,253/265,420/372,768 µs
Messages - Network Latency Min/Ave/Max = 0/103/370 ms
Slowest Message ID = 11490/bench/a time = 372 ms
Thread Pool:
    threads:                219
    tasks:                  337244
    max concurrent threads: 179
    max queue size:         177
    queue latency avg/max:  0/39 ms
    task time avg/max:      3/20179 ms
-----

So, c10k is not a problem, provided you are async on the server.

gregw commented 3 years ago

@lukago I doubt we will ever use Loom within Jetty for connectors. Jetty is already fully async internally, and we have to do a lot of clever things to prevent head-of-line blocking, and even deadlocks, if important tasks like HTTP/2 flow control get deferred. Virtual threads can easily be deferred, so we just don't think they are suitable (I'll say yet... but I'm dubious they ever will be).

However, using Loom virtual threads to dispatch to an application that is written in blocking mode is something that we have already implemented in our test branch, and it is very likely to reach a main branch if Loom ever makes it to a released JVM. That will allow many thousands of virtual threads to block in the application. That could be 10s or 100s or 1000s of thousands, depending on how many other resources the application uses.

I'm not yet convinced this will give as good results as writing async applications, but it should be in the ball park, and it will definitely be much easier to write and maintain.

gregw commented 3 years ago

See blog posts:

Also see reddit discussion:

lukago commented 3 years ago

@sbordet what do you mean by async APIs, do you have an example? I don't have any knowledge about CometD, but using async servlets seems to be not enough, as I still get only ~250 max concurrent connections. I am mostly interested in sync applications, but it would be good to know how to configure it for async apps too.

@gregw so if I run Jetty from your test branch with the config org.eclipse.jetty.io.use_loom=true, then should these socket errors no longer occur, or is any additional config for the max number of concurrent connections needed?

Edit: OK, now I get it: the problem with connections was indeed on the client side. I fixed it with the ulimit command (raising the client's open-file-descriptor limit).

Thanks! :)

bowbahdoe commented 3 years ago

@lukago Just a general note that might be useful for spectators to the discussion, even if it's not exactly what you were asking.

An "async api" is any api that will not deliver its result immediately, such as later calling a callback or by returning an object that callbacks can be attached to.

Async APIs:

// Async api 1
// Will call the callback "later" maybe on a different thread
void getInteger(Consumer<Integer> whenDone) { ... }

// Usage
getInteger(x -> System.out.println(x + 1));

// Async api 2
// Will return an object that callbacks can be chained on
CompletableFuture<Integer> getInteger() { ... }

// Usage
getInteger()
  .thenApply(x -> x + 1)
  .thenAccept(x -> System.out.println(x));

Synchronous APIs:

// Returns when result available
Integer getInteger() { ... }

// Usage
int x = getInteger();
x = x + 1;
System.out.println(x);

If most of your code is written using synchronous APIs, there won't be much (or any) performance benefit to using async servlets, simply because there won't be "explicit yield points" that can be taken advantage of. The "seams" added by the callbacks or the futures are what is used to "juggle" tasks between OS threads.

lukago commented 3 years ago

@gregw I'd like to ask one more thing about the Jetty I/O model. As I understand it, the current version of Jetty uses a model similar to Netty's for handling I/O. So there is a separate thread pool where each thread (aka event loop) is asking the kernel for new I/O events and then dispatching them to another thread pool for blocking servlets, or doing everything on the I/O pool for async servlets. What if we change the I/O thread pool to be fiber-based as well? Will it be beneficial for overall performance?

gregw commented 3 years ago

Lukasz,

I'm not familiar with Netty's internals, but our scheduler is not like you describe.

Having a selector thread that always dispatches to a thread pool means that there is always extra latency and often you will end up with a cold CPU cache and get parallel slowdown (see https://webtide.com/avoiding-parallel-slowdown-in-jetty-9/)

If you never dispatch, then you end up with deadlock and/or head-of-line blocking.

So we have a more adaptive strategy called "Eat what you Kill". Details here: https://webtide.com/eat-what-you-kill/
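
A toy sketch of the core idea, with invented names (Jetty's real EatWhatYouKill strategy is far more adaptive): the thread that produces a task first arranges for another thread to take over producing, then consumes the task itself, so the task runs on a CPU cache that is already hot.

import java.util.concurrent.Executor;

public class EatWhatYouKillSketch implements Runnable
{
    public interface Producer
    {
        Runnable produce(); // e.g. select an I/O event and build its task
    }

    private final Producer producer;
    private final Executor executor;

    public EatWhatYouKillSketch(Producer producer, Executor executor)
    {
        this.producer = producer;
        this.executor = executor;
    }

    @Override
    public void run()
    {
        Runnable task = producer.produce();
        if (task == null)
            return;
        // Hand production over to another thread, then consume the task we
        // just produced ("eat what you kill") on this thread, whose CPU cache
        // already holds the task's data.
        executor.execute(this);
        task.run();
    }
}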

Currently I cannot see how that can be implemented with virtual threads in a beneficial way, however the strategy is adaptive enough to use virtual threads when appropriate, as we have done with our Loom branch.

regards

gregw commented 1 year ago

Try now as it looks complete to me.

On Tue, 11 Apr 2023 at 14:12, xodiumluma wrote:

Hi @gregw,

Hope you are well.

The "Eat what you kill" page https://webtide.com/eat-what-you-kill/ is chopped off at the bottom. Could your team please reinstate the missing content? It currently reads, "If the request is consumed by a different thread, then all the request data must be loaded into the new CPU c" <- there it ends.

Thanks!

xodiumluma commented 1 year ago

Thanks @gregw, it started working about twenty minutes after I posted, so I deleted the post.

wendal commented 1 year ago

so, it works? https://www.eclipse.org/jetty/javadoc/jetty-10/org/eclipse/jetty/util/VirtualThreads.Configurable.html

gregw commented 1 year ago

@wendal yes it works... unless you are saying otherwise?
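
For anyone arriving here later, a minimal sketch of wiring that up, assuming a JDK with virtual threads and a Jetty 10 release whose QueuedThreadPool implements the VirtualThreads.Configurable interface linked above:

import java.util.concurrent.Executors;

import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.server.ServerConnector;
import org.eclipse.jetty.util.thread.QueuedThreadPool;

public class LoomJettyServer
{
    public static void main(String[] args) throws Exception
    {
        QueuedThreadPool threadPool = new QueuedThreadPool();
        // Jetty keeps its internal I/O tasks on the pool's platform threads
        // and uses this executor to dispatch blocking application code to
        // virtual threads.
        threadPool.setVirtualThreadsExecutor(Executors.newVirtualThreadPerTaskExecutor());

        Server server = new Server(threadPool);
        ServerConnector connector = new ServerConnector(server);
        connector.setPort(8080);
        server.addConnector(connector);
        // ... set a Handler or deploy a webapp here ...
        server.start();
        server.join();
    }
}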