jetty / jetty.project

Eclipse Jetty® - Web Container & Clients - supports HTTP/2, HTTP/1.1, HTTP/1.0, websocket, servlets, and more
https://eclipse.dev/jetty

Deadlock when number of simultaneous concurrent requests exceeds max thread count in thread pool #7212

Closed yannmar closed 7 months ago

yannmar commented 2 years ago

Jetty version

11.0.6

Java version

java.vm.name=OpenJDK 64-Bit Server VM java.vm.version=11.0.5+10-b520.38

Question

Hi team.

I am observing a deadlock with my custom local endpoint, which I want to use to invoke Jetty programmatically in an async manner. Here is how it happens:

  1. The endpoint write operation is asynchronous (and it appears to be allowed to make it async, because that is how the API is designed).
  2. When many concurrent requests arrive, at some point all qtp threads invoke this async write nearly simultaneously, and then start waiting for the callback to be invoked.
  3. The problem is that the callback is never invoked, because all qtp threads are busy waiting for it.
  4. So all threads are stuck forever.

Please see below what the stack trace of a stuck thread looks like. Altogether this looks like an inherent API design issue, and any endpoint that invokes the write callback on the standard thread pool will be affected.

To work around the problem it is probably necessary to have two separate thread pools:

  1. a thread pool used to invoke read operation callbacks
  2. a thread pool used to invoke write operation callbacks

This also means that all read callbacks must always be executed asynchronously; it is not allowed to invoke getFillInterest().fillable() straight from AbstractEndPoint::needsFillInterest(). The only allowed way is to always do it in an async manner, e.g. readCallbackExecutor.execute(getFillInterest()::fillable), otherwise it is not possible to separate write and read callback threads (see the sketch below).
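A minimal, hypothetical sketch of the workaround proposed here (the class name, constructor and executor field are illustrative; AbstractEndPoint, getFillInterest() and needsFillInterest() are the real Jetty APIs, and the other abstract methods of AbstractEndPoint are omitted):

    import java.util.concurrent.Executor;
    import org.eclipse.jetty.io.AbstractEndPoint;

    // Sketch only: read callbacks are always dispatched to a dedicated executor,
    // never invoked inline from needsFillInterest(), so read and write callbacks
    // end up on different threads.
    public abstract class AsyncLocalEndPointSketch extends AbstractEndPoint
    {
        private final Executor readCallbackExecutor;

        protected AsyncLocalEndPointSketch(Executor readCallbackExecutor)
        {
            super(null); // no scheduler in this sketch
            this.readCallbackExecutor = readCallbackExecutor;
        }

        @Override
        protected void needsFillInterest()
        {
            // Dispatch rather than calling getFillInterest().fillable() directly.
            readCallbackExecutor.execute(getFillInterest()::fillable);
        }
    }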

So it seems I did nothing illegal or undocumented and still ran into a serious issue which was only noticed by accident. Maybe it needs to be documented somehow, or the threading rules/concurrency model elaborated? It looks like Jetty is currently fully callback driven, but it turns out that this may be unsafe.

"qtp762218386-14" #14 prio=5 os_prio=0 cpu=78.13ms elapsed=29.16s tid=0x0000021572b98000 nid=0x14d18 waiting on condition  [0x000000d0679fd000]
   java.lang.Thread.State: WAITING (parking)
    at jdk.internal.misc.Unsafe.park(java.base@11.0.5/Native Method)
    - parking to wait for  <0x000000070e364520> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(java.base@11.0.5/LockSupport.java:194)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@11.0.5/AbstractQueuedSynchronizer.java:2081)
    at org.eclipse.jetty.util.SharedBlockingCallback$Blocker.block(SharedBlockingCallback.java:206)
    at org.eclipse.jetty.server.HttpOutput.channelWrite(HttpOutput.java:254)
    at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:881)
    at org.glassfish.jersey.servlet.internal.ResponseWriter$NonCloseableOutputStreamWrapper.write(ResponseWriter.java:301)
    at java.io.ByteArrayOutputStream.writeTo(java.base@11.0.5/ByteArrayOutputStream.java:187)
    - locked <0x000000070dcfaf80> (a java.io.ByteArrayOutputStream)
    at org.glassfish.jersey.message.internal.CommittingOutputStream.flushBuffer(CommittingOutputStream.java:278)
    at org.glassfish.jersey.message.internal.CommittingOutputStream.commit(CommittingOutputStream.java:232)
    at org.glassfish.jersey.message.internal.CommittingOutputStream.close(CommittingOutputStream.java:247)
    at org.glassfish.jersey.message.internal.OutboundMessageContext.close(OutboundMessageContext.java:842)
    at org.glassfish.jersey.server.ContainerResponse.close(ContainerResponse.java:389)
    at org.glassfish.jersey.server.ServerRuntime$Responder.writeResponse(ServerRuntime.java:707)
    at org.glassfish.jersey.server.ServerRuntime$Responder.processResponse(ServerRuntime.java:373)
    at org.glassfish.jersey.server.ServerRuntime$Responder.process(ServerRuntime.java:363)
    at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:258)
    at org.glassfish.jersey.internal.Errors$1.call(Errors.java:248)
    at org.glassfish.jersey.internal.Errors$1.call(Errors.java:244)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:292)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:274)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:244)
    at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:265)
    at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:234)
    at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:680)
    at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:394)
    at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:346)
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:366)
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:319)
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:205)
    at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:764)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:508)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:221)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1370)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:176)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:463)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:174)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1292)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:129)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
    at org.eclipse.jetty.server.Server.handle(Server.java:562)
    at org.eclipse.jetty.server.HttpChannel.lambda$handle$0(HttpChannel.java:406)
    at org.eclipse.jetty.server.HttpChannel$$Lambda$273/0x0000000840204c40.dispatch(Unknown Source)
    at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:663)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:398)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:282)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:319)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:100)
    at asynclocalconnectorbug.AsyncLocalEndPoint.lambda$read$0(AsyncLocalEndPoint.java:108)
    at asynclocalconnectorbug.AsyncLocalEndPoint$$Lambda$267/0x0000000840202c40.run(Unknown Source)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:894)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1038)
    at java.lang.Thread.run(java.base@11.0.5/Thread.java:834)
sbordet commented 2 years ago

It's not clear what you want to do, and your analysis is incorrect.

You are performing blocking write calls, this is evident from the stack trace. It's Jersey that performs the blocking writes on behalf of your application.

Jetty's threading model is such that even if all threads are blocked in writes due to TCP congestion, they will be unblocked by the selector thread performing a non-blocking completion of the write operation, so Jetty already takes this into account and already has solutions for this particular case.

There is no need for separate read/write thread pools. They won't help. As such, your point about how to call fillable() is also incorrect and would just make server performance plunge.

Please explain what your problem is, rather than trying to suggest wrong solutions. Why do you need your AsyncLocalEndPoint? You typically don't need to play with the Jetty APIs at this level.

If you explain your problem first, we may be able to guide you to a better/simpler solution.

yannmar commented 2 years ago

Hi Simone @sbordet. Thank you for your reply.

The standard socket connector was not used. I am trying to figure out how to develop a custom local connector and a custom local endpoint so that Jetty can be called programmatically in an async manner; therefore there is neither a selector nor selector threads in my use case. Jetty is needed to handle the HTTP protocol, while another NIO library (Grizzly or Netty) may be responsible for socket operations and also for advanced routing. Or HTTP traffic may even be sent over Kafka, for example.

Basically the idea is to have a simple building block for transport customization and let a fully-fledged web server (Jetty) do the HTTP protocol handling, in particular run servlet-based applications.

You are performing blocking write calls, this is evident from the stack trace. It's Jersey that performs the blocking writes on behalf of your application.

Jersey is used in my test environment, but the blocking does not happen in Jersey code, it happens in Jetty code. Jetty invokes the async write endpoint operation (which is async by design) and then waits for the write callback invocation in a blocking manner. So if the standard thread pool is used to call the write callback, the described deadlock may happen.

The problem is that the Jetty API looks fully callback driven, and there are no threading rules or concurrency model specified (or I didn't find them in the Programming Guide, which is btw a great document), but it turns out that threading rules are needed and very important, because otherwise problems like this deadlock may happen.

joakime commented 2 years ago

The standard socket connector was not used. I am trying to figure out how to develop a custom local connector and a custom local endpoint so that Jetty can be called programmatically in an async manner; therefore there is neither a selector nor selector threads in my use case.

The LocalConnector exists for this purpose (which we use extensively for our own unit testing).

You might also want to look at how the UnixSocket implementations in Jetty 10+ are done; they might help you understand a bit how to implement this layer.

Jetty is needed to handle the HTTP protocol, while another NIO library (Grizzly or Netty) may be responsible for socket operations and also for advanced routing. Or HTTP traffic may even be sent over Kafka, for example.

Why? Neither is better than Jetty's own I/O handling. Jetty expects internal implementations of its own I/O handling for many advanced features; using a non-Jetty implementation will just slow things down needlessly due to the inefficiencies you are creating. (Jetty internals will have to fall back to less powerful and poorly performing choices when they cannot find their own optimized implementations.)

What do you mean by "advanced routing"?

Kafka to http? (which is odd, unless you have a kafka proxy) or http to Kafka? (which exists in multiple forms already)

Also, you might want to plug into the work being done right now for HTTP/3, which has caused a new rework of the internals of Jetty to support this protocol.

yannmar commented 2 years ago

Hi Joakim @joakime, thank you for your reply.

The LocalConnector exists for this purpose (which we use extensively for our own unit testing).

LocalConnector is unfortunately not suitable: it's not async and cannot be used for high throughput, and it has no threading model support. We may want some threading model support in order, for example, to integrate more easily with the threading model of a custom transport (say, Netty has an explicit threading model where each socket channel is associated with a particular thread).

Why? Neither is better than Jetty's own I/O handling.

Jetty's I/O handling is for sure very good; the only problem is that sometimes we cannot use it. For example, people already have a Netty-based application and then want to integrate a servlet-based application into it. The transport layer already exists and cannot be changed, and developers want to add better HTTP support, HTTP/2 support or servlet support - that's just one group of use cases where Jetty could be very useful, but the default Jetty I/O implementation cannot be used in that case. LocalConnector also cannot be used, as it doesn't play well with other NIO libraries.

Jetty expects internal implementations of its own I/O handling for many advanced features,

That's a source of uncertainty for me. At first glance, and based on my experiments, it's possible to implement a custom AsyncLocalEndPoint and AsyncLocalConnector that would be a good basis for a bridge between a custom transport and Jetty. But it's unclear where the pitfalls are, because every mentioned dependency on internal implementation details may make a custom connector and endpoint incompatible with the Jetty core under certain conditions, and I just don't know when such an incompatibility may bite me.

What do you mean by "advanced routing"?

Advanced routing means that we may, for example, accept HTTP and some binary protocol on the same port, so first we separate the protocols and then handle them differently (a kind of de-multiplexing). Or, for example, we may want WebSocket to be handled completely by Netty-based code, because in that case it's possible to implement back-pressure, while regular HTTP requests can still arrive at the same port and we may want a fully-fledged web server handling the HTTP protocol at the same time.

Kafka to http? (which is odd, unless you have a kafka proxy)

That's just an example of a custom transport, but absolutely possible. Say, microservices may prefer to talk to each other over Kafka and at the same time expose a REST API; if we have a Kafka-to-Jetty bridge, then the business logic can be implemented the same way for both routes as a single JAX-RS application.

Also, you might want to plug into the work being done right now for HTTP/3, which has caused a new rework of the internals of Jetty to support this protocol.

Oh! Thank you, I wasn't aware of that. Btw, are you planning to rework the base channel abstractions (connectors and endpoints) for that?

joakime commented 2 years ago

Hi Joakim @joakime, thank you for your reply.

The LocalConnector exists for this purpose (which we use extensively for our own unit testing).

LocalConnector is unfortunately not suitable: it's not async and cannot be used for high throughput, and it has no threading model support. We may want some threading model support in order, for example, to integrate more easily with the threading model of a custom transport (say, Netty has an explicit threading model where each socket channel is associated with a particular thread).

That threading model is dead.

It's not supported by the Servlet 6+ spec; in fact, all 1 thread == 1 request concepts have been removed entirely from Jakarta EE 10 across the board (even JSP has removed it).

Even things like ThreadLocal are dead; don't expect them to function in the near future. Spring has already migrated away from them.

The future is Loom and Scoped Locals - https://openjdk.java.net/jeps/8263012

Why? Neither is better than Jetty's own I/O handling.

Jetty's I/O handling is for sure very good; the only problem is that sometimes we cannot use it. For example, people already have a Netty-based application and then want to integrate a servlet-based application into it. The transport layer already exists and cannot be changed, and developers want to add better HTTP support, HTTP/2 support or servlet support - that's just one group of use cases where Jetty could be very useful, but the default Jetty I/O implementation cannot be used in that case. LocalConnector also cannot be used, as it doesn't play well with other NIO libraries.

Sounds like you need to implement the entire Java NIO layer for a new protocol that isn't IP and wire that up. I just don't see how Netty can be compatible with what's expected of Jetty. (I've personally used Netty for various services, last one being Netty 4.1.9. The techniques used between Netty and Jetty are vastly different.)

"better http support" is a curious statement, we already support the latest HTTP/1.1 specs and associated specs (uri, cookie, etc) and have compliance configurables to set how you want Jetty to operate. Seeing as we are active participants on the ietf http specs (and have been since HTTP/0.9 days), I wonder what "better" means to you?

Jetty expects internal implementations of its own I/O handling for many advanced features,

That's a source of uncertainty for me. At first glance, and based on my experiments, it's possible to implement a custom AsyncLocalEndPoint and AsyncLocalConnector that would be a good basis for a bridge between a custom transport and Jetty. But it's unclear where the pitfalls are, because every mentioned dependency on internal implementation details may make a custom connector and endpoint incompatible with the Jetty core under certain conditions, and I just don't know when such an incompatibility may bite me.

There are efficiencies with our own internal mechanisms for threading, bytebuffer handling, and memory consumption (pooling, caching, reused objects, expectations that we own and control various objects rather than another library, etc.).

HTTP/2, for example, is particularly thread and selector sensitive (if you don't get things 100% right, you can easily cause a deadlock across all HTTP/2 streams on a specific physical connection). The threading/execution model in Jetty is custom, with features that allow cases like HTTP/2 to always function properly in a highly contended environment.

What do you mean by "advanced routing"?

Advanced routing means that we may, for example, accept HTTP and some binary protocol on the same port, so first we separate the protocols and then handle them differently (a kind of de-multiplexing). Or, for example, we may want WebSocket to be handled completely by Netty-based code, because in that case it's possible to implement back-pressure, while regular HTTP requests can still arrive at the same port and we may want a fully-fledged web server handling the HTTP protocol at the same time.

We already have all of those features present in Jetty and its ConnectionFactory implementations.

You can already have a single connector support SSL/TLS (or plain-text), HTTP/1, HTTP/2, Proxy v1, Proxy v2, CONNECT, grpc, etc. (see the sketch below). Adding new protocols is just a matter of implementing the DetectorConnectionFactory.
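For illustration, a minimal sketch of a single connector configured with more than one protocol (plain HTTP/1.1 plus cleartext HTTP/2 here; the port and class name of the sketch are illustrative, the Jetty classes are real, and the http2-server artifact is assumed to be on the classpath):

    import org.eclipse.jetty.http2.server.HTTP2CServerConnectionFactory;
    import org.eclipse.jetty.server.HttpConfiguration;
    import org.eclipse.jetty.server.HttpConnectionFactory;
    import org.eclipse.jetty.server.Server;
    import org.eclipse.jetty.server.ServerConnector;

    public class MultiProtocolConnectorSketch
    {
        public static void main(String[] args) throws Exception
        {
            Server server = new Server();
            HttpConfiguration httpConfig = new HttpConfiguration();

            // One connector, several ConnectionFactory instances: the first factory
            // is the default protocol, the others are selected by negotiation/upgrade.
            HttpConnectionFactory http11 = new HttpConnectionFactory(httpConfig);
            HTTP2CServerConnectionFactory h2c = new HTTP2CServerConnectionFactory(httpConfig);
            ServerConnector connector = new ServerConnector(server, http11, h2c);
            connector.setPort(8080);

            server.addConnector(connector);
            server.start();
        }
    }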

Also, you might want to plug into the work being done right now for HTTP/3, which has caused a new rework of the internals of Jetty to support this protocol.

Oh! Thank you, I wasn't aware about that. Btw are you planning to rework base channel abstractions (connectors and endpoint) for that?

There's been some work in those components already. See https://github.com/eclipse/jetty.project/pull/6765

yannmar commented 2 years ago

Hi Joakim @joakime

we already support the latest HTTP/1.1 specs and associated specs ... I wonder what "better" means to you?

In my opinion you (Jetty) have quite good HTTP support, but it's probably only available within Jetty's own implementation. That's a very strong side of Jetty, but it's not reusable in custom environments. What I am trying to say is that this very good HTTP support cannot be used by developers who have a custom transport layer; they often already have one, but they want the HTTP support Jetty can provide. There is such demand and there are such use cases.

We already have all of those features present in Jetty and its ConnectionFactory implementations. Adding new protocols is just a matter of implementing the DetectorConnectionFactory.

In my personal opinion the Jetty APIs are quite complicated; it's easier to deal with Netty when implementing a new protocol from scratch. But things often happen in reverse order: people already have some Grizzly- or Netty-based protocol implementation and then want to add HTTP support, or better HTTP support (if they already implemented some basic subset near the HTTP/0.9 level), or HTTP/2 support, or servlet support. The only option to get good HTTP support is to embed Jetty into the existing application, but Jetty doesn't let you easily customize the transport layer, and that's a lot of pain.

joakime commented 2 years ago

we already support the latest HTTP/1.1 specs and associated specs ... I wonder what "better" means to you?

In my opinion you (Jetty) have quite good HTTP support, but it's probably only available within Jetty's own implementation. That's a very strong side of Jetty, but it's not reusable in custom environments. What I am trying to say is that this very good HTTP support cannot be used by developers who have a custom transport layer; they often already have one, but they want the HTTP support Jetty can provide. There is such demand and there are such use cases.

Configuring the HTTP spec and compliance modes can be done easily with the standalone ${jetty.home} / ${jetty.base} split (they are property values).

We already have all of those features present in Jetty and its ConnectionFactory implementations. Adding new protocols is just a matter of implementing the DetectorConnectionFactory.

In my personal opinion the Jetty APIs are quite complicated; it's easier to deal with Netty when implementing a new protocol from scratch. But things often happen in reverse order: people already have some Grizzly- or Netty-based protocol implementation and then want to add HTTP support, or better HTTP support (if they already implemented some basic subset near the HTTP/0.9 level), or HTTP/2 support, or servlet support. The only option to get good HTTP support is to embed Jetty into the existing application, but Jetty doesn't let you easily customize the transport layer, and that's a lot of pain.

It is literally 1 method to identify/detect a connection type.

Detection Detecting.detect(ByteBuffer buf);

Where you return Detection.RECOGNIZED, Detection.NOT_RECOGNIZED, or Detection.NEED_MORE_BYTES to indicate whether you are handling that connection type.

https://github.com/eclipse/jetty.project/blob/fb7613581fc43164e97eacee84ed0a8009bcc91d/jetty-server/src/main/java/org/eclipse/jetty/server/ConnectionFactory.java#L85-L122

Not sure how we could make that simpler.
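For illustration, a hypothetical detecting connection factory might look roughly like this (the class name, protocol name and magic bytes are made up; ConnectionFactory.Detecting, Detection and AbstractConnectionFactory are the real Jetty APIs; the actual protocol Connection is omitted):

    import java.nio.ByteBuffer;
    import org.eclipse.jetty.io.Connection;
    import org.eclipse.jetty.io.EndPoint;
    import org.eclipse.jetty.server.AbstractConnectionFactory;
    import org.eclipse.jetty.server.ConnectionFactory;
    import org.eclipse.jetty.server.Connector;

    // Hypothetical sketch: recognize a protocol by a 4-byte magic prefix.
    public class MyProtocolConnectionFactory extends AbstractConnectionFactory
            implements ConnectionFactory.Detecting
    {
        private static final byte[] MAGIC = {'M', 'Y', 'P', '1'};

        public MyProtocolConnectionFactory()
        {
            super("myproto");
        }

        @Override
        public Detection detect(ByteBuffer buffer)
        {
            if (buffer.remaining() < MAGIC.length)
                return Detection.NEED_MORE_BYTES;    // cannot decide yet
            for (int i = 0; i < MAGIC.length; i++)
            {
                if (buffer.get(buffer.position() + i) != MAGIC[i])
                    return Detection.NOT_RECOGNIZED; // let another factory try
            }
            return Detection.RECOGNIZED;             // this factory will handle it
        }

        @Override
        public Connection newConnection(Connector connector, EndPoint endPoint)
        {
            // Here you would create and return your protocol's Connection
            // implementation (omitted in this sketch).
            throw new UnsupportedOperationException("protocol Connection not implemented in this sketch");
        }
    }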

sbordet commented 2 years ago

@yannmar you are right that there is no out-of-the-box way to perform the kind of proxying that you would need, but the components are there and it should be possible to implement it correctly.

Just for reference, see the documentation about how to implement a custom protocol: https://www.eclipse.org/jetty/documentation/jetty-10/programming-guide/index.html#pg-arch-io-echo https://www.eclipse.org/jetty/documentation/jetty-10/programming-guide/index.html#pg-server-io-arch-connection-factory-custom

LocalConnector and LocalEndPoint (or ByteArrayEndPoint) should be taken as starting points, with the idea of creating one LocalEndPoint for every external source "connection" and configuring it with a Connection for the right protocol.

LocalEndPoint does support asynchronous usage.

About your specific problem of all threads being blocked in writes: it should be solvable as long as the external system notifies you of write availability (semantically identical to how NIO notifies about OP_WRITE). IIUC, you dispatch this event to the thread pool, but there are no threads left to run it. If you have this event from the external system, you should not dispatch it to a thread pool, but run it directly to unblock the blocked writes by triggering a call to WriteFlusher.completeWrite() (e.g. via LocalEndPoint.takeOutput()).
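A minimal sketch of that idea, assuming a custom endpoint extending AbstractEndPoint (the class name and the onExternalWritable() notification method are hypothetical; getWriteFlusher() and WriteFlusher.completeWrite() are the real Jetty APIs, and the other abstract methods of AbstractEndPoint are omitted):

    import org.eclipse.jetty.io.AbstractEndPoint;

    public abstract class ExternalTransportEndPointSketch extends AbstractEndPoint
    {
        protected ExternalTransportEndPointSketch()
        {
            super(null); // no scheduler in this sketch
        }

        // Called by the external transport when it can accept more output
        // (the analogue of NIO's OP_WRITE). Run it directly, not via the
        // thread pool, so blocked writers are released even when the pool
        // is exhausted.
        public void onExternalWritable()
        {
            getWriteFlusher().completeWrite();
        }
    }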

I think your problem is just that you have a system where both server and clients are "local" and share the same thread pool, and it just happened that the whole thread pool was used by the server side, leaving no threads for the client side. You would have had the same problem with server and client being "remote", where the client was not reading from the network.

So I don't think it's a Jetty issue (besides the lack of this feature being available out-of-the-box), just the way you have set up your system.

I would not mind having this functionality present in Jetty out-of-the-box, but you're the first one to ask, so it's low priority. Would it be possible to look at your code?

yannmar commented 2 years ago

@sbordet Thank you Simone for the extended reply.

If you have this event from the external system, you should not dispatch it to a thread pool, but run it directly to unblock the blocked writes by triggering a call to WriteFlusher.completeWrite() (e.g. via LocalEndPoint.takeOutput()).

Yes, it looks so, but that's a hidden threading rule. I would suggest at least documenting it explicitly, because it's critical for custom endpoint implementations. I have a related concern/question: when I call WriteFlusher.completeWrite(), it should be guaranteed somehow that the thread given to Jetty to execute this callback won't be used by Jetty to do, for example, a read. Currently it doesn't seem to be specified how much work Jetty can do when this callback is invoked. Based on your reply, it seems some hidden/implicit rule exists and Jetty probably won't do a lot and will return the thread to the invoker quickly, but again that knowledge is very important for a proper custom endpoint implementation.

I think your problem is just that you have a system where both server and clients are "local" and share the same thread pool, and it just happened that the whole thread pool was used by the server side, leaving no threads for the client side.

Yes, I think that's what happened with my code.

So I don't think it's a Jetty issue (besides the lack of this feature being available out-of-the-box), just the way you have set up your system.

It doesn't look like an implementation issue, but it may be a documentation issue. There are implicit threading/concurrency rules that are not exposed in the documentation; when you look at the docs it seems everything is completely callback driven, but it turns out that there are very important threading nuances. Or I could be missing something.

I would not mind having this functionality present in Jetty out-of-the-box, but you're the first one to ask

Maybe others just ask in the wrong place, for example: https://stackoverflow.com/questions/70099599/interfacing-netty-with-servlet-based-code

Would it be possible to look at your code?

I will try to prepare it for sharing; that may take some time. Basically it's relatively close to LocalEndPoint, except that when you ask it to execute a request it returns a promise (CompletableFuture), so there is an option not to waste a thread waiting for the response in a blocking manner. There is also streaming support, where you can specify an executor which will be used to call the callbacks (connectionClosed, needMoreData, someResultReady); this executor is intended to simplify interaction with NIO frameworks that have strict and explicit threading rules. A rough sketch of that API shape is below.
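Purely for illustration, the described API surface might look roughly like this (the callback names come from the description above; the interface, method signatures and everything else are hypothetical and not the actual attached code):

    import java.nio.ByteBuffer;
    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.Executor;

    // Hypothetical shape of the async local connection described above.
    public interface AsyncLocalConnection
    {
        // Submit a complete request and receive the response asynchronously,
        // instead of blocking the calling thread.
        CompletableFuture<ByteBuffer> executeRequest(ByteBuffer request);

        // Streaming variant: callbacks are invoked on the supplied executor,
        // so the caller can align them with the threading model of its own
        // NIO framework (e.g. Netty's one-thread-per-channel model).
        void stream(ByteBuffer chunk, Executor callbackExecutor, StreamCallbacks callbacks);

        interface StreamCallbacks
        {
            void someResultReady(ByteBuffer output);
            void needMoreData();
            void connectionClosed();
        }
    }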

yannmar commented 2 years ago

@joakime Thank you Joakim

The Detection API doesn't look complicated, I agree. But a protocol implementation will be very complicated in my opinion, especially if chained processing is needed, as it looks like there is no support for that. I mean, you can chain handlers that transform chunks of bytes into chunks of bytes, but it might be necessary to chain handlers that convert bytes to objects or objects to objects, and it looks like the Jetty API has no support for that, probably for historical reasons; that's just the current state of affairs we need to deal with.

yannmar commented 2 years ago

Hi Joakim, Simone (@joakime, @sbordet).

Would it be possible to look at your code?

Here are the sources for the async local connector and endpoint which I promised to show above; sorry it took much more time to prepare than I expected.

Connector and endpoint:

Netty to jetty bridge handler:

Demo app:

async-local-endpoint-and-connector.zip

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has been a full year without activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] commented 7 months ago

This issue has been automatically marked as stale because it has been a full year without activity. It will be closed if no further activity occurs. Thank you for your contributions.

sbordet commented 7 months ago

I'm closing this issue because we are implementing a memory transport as part of #8979, which would also fix #10080.