Closed: jmcc0nn3ll closed this issue 5 years ago.
I'm also hitting an application hang with a thread dump showing a similar stack trace. Is this issue still there in 9.3.6?
At least in my case, it looks like the client did not send a response back to the server when it did a sendString. There is also an associated timeout in the logs (Jetty 9.3.6):
02:23:41.301 [Scheduler-1157726741] ERROR c.o.o.c.endpoints.WebSocketEndpoint - Error in websocket session
java.net.SocketTimeoutException: Timeout on Read
    at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.onReadTimeout(AbstractWebSocketConnection.java:576) [websocket-common-9.3.6.v20151106.jar:9.3.6.v20151106]
    at org.eclipse.jetty.io.AbstractConnection.onFillInterestedFailed(AbstractConnection.java:162) [jetty-io-9.3.6.v20151106.jar:9.3.6.v20151106]
    at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.onFillInterestedFailed(AbstractWebSocketConnection.java:522) [websocket-common-9.3.6.v20151106.jar:9.3.6.v20151106]
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.failed(AbstractConnection.java:267) [jetty-io-9.3.6.v20151106.jar:9.3.6.v20151106]
    at org.eclipse.jetty.io.FillInterest.onFail(FillInterest.java:125) [jetty-io-9.3.6.v20151106.jar:9.3.6.v20151106]
    at org.eclipse.jetty.io.ssl.SslConnection.onFillInterestedFailed(SslConnection.java:216) [jetty-io-9.3.6.v20151106.jar:9.3.6.v20151106]
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.failed(AbstractConnection.java:267) [jetty-io-9.3.6.v20151106.jar:9.3.6.v20151106]
    at org.eclipse.jetty.io.FillInterest.onFail(FillInterest.java:125) [jetty-io-9.3.6.v20151106.jar:9.3.6.v20151106]
    at org.eclipse.jetty.io.AbstractEndPoint.onIdleExpired(AbstractEndPoint.java:163) [jetty-io-9.3.6.v20151106.jar:9.3.6.v20151106]
    at org.eclipse.jetty.io.IdleTimeout.checkIdleTimeout(IdleTimeout.java:166) [jetty-io-9.3.6.v20151106.jar:9.3.6.v20151106]
    at org.eclipse.jetty.io.IdleTimeout$1.run(IdleTimeout.java:50) [jetty-io-9.3.6.v20151106.jar:9.3.6.v20151106]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_65]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_65]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_65]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [na:1.8.0_65]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_65]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_65]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_65]
And all the threads hang in the following state. I have to restart the process to recover:
"qtp1849201180-18080" #18080 prio=5 os_prio=0 tid=0x00007f5f2aa91000 nid=0x28a5 waiting on condition [0x00007f5f0c783000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x0000000700e3b8c0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) at org.eclipse.jetty.util.SharedBlockingCallback$Blocker.block(Redefined) at org.eclipse.jetty.websocket.common.BlockingWriteCallback$WriteBlocker.block(Redefined) at org.eclipse.jetty.websocket.common.WebSocketRemoteEndpoint.blockingWrite(Redefined) at org.eclipse.jetty.websocket.common.WebSocketRemoteEndpoint.sendString(Redefined)
Is there any plan to fix this?
@phoenix as there's no reproducer for this, it's hard to identify the problem and implement a fix.
We see the same issue on 9.3.7.v20160115:
"pool-18-thread-42" #6179 prio=5 os_prio=0 tid=0x00007f7be0262800 nid=0x1741 waiting on condition [0x00007f7b1a21a000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x0000000687b00e08> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at org.eclipse.jetty.util.SharedBlockingCallback$Blocker.block(SharedBlockingCallback.java:210)
at org.eclipse.jetty.websocket.common.BlockingWriteCallback$WriteBlocker.block(BlockingWriteCallback.java:82)
at org.eclipse.jetty.websocket.common.WebSocketRemoteEndpoint.blockingWrite(WebSocketRemoteEndpoint.java:107)
at org.eclipse.jetty.websocket.common.WebSocketRemoteEndpoint.sendString(WebSocketRemoteEndpoint.java:387)
As a workaround (to prevent threads permanently hanging) we have added the following into org.eclipse.jetty.websocket.common.BlockingWriteCallback:
@Override
protected long getIdleTimeout()
{
    return 30000L;
}
This isn't a proper fix, but it at least appears to time out the blocking write, preventing a permanent thread lock-up (which would eventually lead to thread starvation in our thread pool). I appreciate this is difficult to replicate, but perhaps you could consider exposing the timeout so we can set it to a non-infinite value without having to recompile this jar?
@jonhill1977 do you have a reproduction / testcase?
@joakime Unfortunately not. I appreciate how hard it is for you to resolve this without reproduction steps. It happened in two of our production environments in a 12 hour period, but it hasn't happened in our test environment, so I assume it's either related to the increased load in production, or more likely a client socket is disconnecting unexpectedly (or otherwise not responding as expected).
If you can't resolve the underlying issue, could you at least consider allowing access to the IdleTimeout (which is currently implemented as always returning -1 in the superclass) as suggested in my workaround given previously? At least that way, we could set the timeout (in Jetty config or from our code) and we wouldn't have to recompile the jar when we upgrade to the latest versions of Jetty.
@joakime I think the timeout mechanism in this class was implemented for when we had similar problems with HttpOutput. We expect to always get a success or failure callback, but if they never come, then we block forever.
We override it in HttpOutput to look at getHttpConfiguration().getBlockingTimeout(). Currently we have solved all known problems in HttpOutput, so our default is -1.
So you could override it in websocket to look for a similar configuration, or perhaps just change it to return a very long timeout (1 hour), so at least it is not infinite (see the sketch below).
But essentially that is just a band-aid. If a reproduction cannot be found, it may be better to use the timeout to trigger some diagnostic code that reports the history and current state of the websocket, which might suggest why it has neither succeeded nor failed.
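For example, a minimal sketch along those lines, assuming a patched BlockingWriteCallback; the system property name and the 1-hour default are illustrative, not an existing Jetty option:

@Override
protected long getIdleTimeout()
{
    // -1 keeps today's infinite wait; any positive value eventually breaks out of block()
    return Long.getLong("jetty.websocket.blockingTimeout", 3_600_000L);
}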
+1 to allowing access. I might be seeing the issue in my Spring app: when a websocket client consumes very slowly (e.g. waiting 4 s before finishing consumption of a message) you can end up in this state. Attaching my thread dumps from slow clients: http://pastebin.com/zE16BBgV
I presently have a reproduction "kit". I can outline how it can be built using Spring.
Hi guys, I am struggling with this issue.
It is clearly happening when the WS client is killed without sending a response.
Jetty is hung on this method -> WebSocketRemoteEndpoint.java:387.
This has the bad consequence of starving other clientOutbound threads that want to send heartbeats to this particular user.
The only recovery is then to restart Jetty.
Is anyone working on this, or does anyone have a workaround?
@rafik991 This looks like the issue I had. You can try getting the Jetty source and modifying the method that jonhill1977 described in his comment on Jul 26, 2016.
It eventually got rid of the hung threads for us.
Hi guys, I have the same issue: threads that send outgoing messages sometimes hang due to broken websockets. Unfortunately I cannot use a patched Jetty library, so I am using a separate thread pool for sending messages in order to reduce the impact of hung threads (a sketch follows below).
It would be great if the library itself had this fix!
Simply setting a non-infinite getIdleTimeout() in the source code would already help. I hope the library maintainers will abandon the infinite wait or make it configurable!
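A minimal sketch of that separate-sending-pool idea (the class name, pool size, and error handling are illustrative assumptions, not part of any library):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.eclipse.jetty.websocket.api.Session;

// Websocket sends run on a dedicated, bounded pool: a write that blocks forever
// ties up at most one of these threads instead of a thread from the main pool.
public class IsolatedSender
{
    private final ExecutorService sendPool = Executors.newFixedThreadPool(4, r -> {
        Thread t = new Thread(r, "ws-send");
        t.setDaemon(true); // hung senders won't keep the JVM alive on shutdown
        return t;
    });

    public void send(Session session, String message)
    {
        sendPool.submit(() -> {
            try {
                session.getRemote().sendString(message); // may block forever on a broken peer
            } catch (Exception e) {
                e.printStackTrace();
            }
        });
    }
}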
@askaribragimov which idle timeout are you referring to? (there are many in Jetty, each referring to a different layer in the stack)
@joakime We have the same issue.
We call SimpMessagingTemplate.convertAndSend from 3 different threads:
thread 1:
thread 2:
thread 3:
Sending of outgoing messages hangs after 20-30 hours.
We use Jetty as embedded servlet container for Spring Boot application. Spring Boot Jetty Starter ver. 1.4.1.RELEASE https://mvnrepository.com/artifact/org.springframework.boot/spring-boot-starter-jetty/1.4.1.RELEASE
See thread dump below:
"clientOutboundChannel-1" #136 prio=5 os_prio=0 tid=0x0000000042a38000 nid=0x856c waiting on condition [0x00000000537fe000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000003ca68ae30> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(Unknown Source)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(Unknown Source)
at org.eclipse.jetty.util.SharedBlockingCallback$Blocker.block(SharedBlockingCallback.java:210)
at org.eclipse.jetty.websocket.common.BlockingWriteCallback$WriteBlocker.block(BlockingWriteCallback.java:82)
at org.eclipse.jetty.websocket.common.WebSocketRemoteEndpoint.blockingWrite(WebSocketRemoteEndpoint.java:107)
at org.eclipse.jetty.websocket.common.WebSocketRemoteEndpoint.sendString(WebSocketRemoteEndpoint.java:393)
at org.springframework.web.socket.adapter.jetty.JettyWebSocketSession.sendTextMessage(JettyWebSocketSession.java:190)
at org.springframework.web.socket.adapter.AbstractWebSocketSession.sendMessage(AbstractWebSocketSession.java:104)
at org.springframework.web.socket.handler.ConcurrentWebSocketSessionDecorator.tryFlushMessageBuffer(ConcurrentWebSocketSessionDecorator.java:132)
at org.springframework.web.socket.handler.ConcurrentWebSocketSessionDecorator.sendMessage(ConcurrentWebSocketSessionDecorator.java:104)
at org.springframework.web.socket.messaging.StompSubProtocolHandler.sendToClient(StompSubProtocolHandler.java:437)
at org.springframework.web.socket.messaging.StompSubProtocolHandler.handleMessageToClient(StompSubProtocolHandler.java:422)
at org.springframework.web.socket.messaging.SubProtocolWebSocketHandler.handleMessage(SubProtocolWebSocketHandler.java:338)
at org.springframework.messaging.support.ExecutorSubscribableChannel$SendTask.run(ExecutorSubscribableChannel.java:135)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Locked ownable synchronizers:
- <0x00000003c900ae40> (a java.util.concurrent.ThreadPoolExecutor$Worker)
- <0x00000003ca6933f8> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
@ruslan-reshetnikov that's useful, thank you. What version of Jetty are you using?
@joakime 9.3.11.v20160721
I've got the same issue with 9.4.3.v20170317. Always at the same place, and always when closing a connection which might already be closed.
public static abstract class ChatWebSocketAdapterBase extends WebSocketAdapter implements ChatMessageTransport {
    public void close() {
        try {
            if (isNotConnected())
                return;
            getRemote().flush();
            getSession().close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Is this a usage bug? close() is called when the user ends his chat session. The websocket might already have been closed by the client browser. And I've got an nginx in front of Jetty.
Http Server Thread Pool-11227
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
org.eclipse.jetty.util.SharedBlockingCallback$Blocker.block(SharedBlockingCallback.java:210)
org.eclipse.jetty.websocket.common.BlockingWriteCallback$WriteBlocker.block(BlockingWriteCallback.java:90)
org.eclipse.jetty.websocket.common.WebSocketRemoteEndpoint.flush(WebSocketRemoteEndpoint.java:458)
de.typografica.shop.web.frontend.chat.FrontendChatServlet$ChatWebSocketAdapterBase.close(FrontendChatServlet.java:88)
My users have just started to use the chat feature more. And now I have already 4 threads hanging on Blocker.block(). So I can reproduce this problem in production. As a short term workaround I will increase my overall threadpool size, but a real fix would be better, as the threadpool will likely get full over time. Or do you suggest another workaround for this problem?
I did a little bit more testing. If I remove the getRemote().flush() call, the problem disappears. So I think the problem happens when you flush() (or maybe even send???()) to a socket which was closed on the other side (a webpage in Chrome, in my case).
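In other words, the workaround amounts to the close() shown above with the flush removed; a sketch, not a verified fix:

public void close() {
    try {
        if (isNotConnected())
            return;
        // No getRemote().flush() here; flushing towards a dead peer is what hung.
        getSession().close();
    } catch (Exception e) {
        e.printStackTrace();
    }
}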
We are also facing this issue on a regular basis in production. The problem is exactly as described by @ruslan-reshetnikov.
@rototor, @bdarwin do you have idle timeouts defined?
The Blocker.block() will break out when the Endpoint has an idle timeout event.
We have many clients that do not experience these issues, and attempts at replicating have yet to succeed.
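For reference, a finite per-connection idle timeout can be set on the websocket Session itself (distinct from the HttpConfiguration/connector timeouts); a minimal sketch, with an illustrative 30-second value:

import org.eclipse.jetty.websocket.api.Session;
import org.eclipse.jetty.websocket.api.annotations.OnWebSocketConnect;
import org.eclipse.jetty.websocket.api.annotations.WebSocket;

// Sketch: give the endpoint a finite idle timeout so an idle-timeout event can
// eventually fail the pending write and release the thread stuck in block().
@WebSocket
public class TimeoutAwareSocket
{
    @OnWebSocketConnect
    public void onConnect(Session session)
    {
        session.setIdleTimeout(30_000L); // milliseconds; value is illustrative
    }
}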
Yes @joakime, we have an idle timeout on HttpConfiguration of about 30 minutes or so. But in our last observation the thread was blocked for more than 8 hours. Yes, it is not easy to reproduce. We have 5 servers and it happens on only 1 server. We followed @jonhill1977's suggestion and deployed. Hopefully things will get better.
It is very easy to reproduce. I do not have specific code at hand, but here is the main principle. You will need Spring, e.g. 4.3.3. Something can maybe be harvested from https://github.com/nickebbutt/stomp-websockets-java-client
In the Java client, find the method HandleFrame ( https://github.com/nickebbutt/stomp-websockets-java-client/blob/master/stomp-java-client/src/main/java/com/od/helloclient/HelloClient.java ), add a long Thread.sleep to it, and you will get a hung Jetty thread.
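As a rough illustration of that principle (using Spring's own STOMP client API rather than the linked project, and an arbitrary sleep duration, both of which are assumptions), a client-side frame handler that stalls while consuming might look like this:

import org.springframework.messaging.simp.stomp.StompHeaders;
import org.springframework.messaging.simp.stomp.StompSessionHandlerAdapter;

// The client subscribes but processes each frame very slowly, so the
// server-side blocking send to this client eventually stalls.
public class SlowConsumerSessionHandler extends StompSessionHandlerAdapter
{
    @Override
    public void handleFrame(StompHeaders headers, Object payload)
    {
        try {
            // Simulate a client that stalls while consuming a message.
            Thread.sleep(60_000L);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}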
@joakime I have an idle timeout of one hour on the ServerConnector, but I don't think that's used here. BlockingWriteCallback does not override getIdleTimeout() from the base class SharedBlockingCallback, so it's -1. Or should the blocker get some async notification (i.e. Thread.interrupt())?
I tried to build a small reproducing demo, but I could only get some strange exceptions when having many clients, and no hanging connections. E.g. in WebSocketRemoteEndpoint.lockMsg this exception
if ((state & BLOCK_MASK) != 0)
throw new IllegalStateException(String.format("Blocking message pending %x for %s", state, type));
was thrown many times. No idea if this is a bug or expected. I can provide you the demo if you want to investigate that.
@rototor a demo that reproduces this specific issue (WebSocket hangs in blockingWrite) would be ideal, yes.
Run into the same problem.
Just had a quick look at it, but I think I have already found one race condition: https://github.com/eclipse/jetty.project/blob/jetty-9.3.x/jetty-websocket/websocket-common/src/main/java/org/eclipse/jetty/websocket/common/io/FrameFlusher.java
If thread T1 executes lines 337 to 347, then switches to thread T2, which is before line 314 and runs through line 332, and then T1 continues through line 381, note that flusher.iterate() is called in state FAILED! So nothing will be processed anymore, and the last FrameEntry in the queue will stay there forever.
One callback will never be completed, neither with success nor with failure.
@joakime Any thoughts on @spjoe's comment? We're running into the same problem described by @ruslan-reshetnikov with 9.4.7
@dreis2211 the IteratingCallback mechanism that FrameFlusher uses is used in many places in Jetty; the problem described by @spjoe would manifest in regular HTTP with Servlet 2.5 API usage as well (especially so with HTTP/2 usage).
This problem still does not have a reproduction test case. Many different scenarios have been attempted to reproduce this: large- through small-scale configurations, on various platforms, using many different varieties of behavior, and still no reproduction case. Hundreds of billions of websocket messages have been sent/received successfully with our own various test cases and those of other projects / large users of the websocket library in Jetty.
There's nothing to fix yet, as the cause cannot even be explained.
@dreis2211 my recommendation is to switch to Netty; they have a well-defined (maybe Jetty does too, but I was not able to find good documentation about it) and simpler threading model, which IMHO leads to fewer bugs. This is what I have done and I do not regret it :-). Sorry for advertising a rival product.
@joakime are you sure HTTP/2 uses code from websocket-common? I showed one race condition, which does not mean that there are many more, but I do not understand how you can conclude that the cause cannot even be explained. Also, in general, this is quite a bold statement :-)
My suggestion for reproduction is to write a test with the threading behavior I described; look into DeterministicScheduler from jMock (you may have to refactor a bit to really execute just the parts you want). With this tool I was able to find race conditions in other libraries as well. I most likely will not do it myself, because I currently do not use Jetty.
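For what it's worth, a sketch of how DeterministicScheduler can drive task ordering from the test; the runnables here are only placeholders for the FrameFlusher steps described above, not real Jetty code:

import org.jmock.lib.concurrent.DeterministicScheduler;

// DeterministicScheduler runs submitted tasks on the test thread, one at a time,
// so the order of the "T1" and "T2" steps is fixed by the test rather than by
// the OS scheduler.
public class DeterministicInterleavingSketch
{
    public static void main(String[] args)
    {
        DeterministicScheduler scheduler = new DeterministicScheduler();

        scheduler.execute(() -> System.out.println("T1: enqueue frame, start flush"));
        scheduler.execute(() -> System.out.println("T2: fail the flusher (state -> FAILED)"));
        scheduler.execute(() -> System.out.println("T1: call flusher.iterate() after failure"));

        // Run pending commands one by one, in the exact order chosen by the test.
        scheduler.runNextPendingCommand();
        scheduler.runNextPendingCommand();
        scheduler.runUntilIdle();
    }
}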
IteratingCallback is a core class, used for handling callback logic in many places in Jetty. Its use in WebSocket's FrameFlusher is only one such place. There are many other uses of IteratingCallback within Jetty, such as the standard Servlet spec 2.5 blocking behaviors and HTTP/2. The reported suspects for this issue would also manifest in those other places, which they don't.
@spjoe no worries about advertising other open source projects. :-) The netty project and eclipse jetty project are quite friendly to each other. Heck, netty even uses our alpn-boot layers for HTTP/2 support.
Links to suggestions from @spjoe
@dreis2211 since you stated you are using 9.4.7 ... Do you have an up to date threaddump showing this issue on that version of Jetty?
@joakime the race reported by @spjoe is legit and should be fixed.
It has nothing to do with IteratingCallback.
HTTP2Flusher is implemented differently, as it atomically checks whether it's closed/failed before accepting an entry in the queue.
I think we should fix it regardless of the WebSocket work in Jetty 10. I can give it a shot, if you want me to.
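A rough sketch of that pattern (not Jetty's actual FrameFlusher or HTTP2Flusher code; all names are illustrative): the failed flag is checked and the entry enqueued under the same lock, so nothing can be accepted after failure and then sit in the queue with its callback never completed:

import java.util.ArrayDeque;
import java.util.Queue;

class SketchFlusher
{
    private final Object lock = new Object();
    private final Queue<Runnable> queue = new ArrayDeque<>();
    private boolean failed;

    boolean enqueue(Runnable entry)
    {
        synchronized (lock) {
            if (failed)
                return false; // caller must fail its callback immediately
            queue.add(entry);
            return true;
        }
    }

    void fail()
    {
        Queue<Runnable> pending;
        synchronized (lock) {
            failed = true;
            pending = new ArrayDeque<>(queue);
            queue.clear();
        }
        // Complete (fail) every pending entry outside the lock so none is lost.
        pending.forEach(Runnable::run);
    }
}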
@joakime I'm desperately trying to reproduce it again. Unfortunately, I didn't keep the dump and found this issue only afterwards. At least for us it was a "one-in-a-million" (probably) bug that had never happened (or been reported) before, so it's not that critical for us. It's just unfortunate that only restarting helped.
@sbordet obviously, go for it. heh.
The FrameFlusher in Jetty 10 is the same as in Jetty 9.3.x+, so it would need love there too.
@joakime Just had it again.
I can provide the full threaddump if you want, but basically that's the part which is relevant to this issue
"clientOutboundChannel-221" #26291 prio=5 os_prio=0 tid=0x00007f0de0070800 nid=0x4be0 waiting on condition [0x00007f0d739c7000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000cac6b8a0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at org.eclipse.jetty.util.SharedBlockingCallback$Blocker.block(SharedBlockingCallback.java:210)
at org.eclipse.jetty.websocket.common.BlockingWriteCallback$WriteBlocker.block(BlockingWriteCallback.java:90)
at org.eclipse.jetty.websocket.common.WebSocketRemoteEndpoint.blockingWrite(WebSocketRemoteEndpoint.java:107)
at org.eclipse.jetty.websocket.common.WebSocketRemoteEndpoint.sendString(WebSocketRemoteEndpoint.java:393)
at org.springframework.web.socket.adapter.jetty.JettyWebSocketSession.sendTextMessage(JettyWebSocketSession.java:273)
at org.springframework.web.socket.adapter.AbstractWebSocketSession.sendMessage(AbstractWebSocketSession.java:101)
at org.springframework.web.socket.handler.ConcurrentWebSocketSessionDecorator.tryFlushMessageBuffer(ConcurrentWebSocketSessionDecorator.java:132)
at org.springframework.web.socket.handler.ConcurrentWebSocketSessionDecorator.sendMessage(ConcurrentWebSocketSessionDecorator.java:104)
at org.springframework.web.socket.messaging.StompSubProtocolHandler.sendToClient(StompSubProtocolHandler.java:470)
at org.springframework.web.socket.messaging.StompSubProtocolHandler.handleMessageToClient(StompSubProtocolHandler.java:457)
at org.springframework.web.socket.messaging.SubProtocolWebSocketHandler.handleMessage(SubProtocolWebSocketHandler.java:338)
at org.springframework.messaging.support.ExecutorSubscribableChannel$SendTask.run(ExecutorSubscribableChannel.java:135)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
It's essentially the one that @ruslan-reshetnikov described already, but with 9.4.7.
Cheers, Christoph
Spring + Stomp + Custom spring executors + sessions that have been decorated via the DecoratedObjectFactory. Pretty complex setup you got there.
I suspect that Christoph has a pretty much default Spring Stomp stack at work.
I use the same setup; I saw the same thing back when I was still able to reproduce it.
@joakime & @onkami Indeed, quite a common setup with Spring-Boot.
Spring-Boot, good to know. Are there any open source projects with this kind of setup already? It would be good to have a real-world-ish test case for it (to make sure we didn't break any API that Spring expects).
https://spring.io/guides/gs/messaging-stomp-websocket/
They use a Javascript client but there is also a demo Java client (also jetty-based) in Spring itself.
@joakime Regarding the API expectations I found an issue in the Spring-Framework core https://jira.spring.io/browse/SPR-14757 that references this issue and has one particular comment.
"Looking at ConcurrentWebSocketSessionDecorator we probably should add an isOpen() check in the loop that sends messages, but I'd expect the WebSocket runtime, Jetty it seems, to throw an exception if we try to send after the session is closed."
Any thoughts on this?
Thanks by the way for helping!
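A minimal sketch of that isOpen() guard on the Spring side (the class name is illustrative); note the check only narrows the window, since the peer can still disappear between the check and the send:

import org.springframework.web.socket.TextMessage;
import org.springframework.web.socket.WebSocketSession;

// Skip the send when the session is no longer open. This is a mitigation for
// writing to a dead peer, not a fix for the blocking write itself.
public class GuardedSender
{
    public void send(WebSocketSession session, String payload) throws Exception
    {
        if (!session.isOpen())
            return;
        session.sendMessage(new TextMessage(payload));
    }
}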
Note that Spring can use either Tomcat or Jetty; you have to change some things in your Maven/Gradle build file in order to switch to Jetty, but it is straightforward enough and easily googleable: http://therealdanvega.com/blog/2017/05/10/spring-boot-moving-tomcat-jetty
@dreis2211 you found my ticket yay
@joakime, the CometD build against commit 4389da9 above passes cleanly.
The issue-272-ws-blockingwrite branch has been merged back to jetty-9.4.x.
Hey @joakime.
Is there an ETA for when 9.4.8 could be released as a non-snapshot version?
@dreis2211 9.4.8 is close, but we need to figure out a performance/livelock issue first. We don't have a precise date yet though.
@sbordet I appreciate the feedback nonetheless. Good luck with solving the livelock issue.
Migrated from Bugzilla #484698: status NEW, severity normal, in component websocket for 9.3.x. Reported in version 9.3.5 on platform PC. Assigned to: Joakim Erdfelt.
On 2015-12-18 10:48:14 -0500, Ben Tatham wrote: