rstoyanchev / spring-websocket-portfolio

740 stars 443 forks source link

Maybe a problem with simplebroker #31

Closed sdnetwork closed 10 years ago

sdnetwork commented 10 years ago

hi,

in some case the simplebroker don't respond to a connect stomp command but the websocket stay open

i have noticed this problem when i work on a mobile network (don't stable). i have logged the case :

16:58:37 [http-bio-80-exec-18] StompDecoder - Decoded [Payload byte[0]][Headers={stompCommand=CONNECT, stompCredentials=[PROTECTED], nativeHeaders={heart-beat=[10000,10000], passcode=[PROTECTED], login=[tpy], accept-version=[1.1,1.0]}, simpMessageType=CONNECT, id=ba0697fe-1682-80a8-b68f-d254e6a944f5, timestamp=1395158317547}]
16:58:37 [clientOutboundChannel-1] StompEncoder - Encoded STOMP command=CONNECTED headers={heart-beat=[0,0], version=[1.1]}
16:58:38 [http-bio-80-exec-18] StompDecoder - Decoded [Payload byte[0]][Headers={stompCommand=SUBSCRIBE, nativeHeaders={id=[sub-0], destination=[/stats/0a6ijX2dX5VdRQAQFskZw!]}, simpMessageType=SUBSCRIBE, simpSubscriptionId=sub-0, simpDestination=/stats/0a6ijX2dX5VdRQAQFskZw!, id=a5f9f59f-417b-4790-6346-5b251a724732, timestamp=1395158318254}]
16:58:38 [clientInboundChannel-1] SimpAnnotationMethodMessageHandler - Handling message, lookupDestination=/stats/0a6ijX2dX5VdRQAQFskZw!
16:58:38 [clientInboundChannel-1] SimpAnnotationMethodMessageHandler - No matching method found
16:58:38 [clientInboundChannel-1] DefaultSubscriptionRegistry - Adding subscription id=sub-0, destination=/stats/0a6ijX2dX5VdRQAQFskZw!
16:58:38 [http-bio-80-exec-18] StompDecoder - Decoded [Payload byte[0]][Headers={stompCommand=UNSUBSCRIBE, nativeHeaders={id=[sub-0]}, simpMessageType=UNSUBSCRIBE, simpSubscriptionId=sub-0, id=6c5b9c24-9042-e33a-2dea-2f494da8e156, timestamp=1395158318928}]
16:58:38 [clientInboundChannel-1] DefaultSubscriptionRegistry - Unubscribe request: [Payload byte[0]][Headers={stompCommand=UNSUBSCRIBE, nativeHeaders={id=[sub-0]}, simpMessageType=UNSUBSCRIBE, simpSubscriptionId=sub-0, simpSessionId=0k__f7fv, id=a0b03cb6-0766-4173-7c86-e1281aec364d, timestamp=1395158318928}]
16:58:47 [http-bio-80-exec-18] StompDecoder - Decoded [Payload byte[0]][Headers={stompCommand=SUBSCRIBE, nativeHeaders={id=[sub-1], destination=[/stats/0a6ijX2dX5VdRQAQFskZw!]}, simpMessageType=SUBSCRIBE, simpSubscriptionId=sub-1, simpDestination=/stats/0a6ijX2dX5VdRQAQFskZw!, id=8a4b75b2-be41-e4e9-73ef-1d20833a48e2, timestamp=1395158327829}]
16:58:47 [clientInboundChannel-1] SimpAnnotationMethodMessageHandler - Handling message, lookupDestination=/stats/0a6ijX2dX5VdRQAQFskZw!
16:58:47 [clientInboundChannel-1] SimpAnnotationMethodMessageHandler - No matching method found
16:58:47 [clientInboundChannel-1] DefaultSubscriptionRegistry - Adding subscription id=sub-1, destination=/stats/0a6ijX2dX5VdRQAQFskZw!
16:58:53 [brokerChannel-1] SimpleBrokerMessageHandler - Sending message with destination=/stats/0a6ijX2dX5VdRQAQFskZw! to 1 subscriber(s)

16:58:53 [clientOutboundChannel-1] StompEncoder - Encoded STOMP command=MESSAGE headers={content-type=[application/json;charset=UTF-8], subscription=[sub-1], message-id=[0k__f7fv-0], destination=[/stats/0a6ijX2dX5VdRQAQFskZw!]}
16:59:12 [http-bio-80-exec-14] StompDecoder - Decoded [Payload byte[0]][Headers={stompCommand=CONNECT, stompCredentials=[PROTECTED], nativeHeaders={heart-beat=[10000,10000], passcode=[PROTECTED], login=[tpy], accept-version=[1.1,1.0]}, simpMessageType=CONNECT, id=89ef939d-18c7-c11f-9606-76720488fe35, timestamp=1395158352177}]
16:59:12 [clientOutboundChannel-1] StompEncoder - Encoded STOMP command=CONNECTED headers={user-name=[arnaud], heart-beat=[0,0], version=[1.1]}
16:59:13 [http-bio-80-exec-14] StompDecoder - Decoded [Payload byte[0]][Headers={stompCommand=SUBSCRIBE, nativeHeaders={id=[sub-0], destination=[/stats/0a6ijX2dX5VdRQAQFskZw!]}, simpMessageType=SUBSCRIBE, simpSubscriptionId=sub-0, simpDestination=/stats/0a6ijX2dX5VdRQAQFskZw!, id=5b3fb43b-9e9b-6806-3a1f-b293981e0645, timestamp=1395158353973}]
16:59:13 [clientInboundChannel-1] SimpAnnotationMethodMessageHandler - Handling message, lookupDestination=/stats/0a6ijX2dX5VdRQAQFskZw!
16:59:13 [clientInboundChannel-1] SimpAnnotationMethodMessageHandler - No matching method found
16:59:13 [clientInboundChannel-1] DefaultSubscriptionRegistry - Adding subscription id=sub-0, destination=/stats/0a6ijX2dX5VdRQAQFskZw!
16:59:45 [brokerChannel-1] SimpleBrokerMessageHandler - Sending message with destination=/stats/0a6ijX2dX5VdRQAQFskZw! to 2 subscriber(s)
16:59:45 [clientOutboundChannel-1] StompEncoder - Encoded STOMP command=MESSAGE headers={content-type=[application/json;charset=UTF-8], subscription=[sub-1], message-id=[0k__f7fv-8], destination=[/stats/0a6ijX2dX5VdRQAQFskZw!]}
17:00:57 [brokerChannel-1] SimpleBrokerMessageHandler - Sending message with destination=/stats/0a6ijX2dX5VdRQAQFskZw! to 2 subscriber(s)
17:00:58 [brokerChannel-1] SimpleBrokerMessageHandler - Sending message with destination=/stats/0a6ijX2dX5VdRQAQFskZw! to 2 subscriber(s)
17:00:58 [brokerChannel-1] SimpleBrokerMessageHandler - Sending message with destination=/stats/0a6ijX2dX5VdRQAQFskZw! to 2 subscriber(s)
17:00:58 [brokerChannel-1] SimpleBrokerMessageHandler - Sending message with destination=/stats/0a6ijX2dX5VdRQAQFskZw! to 2 subscriber(s)

At this step i lose the connection cause of mobile network so i reconnect (two mobiles)

17:01:04 [http-bio-80-exec-10] StompDecoder - Decoded [Payload byte[0]][Headers={stompCommand=CONNECT, stompCredentials=[PROTECTED], nativeHeaders={heart-beat=[10000,10000], passcode=[PROTECTED], login=[tpy], accept-version=[1.1,1.0]}, simpMessageType=CONNECT, id=5cf09b3e-a869-df34-1659-2e9dbf48c6d3, timestamp=1395158464874}]
17:01:56 [http-bio-80-exec-19] StompDecoder - Decoded [Payload byte[0]][Headers={stompCommand=CONNECT, stompCredentials=[PROTECTED], nativeHeaders={heart-beat=[10000,10000], passcode=[PROTECTED], login=[tpy], accept-version=[1.1,1.0]}, simpMessageType=CONNECT, id=07dbfeb3-02f7-2103-4fa0-b497f376d502, timestamp=1395158516679}]

as you can see no stomp encoder to send the response

the broker continue to send broadcast to zombie subscriber

17:03:55 [brokerChannel-1] SimpleBrokerMessageHandler - Sending message with destination=/stats/0a6ijX2dX5VdRQAQFskZw! to 2 subscriber(s)
17:03:55 [brokerChannel-1] SimpleBrokerMessageHandler - Sending message with destination=/stats/0a6ijX2dX5VdRQAQFskZw! to 2 subscriber(s)

At this step the broker detect that the connection is broken

17:14:58 [clientOutboundChannel-1] WebSocketServerSockJsSession - Terminating connection after failure to send message: java.util.concurrent.ExecutionException: java.net.SocketException: Broken pipe
17:14:58 [clientInboundChannel-1] DefaultSubscriptionRegistry - Unregistering subscriptions for sessionId=0k__f7fv
17:14:58 [clientOutboundChannel-1] StompEncoder - Encoded STOMP command=ERROR headers={message=[Failed to write SockJsFrame content='a["MESSAGE\ncontent-type:application/json;charset=UTF-8\nsubscription:sub-1\nmes...(truncated)'; nested exception is java.io.IOException: java.util.concurrent.ExecutionException: java.net.SocketException: Broken pipe]}
17:14:58 [clientOutboundChannel-1] StompEncoder - Encoded STOMP command=MESSAGE headers={content-type=[application/json;charset=UTF-8], subscription=[sub-0], message-id=[g7wir6l7-29], destination=[/stats/0a6ijX2dX5VdRQAQFskZw!]}
17:15:20 [brokerChannel-1] SimpleBrokerMessageHandler - Sending message with destination=/stats/0a6ijX2dX5VdRQAQFskZw! to 1 subscriber(s)

At this step the broker detect that the second connection is broken

17:21:12 [clientOutboundChannel-1] XhrStreamingTransportHandler$XhrStreamingSockJsSession - Terminating connection after failure to send message: null
17:21:12 [clientInboundChannel-1] DefaultSubscriptionRegistry - Unregistering subscriptions for sessionId=g7wir6l7
17:21:12 [clientOutboundChannel-1] StompEncoder - Encoded STOMP command=ERROR headers={message=[Failed to write SockJsFrame content='a["MESSAGE\ncontent-type:application/json;charset=UTF-8\nsubscription:sub-0\nmes...(truncated)'; nested exception is ClientAbortException:  java.net.SocketException: Connection timed out]}

and now there is no subscriber into the broker

17:21:12 [clientOutboundChannel-1] SubProtocolWebSocketHandler - Session not found for session with id 0k__f7fv
17:21:12 [clientOutboundChannel-1] SubProtocolWebSocketHandler - Session not found for session with id g7wir6l7
17:21:12 [clientOutboundChannel-1] SubProtocolWebSocketHandler - Session not found for session with id 0k__f7fv

and now the encoder respond to the connect demand some minutes after the ask

17:21:12 [clientOutboundChannel-1] StompEncoder - Encoded STOMP command=CONNECTED headers={heart-beat=[0,0], version=[1.1]}

it 's realy strange if you want some precision i will happy to respond

rstoyanchev commented 10 years ago

When the WebSocket transport is used, disconnects are detected immediately. For HTTP based transports however, we use the Serlvet HTTP async support and unfortunately that doesn't provide a callback when a client disconnects (See SERVLET_SPEC-44). Typically in those cases the disconnect is detected the next time we try to write.

Now the SockJS layer sends a heartbeat every 25 seconds by default to keep proxies believing the request is alive. So usually we'll find out a client has gone away within that time period. You could configure the SockJS heartbeat to be less than 25 seconds, which will effectively force the issue even sooner.

sdnetwork commented 10 years ago

in this case it's a websocket transport.

And as we can see in the log the detection of the deconnection comes about 15 minutes after the real deconnection. (it is strange because effectively the server send every 25s a heartbeat)

but the main the problem is that once the problems occurs nobody can connect to the stomp server because this one no longer responds to any request for connection :(

have you an idea of what's happen ?

rstoyanchev commented 10 years ago

in this case it's a websocket transport.

The second one appears to be HTTP streaming:

17:21:12 [clientOutboundChannel-1] XhrStreamingTransportHandler$XhrStreamingSockJsSession - Terminating connection after failure to send message: null

From the log snippets it looks like there are 2 connected users only. Is that really so or is there more parallel activity going on?

Are there other log messages after the decoding of the CONNECT. In particular any messages from the clientInboundChannel?

The log indicates that you may not be configuring the thread pools backing the clientInboundChannel and the clientOutboundChannel (which may cause this). They really should be (for example). On this point we are planning to change the default (see SPR-11556).

Is this something reproducible?

sdnetwork commented 10 years ago

From the log snippets it looks like there are 2 connected users only. Is that really so or is there more parallel activity going on?

effectively there is only two connected users in this exemple and they are realy disconnected at 17:01:04. after i try to reconnect two times with different users, but at this step i can try to connect thousand of times i get always the same result 1) websocket established 2) connect frame send 3) waiting .... (only receive heartbeat from server)

Are there other log messages after the decoding of the CONNECT. In particular any messages from the clientInboundChannel?

no only this two logs 15 and 21 minute after the connect demand

17:14:58 [clientOutboundChannel-1] WebSocketServerSockJsSession - Terminating connection      after failure to send message: java.util.concurrent.ExecutionException: java.net.SocketException:   Broken pipe
17:14:58 [clientInboundChannel-1] DefaultSubscriptionRegistry - Unregistering subscriptions for sessionId=0k__f7fv

17:21:12 [clientOutboundChannel-1] XhrStreamingTransportHandler$XhrStreamingSockJsSession - Terminating connection after failure to send message: null
17:21:12 [clientInboundChannel-1] DefaultSubscriptionRegistry - Unregistering subscriptions for sessionId=g7wir6l7

if you want i can send you the entire log of the problem (only messging log)

Is this something reproducible?

yes i can reproduce with one mobile and i block the entire server. if you want i can send you in private the url to constate the problem it's a webapp. or i can activate some log if you want it looks like that something is blocked in the server until these zombie connection is detected as dead by the server

rstoyanchev commented 10 years ago

Sure, send me the log. You can get my email via git log.

If it is something reproducible, is there a set of steps I can take with the portfolio sample then to get it to occur?

sdnetwork commented 10 years ago

I have search to reproduce the problem anytime and i think i understand in witch case the problem occurs.

procedure to block the server

connect with my mobile on stomp endpoint and subscribe to a channel switch my mobile from wifi to unstable mobile network the mobile loose the connection but the server don't see it.

in this case the server is blocked (as described before) until it see that the connection is loose (i send you the entire log of this procedure by email)

but if do the same procedure without subscribe to a channel the server don't block !

i have noticed that if the server have to write a message to a zombie channel it seems blocked.

If it is something reproducible, is there a set of steps I can take with the portfolio sample then to get it to occur?

i don't try with the porfolio application, but if you want i can reproduce the problem on the server and you can connect on the webapp to observe the websocket.

rstoyanchev commented 10 years ago

It looks like what happens is the clientOutboundChannel, which uses a thread pool most likely gets stuck trying to write to the socket. You can confirm this by doing a thread dump while it's stuck (jps -v to get the process id and then jstack -l <pid>).

You can see in the log how it's writing messages to two sessions at a time, and then suddenly, it has only written to the first session (or rather it gets stuck writing to the first session), and after only broker log messages but no actual writing:

17:00:57 [clientOutboundChannel-1] StompEncoder - Encoded STOMP command=MESSAGE headers={content-type=[application/json;charset=UTF-8], subscription=[sub-1], message-id=[0k__f7fv-24], destination=[/stats/0a6ijX2dX5VdRQAQFskZw!]}
17:00:57 [clientOutboundChannel-1] StompEncoder - Encoded STOMP command=MESSAGE headers={content-type=[application/json;charset=UTF-8], subscription=[sub-0], message-id=[g7wir6l7-25], destination=[/stats/0a6ijX2dX5VdRQAQFskZw!]}
17:00:57 [brokerChannel-1] SimpleBrokerMessageHandler - Sending message with destination=/stats/0a6ijX2dX5VdRQAQFskZw! to 2 subscriber(s)
17:00:57 [clientOutboundChannel-1] StompEncoder - Encoded STOMP command=MESSAGE headers={content-type=[application/json;charset=UTF-8], subscription=[sub-1], message-id=[0k__f7fv-26], destination=[/stats/0a6ijX2dX5VdRQAQFskZw!]}
17:00:57 [clientOutboundChannel-1] StompEncoder - Encoded STOMP command=MESSAGE headers={content-type=[application/json;charset=UTF-8], subscription=[sub-0], message-id=[g7wir6l7-27], destination=[/stats/0a6ijX2dX5VdRQAQFskZw!]}
17:00:57 [brokerChannel-1] SimpleBrokerMessageHandler - Sending message with destination=/stats/0a6ijX2dX5VdRQAQFskZw! to 2 subscriber(s)
17:00:57 [clientOutboundChannel-1] StompEncoder - Encoded STOMP command=MESSAGE headers={content-type=[application/json;charset=UTF-8], subscription=[sub-1], message-id=[0k__f7fv-28], destination=[/stats/0a6ijX2dX5VdRQAQFskZw!]}
17:00:57 [brokerChannel-1] SimpleBrokerMessageHandler - Sending message with destination=/stats/0a6ijX2dX5VdRQAQFskZw! to 2 subscriber(s)
17:00:58 [brokerChannel-1] SimpleBrokerMessageHandler - Sending message with destination=/stats/0a6ijX2dX5VdRQAQFskZw! to 2 subscriber(s)
17:00:58 [brokerChannel-1] SimpleBrokerMessageHandler - Sending message with destination=/stats/0a6ijX2dX5VdRQAQFskZw! to 2 subscriber(s)
17:00:58 [brokerChannel-1] SimpleBrokerMessageHandler - Sending message with destination=/stats/0a6ijX2dX5VdRQAQFskZw! to 2 subscriber(s)

Then when the write finally fails and the session is removed, you see a whole bunch of messages indicating write activity. Those are all the messages that were queued up in the thread pool while the bad session was stuck writing.

So the key question is how have you configured your clientInbound- and clientOutboundChannel's? The outbound channel should allow more threads. It seems to allow only 1 somehow. It allowed more than 1 thread, this wouldn't occur with just two sessions.

That said if this happened with a lot of users, you'll eventually end up with a lot of threads stuck. That's why I suggest getting the thread dump and then going to the WebSocket server mailing list or issuer tracker to understand is this expected behavior or a bug ?

Please do provide references here as you find out more for others to see. Thanks!

sdnetwork commented 10 years ago

So the key question is how have you configured your clientInbound- and clientOutboundChannel's? The outbound channel should allow more threads. It seems to allow only 1 somehow. It allowed more than 1 thread, this wouldn't occur with just two sessions.

not for the moment but i'm going to try but as i don't understand the problem i think with more users i encountered the same problem later.

maybe is an important information i can't reproduce the problem on a windows server with the same tomcat7 server (7.0.52)

rstoyanchev commented 10 years ago

not for the moment but i'm going to try but as i don't understand the problem i think with more users i encountered the same problem later.

How about just show your configuration, i.e. the class with @EnableWebSocketMessageBroker.

rstoyanchev commented 10 years ago

maybe is an important information i can't reproduce the problem on a windows server with the same tomcat7 server (7.0.52)

I have no doubt this is a question worth raising on the Tomcat user list. The thread dump shows clearly Tomcat is writing to a socket and from the log it looks like the stuck connection takes close to 15 minutes to raise a SocketException. That seems unusual to me. I will ask the question there.

rstoyanchev commented 10 years ago

Cross-reference to the question I posted on Tomcat's user-list.

sdnetwork commented 10 years ago

Thanks for the message to the tomcat list !

i have configured the clientoutboundchannel like that :

@Override
public void configureClientOutboundChannel(ChannelRegistration channel)

{ // TODO Auto-generated method stub channel.taskExecutor().corePoolSize(4).maxPoolSize(4).queueCapacity(1000); }

but unfortunately the problems is the same :( (i have attached a dump, it seems it blocks at the same point), but i expected to see 4 thread in this state no ?

yet i think i have understand a thing, there is a link between the quantity of data write to the subscriber.

for exemple if no message is send to the subscribers i can't block the server (it coulb be explain why on my developpement server i can't reproduce), although there are a lot of zombie connection.

but if big message is send regulary on subscribers (true and zombie), here it blocks very quicly ! If there are not zombie connection no problem (with hundred connecion).

it looks like a buffer that is full and blocks the write by clientoutboundchannel.

an other thing in my web.config.xml i have increase this parameter (since the begining):

org.apache.tomcat.websocket.textBufferSize 256000

maybe all is linked. but i'm quite sure there is a link between the amount of data send to the zombie subscribers and the problems.

2014-03-20 3:23 GMT+01:00 Rossen Stoyanchev notifications@github.com:

Cross-reference http://markmail.org/message/fjqpcq2clcq6kg4p to the question I posted on Tomcat's user-list.

Reply to this email directly or view it on GitHubhttps://github.com/rstoyanchev/spring-websocket-portfolio/issues/31#issuecomment-38129474 .

rstoyanchev commented 10 years ago

I've created https://jira.spring.io/browse/SPR-11586 by the way.