rabbitmq / rabbitmq-stomp

RabbitMQ STOMP plugin
https://www.rabbitmq.com/stomp.html
Other
49 stars 28 forks source link

a lot of cryptic error reports on busy rabbitmq serving stomp connections #56

Closed databus23 closed 8 years ago

databus23 commented 8 years ago

I'm trying to understand error messages I'm seeing in the logs of a single node rabbitmq server that is solely used by stomp clients. The rabbitmq server runs on rhel6 with version 3.6.0 and erlang 17.5. There are 16k stomp clients (mcollective/stomp gem 1.2.16) connected to the server. I'm seeing a high rate of error reports of the following type:

=ERROR REPORT==== 3-Feb-2016::10:31:00 ===
** Generic server <0.582.70> terminating
** Last message in was {'EXIT',<0.574.70>,
                        {function_clause,
                         [{rabbit_stomp_reader,log_reason,
                           [{stomp_unexpected_cast,client_timeout},
                            {reader_state,#Port<0.430455>,
[...]
** Reason for termination ==
** {function_clause,
       [{rabbit_stomp_reader,log_reason,
            [{stomp_unexpected_cast,client_timeout},
[...]

Full error: https://gist.github.com/databus23/20459d571ad5922ab84a

I'm still trying to find a way to reproduce the error reliably so that I can get a tcpdump of the connection recorded but no luck so far.

I'm hoping somebody could give me a hint what this error means and what might be the problem.

In the meantime I'm trying to gather further information.

michaelklishin commented 8 years ago

This looks like a missed client heartbeat that isn't logged properly. Do your STOMP clients use heartbeats?

databus23 commented 8 years ago

yes they do. The heartbeat interval is set to 60 seconds in the ruby stomp gem.

databus23 commented 8 years ago

@michaelklishin I think you assumption might be right. I reverted to 3.5.7 and now I see single lines like the following in t lohegs instead of those traces

STOMP detected missed client heartbeat(s) on connection 172.20.224.162:47636 -> 147.204.8.73:61613, closing it
michaelklishin commented 8 years ago

This is a duplicate of #63, which is fixed in https://github.com/rabbitmq/rabbitmq-stomp/pull/65 and has made it into 3.6.1 (which will be announced tomorrow).