jhalterman / lyra

High availability RabbitMQ client
Apache License 2.0
263 stars, 74 forks

Recovery fails if the machine where rabbitmq is running gets rebooted #51

Closed ashwin-os closed 9 years ago

ashwin-os commented 9 years ago

Hi,

I have the following policy configured when creating a connection:

Config config = new Config().withRecoveryPolicy(RecoveryPolicies.recoverAlways().withInterval(Duration.seconds(5)));
connection = Connections.create(options, config);

Recovery works when rabbitmq is stopped and restarted. However, when the machine where rabbitmq is running is shut down and restarted, recovery fails and the client never reconnects.

2015-07-14 15:22:10.417 [AMQP Connection 10.2.15.155:5672] [] [] [] [] [] [ShutdownListener:25] [ERROR] connection error; reason: java.net.SocketException: Connection reset
com.rabbitmq.client.ShutdownSignalException: connection error; reason: java.net.SocketException: Connection reset
    at com.rabbitmq.client.impl.AMQConnection.startShutdown(AMQConnection.java:678)
    at com.rabbitmq.client.impl.AMQConnection.shutdown(AMQConnection.java:668)
    at com.rabbitmq.client.impl.AMQConnection$MainLoop.run(AMQConnection.java:550)
Caused by: java.net.SocketException: Connection reset
    at java.net.SocketInputStream.read(SocketInputStream.java:168)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
    at java.io.DataInputStream.readUnsignedByte(DataInputStream.java:271)
    at com.rabbitmq.client.impl.Frame.readFrom(Frame.java:95)
    at com.rabbitmq.client.impl.SocketFrameHandler.readFrame(SocketFrameHandler.java:131)
    at com.rabbitmq.client.impl.AMQConnection$MainLoop.run(AMQConnection.java:515)

2015-07-14 15:20:46.853 [AMQP Connection 10.2.15.155:5672] [] [] [] [] [] [ShutdownListener:25] [ERROR] connection error; reason: com.rabbitmq.client.MissedHeartbeatException: Heartbeat missing with heartbeat = 580 seconds
com.rabbitmq.client.ShutdownSignalException: connection error; reason: com.rabbitmq.client.MissedHeartbeatException: Heartbeat missing with heartbeat = 580 seconds
    at com.rabbitmq.client.impl.AMQConnection.startShutdown(AMQConnection.java:678)
    at com.rabbitmq.client.impl.AMQConnection.shutdown(AMQConnection.java:668)
    at com.rabbitmq.client.impl.AMQConnection$MainLoop.run(AMQConnection.java:550)
Caused by: com.rabbitmq.client.MissedHeartbeatException: Heartbeat missing with heartbeat = 580 seconds
    at com.rabbitmq.client.impl.AMQConnection.handleSocketTimeout(AMQConnection.java:578)
    at com.rabbitmq.client.impl.AMQConnection.access$500(AMQConnection.java:59)
    at com.rabbitmq.client.impl.AMQConnection$MainLoop.run(AMQConnection.java:541)

Please help

jhalterman commented 9 years ago

So just to be clear, are you rebooting the machine or just restarting rmq (rabbitmqctl stop or stop_app)? If rebooting, how is rmq started again when the machine comes back up?

ashwin-os commented 9 years ago

Yes, I am rebooting the machine and then restarting rabbitmq manually from the command line. I noticed that recovery fails because of a "No Route to Host" exception, and adding NoRouteToHostException.class to RECURRING_EXCEPTIONS in Config.java solves the problem.

jhalterman commented 9 years ago

I can't explain why your setup is throwing NoRouteToHostException after an rmq restart. Perhaps the folks on the rabbitmq mailing list could offer some insight. Generally, though, NoRouteToHostException is not a recoverable error, which is why Lyra fails. If it is recoverable in your case, as it seems to be, you can simply add it as a recoverable exception via:

config.getRecoverableExceptions().add(NoRouteToHostException.class);

No code changes required :)
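
To illustrate the idea behind a configurable set of recoverable exceptions, here is a minimal, hypothetical sketch using only the JDK. It is not Lyra's actual implementation: the class name RecoverableExceptionSketch and the method isRecoverable are made up for this example. It shows how a failure whose cause chain contains NoRouteToHostException goes from "not recoverable" to "recoverable" once that class is added to the set.

```java
import java.net.ConnectException;
import java.net.NoRouteToHostException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only; this is NOT how Lyra is implemented internally.
public class RecoverableExceptionSketch {

    /**
     * Returns true if the given throwable, or any throwable in its cause chain,
     * is an instance of one of the classes in the recoverable set.
     */
    public static boolean isRecoverable(Set<Class<? extends Exception>> recoverable, Throwable t) {
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            for (Class<? extends Exception> type : recoverable) {
                if (type.isInstance(cur)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Assumed starting set for the example; a real client's defaults may differ.
        Set<Class<? extends Exception>> recoverable = new HashSet<>();
        recoverable.add(ConnectException.class);

        // Simulate a connection failure caused by "No route to host".
        Exception failure = new RuntimeException(
            "connect failed", new NoRouteToHostException("No route to host"));

        System.out.println(isRecoverable(recoverable, failure)); // prints "false"

        // Opt in to treating NoRouteToHostException as recoverable.
        recoverable.add(NoRouteToHostException.class);

        System.out.println(isRecoverable(recoverable, failure)); // prints "true"
    }
}
```

Whether retrying on NoRouteToHostException makes sense depends on the environment: it helps when the route comes back on its own (for example, after the broker host finishes booting), but it can also mask a real network misconfiguration.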

ashwin-os commented 9 years ago

thank you :+1:

michaelklishin commented 9 years ago

NoRouteToHostException is a routing or hostname resolution exception. RabbitMQ cannot affect either in any way.