onebeyond / rascal

A config driven wrapper for amqp.node supporting multi-host connections, automatic error recovery, redelivery flood protection, transparent encryption / decryption and channel pooling.
MIT License
451 stars 69 forks source link

vhost keeps reconnecting after broker.shutdown() #158

Closed murbanowicz closed 3 years ago

murbanowicz commented 3 years ago

Hi,

We have discovered an issue with the library.

  1. Create an app with the broker.
  2. Disconnect AMQP container from the Docker network, so the host becomes unavailable.
  3. On the 'error' event shut down the broker, in our case on heartbeat error (common industry approach to shut down the app in similar cases).
  4. Broker theoretically is shut down, but vhost keeps trying to reconnect.

Logs:

2021-06-28T11:23:19.647Z rascal:Vhost Handling connection error: Heartbeat timeout initially from connection: 420e4da1-3ab4-44ec-8071-ba68dde266bd, amqp://rabbit
2021-06-28T11:23:19.648Z rascal:Vhost Initialising vhost: amqp
2021-06-28T11:23:19.649Z rascal:tasks:createConnection Connecting to broker using url: amqp://rabbit
2021-06-28T11:23:19.653Z rascal:Subscription Handling channel error: Heartbeat timeout from amqp/MyQueue using channel: 5f1f5d4d-43db-457a-b35b-072387d544e9
2021-06-28T11:23:19.658Z rascal:SubscriberSession Removing channel: 5f1f5d4d-43db-457a-b35b-072387d544e9 from session
2021-06-28T11:23:19.676Z rascal:Broker Shutting down broker
2021-06-28T11:23:19.677Z rascal:Vhost Forwarning vhost: amqp about impending shutdown
2021-06-28T11:23:19.680Z rascal:Subscription Subscribing to queue: DecodedAzureMessagesQueue
2021-06-28T11:23:19.680Z rascal:Vhost Requested channel. Outstanding channel requests: 1
2021-06-28T11:23:19.681Z rascal:Vhost Ignoring create channel request. VHost is shutting down.
2021-06-28T11:23:19.682Z rascal:SubscriberSession No current subscriber session
2021-06-28T11:23:19.682Z rascal:Broker Waiting 10000ms for all subscriber channels to close
2021-06-28T11:23:19.770Z rascal:Vhost Will attempt reconnection in in 1248ms
2021-06-28T11:23:21.020Z rascal:Vhost Handling connection error: Failed to connect to: amqp://rabbit. Original message was: getaddrinfo ENOTFOUND rabbit initially from connection: 420e4da1-3ab4-44ec-8071-ba68dde266bd, amqp://rabbit
2021-06-28T11:23:21.023Z rascal:Vhost Initialising vhost: amqp
2021-06-28T11:23:21.023Z rascal:tasks:createConnection Connecting to broker using url: amqp://rabbit
2021-06-28T11:23:21.035Z rascal:Vhost Will attempt reconnection in in 2109ms
2021-06-28T11:23:23.147Z rascal:Vhost Handling connection error: Failed to connect to: amqp://rabbit. Original message was: getaddrinfo ENOTFOUND rabbit initially from connection: 420e4da1-3ab4-44ec-8071-ba68dde266bd, amqp://rabbit
2021-06-28T11:23:23.147Z rascal:Vhost Initialising vhost: amqp
2021-06-28T11:23:23.148Z rascal:tasks:createConnection Connecting to broker using url: amqp://rabbit
2021-06-28T11:23:23.158Z rascal:Vhost Will attempt reconnection in in 6188ms
2021-06-28T11:23:29.348Z rascal:Vhost Handling connection error: Failed to connect to: amqp://rabbit. Original message was: getaddrinfo ENOTFOUND rabbit initially from connection: 420e4da1-3ab4-44ec-8071-ba68dde266bd, amqp://rabbit
2021-06-28T11:23:29.348Z rascal:Vhost Initialising vhost: amqp
2021-06-28T11:23:29.349Z rascal:tasks:createConnection Connecting to broker using url: amqp://rabbit
2021-06-28T11:23:29.360Z rascal:Vhost Will attempt reconnection in in 13735ms
2021-06-28T11:23:29.683Z rascal:Vhost Shutting down vhost: amqp
2021-06-28T11:23:29.685Z rascal:Vhost Draining regular channel pool. { vhost: 'amqp', mode: 'regular', queue: 0, size: 0, available: 0, borrowed: 0, min: 1, max: 5 }
2021-06-28T11:23:29.687Z rascal:Vhost Drained regular channel pool. { vhost: 'amqp', mode: 'regular', queue: 0, size: 0, available: 0, borrowed: 0, min: 1, max: 5 }
2021-06-28T11:23:29.687Z rascal:Vhost Draining confirm channel pool. { vhost: 'amqp', mode: 'confirm', queue: 0, size: 0, available: 0, borrowed: 0, min: 1, max: 5 }
2021-06-28T11:23:29.688Z rascal:Vhost Drained confirm channel pool. { vhost: 'amqp', mode: 'confirm', queue: 0, size: 0, available: 0, borrowed: 0, min: 1, max: 5 }
2021-06-28T11:23:29.688Z rascal:Vhost Disconnecting from vhost: amqp
2021-06-28T11:23:29.688Z rascal:Broker Finished shutting down broker

\/ THIS SHOULD NOT HAPPEN \/
2021-06-28T11:23:43.064Z rascal:Vhost Handling connection error: Failed to connect to: amqp://rabbit. Original message was: getaddrinfo ENOTFOUND rabbit initially from connection: 420e4da1-3ab4-44ec-8071-ba68dde266bd, amqp://rabbit
2021-06-28T11:23:43.065Z rascal:Vhost Initialising vhost: amqp
2021-06-28T11:23:43.066Z rascal:tasks:createConnection Connecting to broker using url: amqp://rabbit
2021-06-28T11:23:43.079Z rascal:Vhost Will attempt reconnection in in 16329ms
2021-06-28T11:23:59.408Z rascal:Vhost Handling connection error: Failed to connect to: amqp://rabbit. Original message was: getaddrinfo ENOTFOUND rabbit initially from connection: 420e4da1-3ab4-44ec-8071-ba68dde266bd, amqp://rabbit
2021-06-28T11:23:59.409Z rascal:Vhost Initialising vhost: amqp
2021-06-28T11:23:59.409Z rascal:tasks:createConnection Connecting to broker using url: amqp://rabbit
2021-06-28T11:23:59.421Z rascal:Vhost Will attempt reconnection in in 43101ms
cressie176 commented 3 years ago

Thanks for reporting, and the clear steps to reproduce @murbanowicz. I'll look into it.

cressie176 commented 3 years ago

One thing that may help in the meantime - if you're not using Rascal's automatic connection recovery, you could disable it, then the Vhost shouldn't reconnect.

cressie176 commented 3 years ago

Hi @murbanowicz, Sorry this is taking a while. I've managed to get Rascal to stop reconnecting after shutdown easily enough, but there's a secondary issue with the channel pool. The pool never completes draining and so blocks the shutdown for a new reason. I'm trying to find a workaround.

cressie176 commented 3 years ago

Think I've managed to figure this out. The channel pool was waiting indefinitely for a call to channel.close to yield. However, because the of the heartbeat error, the broker never replied, pool never finished draining. I've fixed in rascal this through use of a configurable destroy timeout, and volunteered to add to the timeout to the underly generic-pool library.

cressie176 commented 3 years ago

Fixed in 13.1.0. I'll push out a patch release if/when generic-pool-289 gets merged.

murbanowicz commented 3 years ago

I highly appreciate your responsiveness and effort! Thanks! 🥇