onebeyond / rascal

A config driven wrapper for amqp.node supporting multi-host connections, automatic error recovery, redelivery flood protection, transparent encryption / decryption and channel pooling.
MIT License
451 stars 69 forks source link

Question - handling connection closure with reason "shutdown" #156

Closed matej-prokop closed 3 years ago

matej-prokop commented 3 years ago

Hey @cressie176,

I have an application consuming a queue. Currently, when there is a maintenance of our RMQ cluster (and I guess that RMQ node is stopped and replaced or restarted). My application gets "CONNECTION_FORCED - broker forced connection closure with reason 'shutdown'" message and it stops consuming the queue until I manually restart the application. Within the application I initialize its subscription to queue in following way:

const broker = await Broker.create(...);

broker.on("error",
  err => logger.error("RabbitBroker (rascal) - error", err)
);

const subscription = await broker.subscribe(...);
subscription
  .on("message", (message, content, ackOrNack) => {
    ...
  })
  .on("error",
    err => logger.error("Subscription error", err)
  );

During last maintenance, it logged those two messages (emitting an error events for both subscription and broker):

ERROR Subscription error - Error: Connection closed: 320 (CONNECTION-FORCED) with message "CONNECTION_FORCED - broker forced connection closure with reason 'shutdown'"

ERROR RabbitBroker (rascal) - error - Error: Connection closed: 320 (CONNECTION-FORCED) with message "CONNECTION_FORCED - broker forced connection closure with reason 'shutdown'"

When an error event is emitted for broker or subscription is there any chance that it will recover? Can my app detect somehow that connection won't be restored/rebuild in this case? I am thinking about terminating the application and letting underlying (container) orchestrator to restart it - but I am afraid there could be other error event types where restarting could be harmful.

Thank you for making this awesome library and supporting its users!

cressie176 commented 3 years ago

Hi @matej-prokop,

With Rascal's default configuration should re-connect and resubscribe automatically. If you've defined multiple hosts in the connection configuration it will cycle through them on an incremental backoff indefinitely until one of the connections succeeds, then start consuming. Even if there's only one host it should repeatedly retry using the same logic. All you have to do is ensure you register the error listens (as you have done), to stop them crashing your application.

Can you turn on debug by setting the DEBUG environment variable to rascal:* or rascal:Vhost*. This will show you whether rascal is trying to reconnect. When I do this with rascal's simple example and restart the docker container I get the following output...

 DEBUG=rascal:Vhost* node index.js 
  rascal:Vhost Initialising vhost: / +0ms
  rascal:Vhost Creating regular channel pool { autostart: false, max: 5, min: 1, evictionRunIntervalMillis: 10000, idleTimeoutMillis: 60000, rejectionDelayMillis: 1000, testOnBorrow: true } +36ms
  rascal:Vhost Creating confirm channel pool { autostart: true, max: 5, min: 1, evictionRunIntervalMillis: 10000, idleTimeoutMillis: 60000, rejectionDelayMillis: 1000, testOnBorrow: true } +2ms
  rascal:Vhost Creating pooled confirm channel for vhost: / +0ms
  rascal:Vhost vhost: / was initialised with connection: 287a703c-78d6-4821-97de-3a710a41a9d1 +1ms
  rascal:Vhost Requested channel. Outstanding channel requests: 1 +3ms
  rascal:Vhost Created confirm channel: 0d2802cc-095b-499f-9be7-d99a294f4e9e from connection: 287a703c-78d6-4821-97de-3a710a41a9d1 +4ms
  rascal:Vhost Created confirm channel: 2ea038e5-619f-4093-9c85-1ec9b0d656a7 from connection: 287a703c-78d6-4821-97de-3a710a41a9d1 +0ms
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 0, size: 1, available: 1, borrowed: 0, min: 1, max: 5 } +998ms
  rascal:Vhost Borrowed confirm channel: 0d2802cc-095b-499f-9be7-d99a294f4e9e. { vhost: '/', mode: 'confirm', queue: 0, size: 1, available: 0, borrowed: 1, min: 1, max: 5 } +2ms
  rascal:Vhost Releasing confirm channel: 0d2802cc-095b-499f-9be7-d99a294f4e9e. { vhost: '/', mode: 'confirm', queue: 0, size: 1, available: 0, borrowed: 1, min: 1, max: 5 } +6ms
2021-06-22T05:13:59.088Z: hello world
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 0, size: 1, available: 1, borrowed: 0, min: 1, max: 5 } +991ms
  rascal:Vhost Borrowed confirm channel: 0d2802cc-095b-499f-9be7-d99a294f4e9e. { vhost: '/', mode: 'confirm', queue: 0, size: 1, available: 0, borrowed: 1, min: 1, max: 5 } +1ms
  rascal:Vhost Releasing confirm channel: 0d2802cc-095b-499f-9be7-d99a294f4e9e. { vhost: '/', mode: 'confirm', queue: 0, size: 1, available: 0, borrowed: 1, min: 1, max: 5 } +4ms
2021-06-22T05:14:00.088Z: hello world
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 0, size: 1, available: 1, borrowed: 0, min: 1, max: 5 } +997ms
  rascal:Vhost Borrowed confirm channel: 0d2802cc-095b-499f-9be7-d99a294f4e9e. { vhost: '/', mode: 'confirm', queue: 0, size: 1, available: 0, borrowed: 1, min: 1, max: 5 } +0ms
  rascal:Vhost Releasing confirm channel: 0d2802cc-095b-499f-9be7-d99a294f4e9e. { vhost: '/', mode: 'confirm', queue: 0, size: 1, available: 0, borrowed: 1, min: 1, max: 5 } +4ms
2021-06-22T05:14:01.090Z: hello world
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 0, size: 1, available: 1, borrowed: 0, min: 1, max: 5 } +999ms
  rascal:Vhost Borrowed confirm channel: 0d2802cc-095b-499f-9be7-d99a294f4e9e. { vhost: '/', mode: 'confirm', queue: 0, size: 1, available: 0, borrowed: 1, min: 1, max: 5 } +1ms
  rascal:Vhost Releasing confirm channel: 0d2802cc-095b-499f-9be7-d99a294f4e9e. { vhost: '/', mode: 'confirm', queue: 0, size: 1, available: 0, borrowed: 1, min: 1, max: 5 } +4ms
2021-06-22T05:14:02.093Z: hello world
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 0, size: 1, available: 1, borrowed: 0, min: 1, max: 5 } +1s
  rascal:Vhost Borrowed confirm channel: 0d2802cc-095b-499f-9be7-d99a294f4e9e. { vhost: '/', mode: 'confirm', queue: 0, size: 1, available: 0, borrowed: 1, min: 1, max: 5 } +1ms
  rascal:Vhost Releasing confirm channel: 0d2802cc-095b-499f-9be7-d99a294f4e9e. { vhost: '/', mode: 'confirm', queue: 0, size: 1, available: 0, borrowed: 1, min: 1, max: 5 } +3ms
2021-06-22T05:14:03.098Z: hello world
  rascal:Vhost Handling connection error: Connection closed: 320 (CONNECTION-FORCED) with message "CONNECTION_FORCED - broker forced connection closure with reason 'shutdown'" initially from connection: 287a703c-78d6-4821-97de-3a710a41a9d1, amqp://guest:***@localhost:5672?heartbeat=10&connection_timeout=10000&channelMax=100 +209ms
Error: Connection closed: 320 (CONNECTION-FORCED) with message "CONNECTION_FORCED - broker forced connection closure with reason 'shutdown'"
    at Object.accept (/Users/steve/Development/guidesmiths/rascal/node_modules/amqplib/lib/connection.js:91:15)
    at Connection.mainAccept (/Users/steve/Development/guidesmiths/rascal/node_modules/amqplib/lib/connection.js:64:33)
    at Socket.go (/Users/steve/Development/guidesmiths/rascal/node_modules/amqplib/lib/connection.js:478:48)
    at Socket.emit (events.js:315:20)
    at emitReadable_ (internal/streams/readable.js:569:12)
    at processTicksAndRejections (internal/process/task_queues.js:79:21) {
  code: 320
} {
  vhost: '/',
  connectionUrl: 'amqp://guest:***@localhost:5672?heartbeat=10&connection_timeout=10000&channelMax=100'
}
  rascal:Vhost Initialising vhost: / +1ms
Error: Connection closed: 320 (CONNECTION-FORCED) with message "CONNECTION_FORCED - broker forced connection closure with reason 'shutdown'"
    at Object.accept (/Users/steve/Development/guidesmiths/rascal/node_modules/amqplib/lib/connection.js:91:15)
    at Connection.mainAccept (/Users/steve/Development/guidesmiths/rascal/node_modules/amqplib/lib/connection.js:64:33)
    at Socket.go (/Users/steve/Development/guidesmiths/rascal/node_modules/amqplib/lib/connection.js:478:48)
    at Socket.emit (events.js:315:20)
    at emitReadable_ (internal/streams/readable.js:569:12)
    at processTicksAndRejections (internal/process/task_queues.js:79:21) {
  code: 320
}
  rascal:Vhost Requested channel. Outstanding channel requests: 1 +2ms
  rascal:Vhost Will attempt reconnection in in 1100ms +2ms
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 0, size: 1, available: 1, borrowed: 0, min: 1, max: 5 } +787ms
  rascal:Vhost Handling connection error: Failed to connect to: amqp://guest:***@localhost:5672?heartbeat=10&connection_timeout=10000&channelMax=100. Original message was: Socket closed abruptly during opening handshake initially from connection: 287a703c-78d6-4821-97de-3a710a41a9d1, amqp://guest:***@localhost:5672?heartbeat=10&connection_timeout=10000&channelMax=100 +317ms
Error: Failed to connect to: amqp://guest:***@localhost:5672?heartbeat=10&connection_timeout=10000&channelMax=100. Original message was: Socket closed abruptly during opening handshake
    at Socket.endWhileOpening (/Users/steve/Development/guidesmiths/rascal/node_modules/amqplib/lib/connection.js:260:17)
    at Socket.emit (events.js:327:22)
    at endReadableNT (internal/streams/readable.js:1327:12)
    at processTicksAndRejections (internal/process/task_queues.js:80:21) {
  vhost: '/',
  connectionUrl: 'amqp://guest:***@localhost:5672?heartbeat=10&connection_timeout=10000&channelMax=100'
}
  rascal:Vhost Initialising vhost: / +1ms
  rascal:Vhost Will attempt reconnection in in 3681ms +2ms
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 1, size: 1, available: 1, borrowed: 0, min: 1, max: 5 } +684ms
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 2, size: 1, available: 1, borrowed: 0, min: 1, max: 5 } +1s
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 3, size: 1, available: 1, borrowed: 0, min: 1, max: 5 } +1s
  rascal:Vhost Handling connection error: Failed to connect to: amqp://guest:***@localhost:5672?heartbeat=10&connection_timeout=10000&channelMax=100. Original message was: Socket closed abruptly during opening handshake initially from connection: 287a703c-78d6-4821-97de-3a710a41a9d1, amqp://guest:***@localhost:5672?heartbeat=10&connection_timeout=10000&channelMax=100 +994ms
Error: Failed to connect to: amqp://guest:***@localhost:5672?heartbeat=10&connection_timeout=10000&channelMax=100. Original message was: Socket closed abruptly during opening handshake
    at Socket.endWhileOpening (/Users/steve/Development/guidesmiths/rascal/node_modules/amqplib/lib/connection.js:260:17)
    at Socket.emit (events.js:327:22)
    at endReadableNT (internal/streams/readable.js:1327:12)
    at processTicksAndRejections (internal/process/task_queues.js:80:21) {
  vhost: '/',
  connectionUrl: 'amqp://guest:***@localhost:5672?heartbeat=10&connection_timeout=10000&channelMax=100'
}
  rascal:Vhost Initialising vhost: / +1ms
  rascal:Vhost Will attempt reconnection in in 6611ms +2ms
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 4, size: 1, available: 1, borrowed: 0, min: 1, max: 5 } +3ms
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 5, size: 1, available: 1, borrowed: 0, min: 1, max: 5 } +1s
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 6, size: 1, available: 1, borrowed: 0, min: 1, max: 5 } +1s
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 7, size: 1, available: 1, borrowed: 0, min: 1, max: 5 } +1s
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 8, size: 1, available: 1, borrowed: 0, min: 1, max: 5 } +1s
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 9, size: 1, available: 1, borrowed: 0, min: 1, max: 5 } +1s
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 10, size: 1, available: 1, borrowed: 0, min: 1, max: 5 } +1s
  rascal:Vhost Handling connection error: Failed to connect to: amqp://guest:***@localhost:5672?heartbeat=10&connection_timeout=10000&channelMax=100. Original message was: connect ECONNREFUSED 127.0.0.1:5672 initially from connection: 287a703c-78d6-4821-97de-3a710a41a9d1, amqp://guest:***@localhost:5672?heartbeat=10&connection_timeout=10000&channelMax=100 +595ms
Error: connect ECONNREFUSED 127.0.0.1:5672
    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1146:16) {
  errno: -61,
  code: 'ECONNREFUSED',
  syscall: 'connect',
  address: '127.0.0.1',
  port: 5672
} {
  vhost: '/',
  connectionUrl: 'amqp://guest:***@localhost:5672?heartbeat=10&connection_timeout=10000&channelMax=100'
}
  rascal:Vhost Initialising vhost: / +1ms
  rascal:Vhost Will attempt reconnection in in 12747ms +4ms
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 11, size: 1, available: 1, borrowed: 0, min: 1, max: 5 } +401ms
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 12, size: 1, available: 1, borrowed: 0, min: 1, max: 5 } +1s
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 13, size: 1, available: 1, borrowed: 0, min: 1, max: 5 } +1s
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 14, size: 1, available: 1, borrowed: 0, min: 1, max: 5 } +1s
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 15, size: 1, available: 1, borrowed: 0, min: 1, max: 5 } +1s
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 16, size: 1, available: 1, borrowed: 0, min: 1, max: 5 } +1s
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 17, size: 1, available: 1, borrowed: 0, min: 1, max: 5 } +1s
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 18, size: 1, available: 1, borrowed: 0, min: 1, max: 5 } +1s
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 19, size: 1, available: 1, borrowed: 0, min: 1, max: 5 } +1s
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 20, size: 1, available: 1, borrowed: 0, min: 1, max: 5 } +1s
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 21, size: 1, available: 1, borrowed: 0, min: 1, max: 5 } +1s
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 22, size: 1, available: 1, borrowed: 0, min: 1, max: 5 } +1s
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 23, size: 1, available: 1, borrowed: 0, min: 1, max: 5 } +1s
  rascal:Vhost Handling connection error: Failed to connect to: amqp://guest:***@localhost:5672?heartbeat=10&connection_timeout=10000&channelMax=100. Original message was: Socket closed abruptly during opening handshake initially from connection: 287a703c-78d6-4821-97de-3a710a41a9d1, amqp://guest:***@localhost:5672?heartbeat=10&connection_timeout=10000&channelMax=100 +321ms
Error: Failed to connect to: amqp://guest:***@localhost:5672?heartbeat=10&connection_timeout=10000&channelMax=100. Original message was: Socket closed abruptly during opening handshake
    at Socket.endWhileOpening (/Users/steve/Development/guidesmiths/rascal/node_modules/amqplib/lib/connection.js:260:17)
    at Socket.emit (events.js:327:22)
    at endReadableNT (internal/streams/readable.js:1327:12)
    at processTicksAndRejections (internal/process/task_queues.js:80:21) {
  vhost: '/',
  connectionUrl: 'amqp://guest:***@localhost:5672?heartbeat=10&connection_timeout=10000&channelMax=100'
}
  rascal:Vhost Initialising vhost: / +1ms
  rascal:Vhost vhost: / was initialised with connection: 55736839-b556-4105-90a6-4d3b819ad04d +19ms
  rascal:Vhost Destroying channel: 0d2802cc-095b-499f-9be7-d99a294f4e9e +1ms
  rascal:Vhost Creating pooled confirm channel for vhost: / +0ms
  rascal:Vhost Created confirm channel: 2fd3e6d3-9138-4bf8-85cb-182b3d6d4486 from connection: 55736839-b556-4105-90a6-4d3b819ad04d +5ms
  rascal:Vhost Created confirm channel: 33cc0b58-2775-493d-b61d-8adbcddee85b from connection: 55736839-b556-4105-90a6-4d3b819ad04d +1ms
  rascal:Vhost Borrowed confirm channel: 33cc0b58-2775-493d-b61d-8adbcddee85b. { vhost: '/', mode: 'confirm', queue: 23, size: 1, available: 0, borrowed: 1, min: 1, max: 5 } +0ms
  rascal:Vhost Creating pooled confirm channel for vhost: / +0ms
  rascal:Vhost Releasing confirm channel: 33cc0b58-2775-493d-b61d-8adbcddee85b. { vhost: '/', mode: 'confirm', queue: 22, size: 2, available: 0, borrowed: 1, min: 1, max: 5 } +3ms
  rascal:Vhost Borrowed confirm channel: 33cc0b58-2775-493d-b61d-8adbcddee85b. { vhost: '/', mode: 'confirm', queue: 22, size: 2, available: 0, borrowed: 1, min: 1, max: 5 } +0ms
2021-06-22T05:14:04.103Z: hello world
  rascal:Vhost Created confirm channel: c2b28680-5088-4d9a-aab9-71be7d0ec322 from connection: 55736839-b556-4105-90a6-4d3b819ad04d +2ms
  rascal:Vhost Releasing confirm channel: 33cc0b58-2775-493d-b61d-8adbcddee85b. { vhost: '/', mode: 'confirm', queue: 21, size: 2, available: 0, borrowed: 1, min: 1, max: 5 } +0ms
2021-06-22T05:14:05.106Z: hello world
  rascal:Vhost Borrowed confirm channel: 33cc0b58-2775-493d-b61d-8adbcddee85b. { vhost: '/', mode: 'confirm', queue: 21, size: 2, available: 1, borrowed: 1, min: 1, max: 5 } +0ms
  rascal:Vhost Borrowed confirm channel: c2b28680-5088-4d9a-aab9-71be7d0ec322. { vhost: '/', mode: 'confirm', queue: 20, size: 2, available: 0, borrowed: 2, min: 1, max: 5 } +1ms
  rascal:Vhost Creating pooled confirm channel for vhost: / +1ms
  rascal:Vhost Releasing confirm channel: 33cc0b58-2775-493d-b61d-8adbcddee85b. { vhost: '/', mode: 'confirm', queue: 19, size: 3, available: 0, borrowed: 2, min: 1, max: 5 } +0ms
2021-06-22T05:14:06.109Z: hello world
  rascal:Vhost Borrowed confirm channel: 33cc0b58-2775-493d-b61d-8adbcddee85b. { vhost: '/', mode: 'confirm', queue: 19, size: 3, available: 0, borrowed: 2, min: 1, max: 5 } +1ms
  rascal:Vhost Releasing confirm channel: c2b28680-5088-4d9a-aab9-71be7d0ec322. { vhost: '/', mode: 'confirm', queue: 18, size: 3, available: 0, borrowed: 2, min: 1, max: 5 } +1ms
2021-06-22T05:14:07.114Z: hello world
  rascal:Vhost Borrowed confirm channel: c2b28680-5088-4d9a-aab9-71be7d0ec322. { vhost: '/', mode: 'confirm', queue: 18, size: 3, available: 0, borrowed: 2, min: 1, max: 5 } +0ms
  rascal:Vhost Releasing confirm channel: 33cc0b58-2775-493d-b61d-8adbcddee85b. { vhost: '/', mode: 'confirm', queue: 17, size: 3, available: 0, borrowed: 2, min: 1, max: 5 } +1ms
2021-06-22T05:14:08.114Z: hello world
  rascal:Vhost Borrowed confirm channel: 33cc0b58-2775-493d-b61d-8adbcddee85b. { vhost: '/', mode: 'confirm', queue: 17, size: 3, available: 0, borrowed: 2, min: 1, max: 5 } +2ms
  rascal:Vhost Created confirm channel: e581692f-4779-488a-a0b9-e3998ebda658 from connection: 55736839-b556-4105-90a6-4d3b819ad04d +1ms
  rascal:Vhost Releasing confirm channel: c2b28680-5088-4d9a-aab9-71be7d0ec322. { vhost: '/', mode: 'confirm', queue: 16, size: 3, available: 0, borrowed: 2, min: 1, max: 5 } +0ms
2021-06-22T05:14:09.115Z: hello world
  rascal:Vhost Borrowed confirm channel: c2b28680-5088-4d9a-aab9-71be7d0ec322. { vhost: '/', mode: 'confirm', queue: 16, size: 3, available: 1, borrowed: 2, min: 1, max: 5 } +0ms
  rascal:Vhost Borrowed confirm channel: e581692f-4779-488a-a0b9-e3998ebda658. { vhost: '/', mode: 'confirm', queue: 15, size: 3, available: 0, borrowed: 3, min: 1, max: 5 } +0ms
  rascal:Vhost Creating pooled confirm channel for vhost: / +1ms
  rascal:Vhost Releasing confirm channel: 33cc0b58-2775-493d-b61d-8adbcddee85b. { vhost: '/', mode: 'confirm', queue: 14, size: 4, available: 0, borrowed: 3, min: 1, max: 5 } +1ms
2021-06-22T05:14:10.119Z: hello world
  rascal:Vhost Borrowed confirm channel: 33cc0b58-2775-493d-b61d-8adbcddee85b. { vhost: '/', mode: 'confirm', queue: 14, size: 4, available: 0, borrowed: 3, min: 1, max: 5 } +0ms
  rascal:Vhost Releasing confirm channel: c2b28680-5088-4d9a-aab9-71be7d0ec322. { vhost: '/', mode: 'confirm', queue: 13, size: 4, available: 0, borrowed: 3, min: 1, max: 5 } +0ms
2021-06-22T05:14:11.123Z: hello world
  rascal:Vhost Borrowed confirm channel: c2b28680-5088-4d9a-aab9-71be7d0ec322. { vhost: '/', mode: 'confirm', queue: 13, size: 4, available: 0, borrowed: 3, min: 1, max: 5 } +1ms
  rascal:Vhost Releasing confirm channel: e581692f-4779-488a-a0b9-e3998ebda658. { vhost: '/', mode: 'confirm', queue: 12, size: 4, available: 0, borrowed: 3, min: 1, max: 5 } +0ms
2021-06-22T05:14:12.124Z: hello world
  rascal:Vhost Borrowed confirm channel: e581692f-4779-488a-a0b9-e3998ebda658. { vhost: '/', mode: 'confirm', queue: 12, size: 4, available: 0, borrowed: 3, min: 1, max: 5 } +1ms
  rascal:Vhost Releasing confirm channel: 33cc0b58-2775-493d-b61d-8adbcddee85b. { vhost: '/', mode: 'confirm', queue: 11, size: 4, available: 0, borrowed: 3, min: 1, max: 5 } +0ms
  rascal:Vhost Borrowed confirm channel: 33cc0b58-2775-493d-b61d-8adbcddee85b. { vhost: '/', mode: 'confirm', queue: 11, size: 4, available: 0, borrowed: 3, min: 1, max: 5 } +0ms
2021-06-22T05:14:13.127Z: hello world
  rascal:Vhost Releasing confirm channel: c2b28680-5088-4d9a-aab9-71be7d0ec322. { vhost: '/', mode: 'confirm', queue: 10, size: 4, available: 0, borrowed: 3, min: 1, max: 5 } +1ms
2021-06-22T05:14:14.131Z: hello world
  rascal:Vhost Borrowed confirm channel: c2b28680-5088-4d9a-aab9-71be7d0ec322. { vhost: '/', mode: 'confirm', queue: 10, size: 4, available: 0, borrowed: 3, min: 1, max: 5 } +0ms
  rascal:Vhost Created confirm channel: d4a262a4-dd06-4a14-b462-cd208a816c7c from connection: 55736839-b556-4105-90a6-4d3b819ad04d +1ms
  rascal:Vhost Releasing confirm channel: e581692f-4779-488a-a0b9-e3998ebda658. { vhost: '/', mode: 'confirm', queue: 9, size: 4, available: 0, borrowed: 3, min: 1, max: 5 } +0ms
2021-06-22T05:14:15.132Z: hello world
  rascal:Vhost Borrowed confirm channel: e581692f-4779-488a-a0b9-e3998ebda658. { vhost: '/', mode: 'confirm', queue: 9, size: 4, available: 1, borrowed: 3, min: 1, max: 5 } +0ms
  rascal:Vhost Releasing confirm channel: 33cc0b58-2775-493d-b61d-8adbcddee85b. { vhost: '/', mode: 'confirm', queue: 8, size: 4, available: 0, borrowed: 4, min: 1, max: 5 } +0ms
2021-06-22T05:14:16.133Z: hello world
  rascal:Vhost Borrowed confirm channel: d4a262a4-dd06-4a14-b462-cd208a816c7c. { vhost: '/', mode: 'confirm', queue: 8, size: 4, available: 1, borrowed: 3, min: 1, max: 5 } +1ms
  rascal:Vhost Borrowed confirm channel: 33cc0b58-2775-493d-b61d-8adbcddee85b. { vhost: '/', mode: 'confirm', queue: 7, size: 4, available: 0, borrowed: 4, min: 1, max: 5 } +0ms
  rascal:Vhost Creating pooled confirm channel for vhost: / +0ms
  rascal:Vhost Releasing confirm channel: c2b28680-5088-4d9a-aab9-71be7d0ec322. { vhost: '/', mode: 'confirm', queue: 6, size: 5, available: 0, borrowed: 4, min: 1, max: 5 } +1ms
2021-06-22T05:14:17.133Z: hello world
  rascal:Vhost Borrowed confirm channel: c2b28680-5088-4d9a-aab9-71be7d0ec322. { vhost: '/', mode: 'confirm', queue: 6, size: 5, available: 0, borrowed: 4, min: 1, max: 5 } +0ms
  rascal:Vhost Releasing confirm channel: e581692f-4779-488a-a0b9-e3998ebda658. { vhost: '/', mode: 'confirm', queue: 5, size: 5, available: 0, borrowed: 4, min: 1, max: 5 } +0ms
2021-06-22T05:14:18.135Z: hello world
  rascal:Vhost Borrowed confirm channel: e581692f-4779-488a-a0b9-e3998ebda658. { vhost: '/', mode: 'confirm', queue: 5, size: 5, available: 0, borrowed: 4, min: 1, max: 5 } +1ms
  rascal:Vhost Releasing confirm channel: d4a262a4-dd06-4a14-b462-cd208a816c7c. { vhost: '/', mode: 'confirm', queue: 4, size: 5, available: 0, borrowed: 4, min: 1, max: 5 } +0ms
2021-06-22T05:14:19.136Z: hello world
  rascal:Vhost Borrowed confirm channel: d4a262a4-dd06-4a14-b462-cd208a816c7c. { vhost: '/', mode: 'confirm', queue: 4, size: 5, available: 0, borrowed: 4, min: 1, max: 5 } +1ms
  rascal:Vhost Releasing confirm channel: 33cc0b58-2775-493d-b61d-8adbcddee85b. { vhost: '/', mode: 'confirm', queue: 3, size: 5, available: 0, borrowed: 4, min: 1, max: 5 } +0ms
2021-06-22T05:14:20.139Z: hello world
  rascal:Vhost Borrowed confirm channel: 33cc0b58-2775-493d-b61d-8adbcddee85b. { vhost: '/', mode: 'confirm', queue: 3, size: 5, available: 0, borrowed: 4, min: 1, max: 5 } +0ms
  rascal:Vhost Releasing confirm channel: c2b28680-5088-4d9a-aab9-71be7d0ec322. { vhost: '/', mode: 'confirm', queue: 2, size: 5, available: 0, borrowed: 4, min: 1, max: 5 } +1ms
2021-06-22T05:14:21.143Z: hello world
  rascal:Vhost Borrowed confirm channel: c2b28680-5088-4d9a-aab9-71be7d0ec322. { vhost: '/', mode: 'confirm', queue: 2, size: 5, available: 0, borrowed: 4, min: 1, max: 5 } +0ms
  rascal:Vhost Releasing confirm channel: e581692f-4779-488a-a0b9-e3998ebda658. { vhost: '/', mode: 'confirm', queue: 1, size: 5, available: 0, borrowed: 4, min: 1, max: 5 } +0ms
2021-06-22T05:14:22.144Z: hello world
  rascal:Vhost Borrowed confirm channel: e581692f-4779-488a-a0b9-e3998ebda658. { vhost: '/', mode: 'confirm', queue: 1, size: 5, available: 0, borrowed: 4, min: 1, max: 5 } +1ms
  rascal:Vhost Releasing confirm channel: d4a262a4-dd06-4a14-b462-cd208a816c7c. { vhost: '/', mode: 'confirm', queue: 0, size: 5, available: 0, borrowed: 4, min: 1, max: 5 } +0ms
2021-06-22T05:14:23.148Z: hello world
  rascal:Vhost Borrowed confirm channel: d4a262a4-dd06-4a14-b462-cd208a816c7c. { vhost: '/', mode: 'confirm', queue: 0, size: 5, available: 0, borrowed: 4, min: 1, max: 5 } +0ms
  rascal:Vhost Created confirm channel: 21800d97-5677-40d1-a8f9-7b90d5c7cebd from connection: 55736839-b556-4105-90a6-4d3b819ad04d +1ms
  rascal:Vhost Releasing confirm channel: 33cc0b58-2775-493d-b61d-8adbcddee85b. { vhost: '/', mode: 'confirm', queue: 0, size: 5, available: 0, borrowed: 4, min: 1, max: 5 } +0ms
2021-06-22T05:14:24.152Z: hello world
2021-06-22T05:14:25.154Z: hello world
  rascal:Vhost Releasing confirm channel: c2b28680-5088-4d9a-aab9-71be7d0ec322. { vhost: '/', mode: 'confirm', queue: 0, size: 5, available: 2, borrowed: 3, min: 1, max: 5 } +0ms
  rascal:Vhost Releasing confirm channel: e581692f-4779-488a-a0b9-e3998ebda658. { vhost: '/', mode: 'confirm', queue: 0, size: 5, available: 3, borrowed: 2, min: 1, max: 5 } +1ms
2021-06-22T05:14:26.157Z: hello world
  rascal:Vhost Releasing confirm channel: d4a262a4-dd06-4a14-b462-cd208a816c7c. { vhost: '/', mode: 'confirm', queue: 0, size: 5, available: 4, borrowed: 1, min: 1, max: 5 } +1ms
2021-06-22T05:14:27.160Z: hello world
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 0, size: 5, available: 5, borrowed: 0, min: 1, max: 5 } +628ms
  rascal:Vhost Borrowed confirm channel: 33cc0b58-2775-493d-b61d-8adbcddee85b. { vhost: '/', mode: 'confirm', queue: 0, size: 5, available: 4, borrowed: 1, min: 1, max: 5 } +1ms
  rascal:Vhost Releasing confirm channel: 33cc0b58-2775-493d-b61d-8adbcddee85b. { vhost: '/', mode: 'confirm', queue: 0, size: 5, available: 4, borrowed: 1, min: 1, max: 5 } +4ms
2021-06-22T05:14:28.164Z: hello world
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 0, size: 5, available: 5, borrowed: 0, min: 1, max: 5 } +999ms
  rascal:Vhost Borrowed confirm channel: 21800d97-5677-40d1-a8f9-7b90d5c7cebd. { vhost: '/', mode: 'confirm', queue: 0, size: 5, available: 4, borrowed: 1, min: 1, max: 5 } +1ms
  rascal:Vhost Releasing confirm channel: 21800d97-5677-40d1-a8f9-7b90d5c7cebd. { vhost: '/', mode: 'confirm', queue: 0, size: 5, available: 4, borrowed: 1, min: 1, max: 5 } +4ms
2021-06-22T05:14:29.168Z: hello world
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 0, size: 5, available: 5, borrowed: 0, min: 1, max: 5 } +1s
  rascal:Vhost Borrowed confirm channel: c2b28680-5088-4d9a-aab9-71be7d0ec322. { vhost: '/', mode: 'confirm', queue: 0, size: 5, available: 4, borrowed: 1, min: 1, max: 5 } +1ms
  rascal:Vhost Releasing confirm channel: c2b28680-5088-4d9a-aab9-71be7d0ec322. { vhost: '/', mode: 'confirm', queue: 0, size: 5, available: 4, borrowed: 1, min: 1, max: 5 } +4ms
2021-06-22T05:14:30.173Z: hello world
bausmeier commented 3 years ago

Just chipping in here. We see this on one of our applications every week during the maintenance window for our RabbitMQ instance running on Amazon MQ, but rascal recovers as expected and everything continues as normal.

matej-prokop commented 3 years ago

Thank you for replying! So far I didn't manage to reproduce the issue when having DEBUG=rascal:Vhost*. Within my setup, there is Amazon MQ cluster (three RabbitMQ broker nodes behind a Network Load Balancer) so my application has single host defined as connection (pointing at the load balancer).

I tried to both (A) restart brokers within the cluster and (B) upgrade instances with brokers (forcing all brokers instance to be replaced by new instances). Nevertheless, my application always reconnected and continued processing messages from the queue (as expected).

I don't know yet why during the cluster maintenance done by AWS team, it wasn't the case.

I have one more idea to try. My app consumes an input queue, process a message and then post message (processing result) into another output queue. During my test I was producing invalid test messages so the app was discarding messages so it wasn't publishing anything (into output queue). I will try another test with valid messages so the app will also be publishing. Idea is that it could be that result publications "hangs" => causing my app not fetching more messages because previous messages wasn't acked.

BTW: Is it OK to re-use one instance of rascal Broker (created by `Broker.create()) for both subscribing and publishing at one time?

cressie176 commented 3 years ago

Is it OK to re-use one instance of rascal Broker (created by `Broker.create()) for both subscribing and publishing at one time?

Yes, it's fine. I have read that if you really need best performance then it's best to use a separate connections for publishing and subscribe, which you get automatically if the vhosts are different, but not otherwise. I wouldn't worry about it unless you need 1000s of messages a second.

cressie176 commented 3 years ago

Idea is that it could be that result publications "hangs" => causing my app not fetching more messages because previous messages wasn't acked.

Publications are paused when the connection / channel dies and also automatically attempt to reconnect. Any messages that are published while disconnected are queued in memory, so not likely to be this.

matej-prokop commented 3 years ago

So when testing with publications to RMQ, I see that publications stucks when I restart brokers in Amazon MQ cluster. So I put together simple script that allows me to reproduce the issue (share just its core part):

  for (let i = 0; i < 10000; i++) {
    console.log(`Publikuju ${i}`);
    await rabbitPublisher
      .publish({ payload, overrides })
      .catch(logger.error(`Failed to publish notification to RMQ`));
  }

Where rabbitPublisher is my custom wrapper (sharing just definition of publish function from that module):

publish: ({ payload, overrides = {} }) => {
  return new Promise((resolve, reject) => {
    rabbitBroker
      .getRascalBroker()  // returns "singleton" instance created by Broker.create()  
      .then(broker => {
        broker.publish(
          publicationName,
          payload,
          overridesWithDefaults,
          (err, publication) => {
            if (err) {
              reject(err);
            } else {
              publication.on("success", resolve).on("error", reject);
            }
          }
        );
      })
      .catch(reject);
  });
}

Here is output of when running my test script:

mprokop$ DEBUG=rascal:Vhost* node adhoc/rmq-publisher.js
  rascal:Vhost Initialising vhost: / +0ms
  rascal:Vhost Creating regular channel pool { autostart: false, max: 5, min: 1, evictionRunIntervalMillis: 10000, idleTimeoutMillis: 60000, testOnBorrow: true, rejectionDelayMillis: 1000 } +2s
  rascal:Vhost Creating confirm channel pool { autostart: false, max: 5, min: 1, evictionRunIntervalMillis: 10000, idleTimeoutMillis: 60000, testOnBorrow: true, rejectionDelayMillis: 1000 } +3ms
  rascal:Vhost vhost: / was initialised with connection: 348a7a21-b57e-483b-9994-4ea3ba4b319c +0ms
info: Started RabbitBroker (rascal).
Publikuju 0
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 0, size: 0, available: 0, borrowed: 0, min: 1, max: 5 } +5ms
  rascal:Vhost Creating pooled confirm channel for vhost: / +1ms
  rascal:Vhost Created confirm channel: f2c8a52e-4d51-4d0d-90c5-6d20a3ac68e6 from connection: 348a7a21-b57e-483b-9994-4ea3ba4b319c +476ms
  rascal:Vhost Borrowed confirm channel: f2c8a52e-4d51-4d0d-90c5-6d20a3ac68e6. { vhost: '/', mode: 'confirm', queue: 0, size: 1, available: 0, borrowed: 1, min: 1, max: 5 } +1ms
  rascal:Vhost Releasing confirm channel: f2c8a52e-4d51-4d0d-90c5-6d20a3ac68e6. { vhost: '/', mode: 'confirm', queue: 0, size: 1, available: 0, borrowed: 1, min: 1, max: 5 } +244ms
Publikuju 1
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 0, size: 1, available: 1, borrowed: 0, min: 1, max: 5 } +1ms
  rascal:Vhost Borrowed confirm channel: f2c8a52e-4d51-4d0d-90c5-6d20a3ac68e6. { vhost: '/', mode: 'confirm', queue: 0, size: 1, available: 0, borrowed: 1, min: 1, max: 5 } +0ms
  rascal:Vhost Releasing confirm channel: f2c8a52e-4d51-4d0d-90c5-6d20a3ac68e6. { vhost: '/', mode: 'confirm', queue: 0, size: 1, available: 0, borrowed: 1, min: 1, max: 5 } +237ms
Publikuju 2

......

Publikuju 717
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 0, size: 1, available: 1, borrowed: 0, min: 1, max: 5 } +1ms
  rascal:Vhost Borrowed confirm channel: f2c8a52e-4d51-4d0d-90c5-6d20a3ac68e6. { vhost: '/', mode: 'confirm', queue: 0, size: 1, available: 0, borrowed: 1, min: 1, max: 5 } +0ms
  rascal:Vhost Releasing confirm channel: f2c8a52e-4d51-4d0d-90c5-6d20a3ac68e6. { vhost: '/', mode: 'confirm', queue: 0, size: 1, available: 0, borrowed: 1, min: 1, max: 5 } +244ms
Publikuju 718
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 0, size: 1, available: 1, borrowed: 0, min: 1, max: 5 } +0ms
  rascal:Vhost Borrowed confirm channel: f2c8a52e-4d51-4d0d-90c5-6d20a3ac68e6. { vhost: '/', mode: 'confirm', queue: 0, size: 1, available: 0, borrowed: 1, min: 1, max: 5 } +1ms
  rascal:Vhost Destroying confirm channel: f2c8a52e-4d51-4d0d-90c5-6d20a3ac68e6. { vhost: '/', mode: 'confirm', queue: 0, size: 1, available: 0, borrowed: 1, min: 1, max: 5 } +152ms
  rascal:Vhost Destroying channel: f2c8a52e-4d51-4d0d-90c5-6d20a3ac68e6 +1ms
  rascal:Vhost Creating pooled confirm channel for vhost: / +1ms
  rascal:Vhost Handling connection error: Connection closed: 320 (CONNECTION-FORCED) with message "CONNECTION_FORCED - broker forced connection closure with reason 'shutdown'" initially from connection: 348a7a21-b57e-483b-9994-4ea3ba4b319c, amqps://myuser:***@b-b38e98a9-c9b7-55df-ba64-e5402bf53cc9.mq.us-east-1.amazonaws.com:12346?heartbeat=10&connection_timeout=10000&channelMax=100 +2ms
error: RabbitBroker (rascal) - error message=Connection closed: 320 (CONNECTION-FORCED) with message "CONNECTION_FORCED - broker forced connection closure with reason 'shutdown'", stack=Error: Connection closed: 320 (CONNECTION-FORCED) with message "CONNECTION_FORCED - broker forced connection closure with reason 'shutdown'"
    at Object.accept (/myfolder/node_modules/amqplib/lib/connection.js:91:15)
    at Connection.mainAccept (/myfolder/node_modules/amqplib/lib/connection.js:64:33)
    at TLSSocket.go (/myfolder/node_modules/amqplib/lib/connection.js:478:48)
    at TLSSocket.emit (events.js:315:20)
    at emitReadable_ (internal/streams/readable.js:569:12)
    at processTicksAndRejections (internal/process/task_queues.js:79:21), code=320
  rascal:Vhost Initialising vhost: / +0ms
error: Failed to publish notification to RMQ message=channel closed, stack=Error: channel closed
    at ConfirmChannel.<anonymous> (/myfolder/node_modules/amqplib/lib/channel.js:39:18)
    at ConfirmChannel.emit (events.js:327:22)
    at ConfirmChannel.C.toClosed (/myfolder/node_modules/amqplib/lib/channel.js:175:8)
    at Connection.C._closeChannels (/myfolder/node_modules/amqplib/lib/connection.js:394:18)
    at Connection.C.toClosed (/myfolder/node_modules/amqplib/lib/connection.js:401:8)
    at Object.accept (/myfolder/node_modules/amqplib/lib/connection.js:96:18)
    at Connection.mainAccept (/myfolder/node_modules/amqplib/lib/connection.js:64:33)
    at TLSSocket.go (/myfolder/node_modules/amqplib/lib/connection.js:478:48)
    at TLSSocket.emit (events.js:315:20)
    at emitReadable_ (internal/streams/readable.js:569:12)
    at processTicksAndRejections (internal/process/task_queues.js:79:21)
Publikuju 719
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 0, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +3ms
 rascal:Vhost vhost: / was initialised with connection: 56b6dce8-4bff-4172-b673-c9a1020a22ec +2s
 rascal:Vhost Handling connection error: Heartbeat timeout initially from connection: 56b6dce8-4bff-4172-b673-c9a1020a22ec, amqps://myuser:***@b-b38e98a9-c9b7-55df-ba64-e5402bf53cc9.mq.us-east-1.amazonaws.com:12346?heartbeat=10&connection_timeout=10000&channelMax=100 +29m
error: RabbitBroker (rascal) - error message=Heartbeat timeout, stack=Error: Heartbeat timeout
    at Heart.<anonymous> (/myfolder/node_modules/amqplib/lib/connection.js:427:19)
    at Heart.emit (events.js:315:20)
    at Heart.runHeartbeat (/myfolder/node_modules/amqplib/lib/heartbeat.js:88:17)
    at listOnTimeout (internal/timers.js:554:17)
    at processTimers (internal/timers.js:497:7)
  rascal:Vhost Initialising vhost: / +1ms
  rascal:Vhost Will attempt reconnection in in 585ms +15s
  rascal:Vhost Handling connection error: Failed to connect to: amqps://myuser:***@b-b38e98a9-c9b7-55df-ba64-e5402bf53cc9.mq.us-east-1.amazonaws.com:12346?heartbeat=10&connection_timeout=10000&channelMax=100. Original message was: connect ETIMEDOUT initially from connection: 56b6dce8-4bff-4172-b673-c9a1020a22ec, amqps://myuser:***@b-b38e98a9-c9b7-55df-ba64-e5402bf53cc9.mq.us-east-1.amazonaws.com:12346?heartbeat=10&connection_timeout=10000&channelMax=100 +595ms
error: RabbitBroker (rascal) - error message=Failed to connect to: amqps://myuser:***@b-b38e98a9-c9b7-55df-ba64-e5402bf53cc9.mq.us-east-1.amazonaws.com:12346?heartbeat=10&connection_timeout=10000&channelMax=100. Original message was: connect ETIMEDOUT, stack=Error: Failed to connect to: amqps://myuser:***@b-b38e98a9-c9b7-55df-ba64-e5402bf53cc9.mq.us-east-1.amazonaws.com:12346?heartbeat=10&connection_timeout=10000&channelMax=100. Original message was: connect ETIMEDOUT
    at TLSSocket.<anonymous> (/myfolder/node_modules/amqplib/lib/connect.js:172:20)
    at Object.onceWrapper (events.js:421:28)
    at TLSSocket.emit (events.js:315:20)
    at TLSSocket.Socket._onTimeout (net.js:483:8)
    at listOnTimeout (internal/timers.js:554:17)
    at processTimers (internal/timers.js:497:7)
  rascal:Vhost Initialising vhost: / +3ms
  rascal:Vhost vhost: / was initialised with connection: 1ddbe475-40de-42c6-9444-c842de43edc2 +5s
^C
mprokop$ DEBUG=rascal:Vhost* node adhoc/rmq-publisher.js
  rascal:Vhost Initialising vhost: / +0ms
  rascal:Vhost Creating regular channel pool { autostart: false, max: 5, min: 1, evictionRunIntervalMillis: 10000, idleTimeoutMillis: 60000, testOnBorrow: true, rejectionDelayMillis: 1000 } +3s
  rascal:Vhost Creating confirm channel pool { autostart: false, max: 5, min: 1, evictionRunIntervalMillis: 10000, idleTimeoutMillis: 60000, testOnBorrow: true, rejectionDelayMillis: 1000 } +3ms
  rascal:Vhost vhost: / was initialised with connection: 5dc9f4ab-4267-4028-aa22-98efa158a0cb +1ms
info: Started RabbitBroker (rascal).
Publikuju 0
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 0, size: 0, available: 0, borrowed: 0, min: 1, max: 5 } +5ms
  rascal:Vhost Creating pooled confirm channel for vhost: / +2ms
  rascal:Vhost Created confirm channel: 82f7f05a-b041-4def-9ecc-afd25a3c2073 from connection: 5dc9f4ab-4267-4028-aa22-98efa158a0cb +484ms
  rascal:Vhost Borrowed confirm channel: 82f7f05a-b041-4def-9ecc-afd25a3c2073. { vhost: '/', mode: 'confirm', queue: 0, size: 1, available: 0, borrowed: 1, min: 1, max: 5 } +1ms
  rascal:Vhost Releasing confirm channel: 82f7f05a-b041-4def-9ecc-afd25a3c2073. { vhost: '/', mode: 'confirm', queue: 0, size: 1, available: 0, borrowed: 1, min: 1, max: 5 } +250ms
Publikuju 1
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 0, size: 1, available: 1, borrowed: 0, min: 1, max: 5 } +0ms
  rascal:Vhost Borrowed confirm channel: 82f7f05a-b041-4def-9ecc-afd25a3c2073. { vhost: '/', mode: 'confirm', queue: 0, size: 1, available: 0, borrowed: 1, min: 1, max: 5 } +1ms
  rascal:Vhost Releasing confirm channel: 82f7f05a-b041-4def-9ecc-afd25a3c2073. { vhost: '/', mode: 'confirm', queue: 0, size: 1, available: 0, borrowed: 1, min: 1, max: 5 } +238ms
Publikuju 2
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 0, size: 1, available: 1, borrowed: 0, min: 1, max: 5 } +1ms
  rascal:Vhost Borrowed confirm channel: 82f7f05a-b041-4def-9ecc-afd25a3c2073. { vhost: '/', mode: 'confirm', queue: 0, size: 1, available: 0, borrowed: 1, min: 1, max: 5 } +0ms
  rascal:Vhost Releasing confirm channel: 82f7f05a-b041-4def-9ecc-afd25a3c2073. { vhost: '/', mode: 'confirm', queue: 0, size: 1, available: 0, borrowed: 1, min: 1, max: 5 } +240ms

Comments:

I started to suspect that there could be actually something wrong with my rabbitPublisher wrapper (maybe I am not handling some events emitted by rascal causing promise from rabbitPublisher.publish to never resolve/reject. Anyway, I wanted to share with you my findings so far - maybe you will see what is wrong immediately.

Side note: I just found I don't set any publication timeouts when creating broker (via Broker.create()) because I have copy-pasted defaults from old version of rascal

matej-prokop commented 3 years ago

Tried to set publication timeout but it didn't help. Nevertheless, I added some debug logging into my rabbitPublisher wrapper:

...

broker.publish(
  publicationName,
  payload,
  overridesWithDefaults,
  (err, publication) => {
    if (err) {
      reject(err);
    } else {
      console.log("Got publication object");
      publication
        .on("success", () => {
          console.log("Message was success");
          resolve();
        })
        .on("error", err => {
          console.log("Message was error");
          reject(err);
        })
        .on("return", () => {
          console.log("Message was returned");
          resolve();
        });
    }
  }

...

and found that very last thing that my script logs is Got publication object but publication doesn't fire any event. Meaning non of success, error or return are called. Causing "parent" promise to never resolve

matej-prokop commented 3 years ago

I did one more test - updating my test script to have its own timeout:

  for (let i = 0; i < 10000; i++) {
    ...
    await Promise.race([
      rabbitPublisher
        .publish({ payload, overrides })
        .catch(
          errorHandler.handleError(`Failed to publish notification to RMQ`)
        ),
      new Promise(res =>
        setTimeout(() => {
          res();
        }, 30000)
      )
    ]);
  }

causing that even when publication of a message stucks (no event is emitted), it will try to publish another message after 30 seconds. Interestingly, it seems that "publication" gets into some weird state (after Amazon MQ cluster restart) because any future attempt to publish a message stucks (symptoms are same Got publication object is logged but no event is emitted). When I restart my script it immediately starts working correctly again...

cressie176 commented 3 years ago

Hi @matej-prokop,

Thanks for sharing the debug. It does reveal something interesting...

rascal:Vhost vhost: / was initialised with connection: 56b6dce8-4bff-4172-b673-c9a1020a22ec 
+2s
rascal:Vhost Handling connection error: Heartbeat timeout initially from connection: 56b6dce8-4bff-4172-b673-c9a1020a22ec, amqps://myuser:***@b-b38e98a9-c9b7-55df-ba64-e5402bf53cc9.mq.us-east-1.amazonaws.com:12346?heartbeat=10&connection_timeout=10000&channelMax=100
+29m

Rascal must have successfully reconnected after the original connection error in order to log this message.

The next message is 29 minutes later, however if you followed my instructions this could just be because we only enabled the Vhost logging. It's probably best if we enable all debug as I suspect there was more going on between these two messages which would shed further light. Can you set the DEBUG environment variable to just rascal:* so we see everything please?

cressie176 commented 3 years ago

Depending on your version of Rascal, one issue could be that the channel pool used by the publisher attempts to create a channel and fails. There is a bug in the underlying channel pool used by rascal, which caused it to spin into a tight loop. This was fixed in 10.2.3, so I recommend upgrading to the latest version if this is the case.

matej-prokop commented 3 years ago

All my tests were done with rascal 13.0.2 and amqplib 0.7.1 so it must be something else. I will retry with DEBUG=rascal:* and share output here.

matej-prokop commented 3 years ago

Output with DEBUG=rascal:* (while having custom timeout in place https://github.com/guidesmiths/rascal/issues/156#issuecomment-868562983 - trying to publishing another message after 30s even if previous publishing hasn't finished yet):

...

info: Publishing message #457
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 0, size: 1, available: 1, borrowed: 0, min: 1, max: 5 } +1ms
Got publication
  rascal:Vhost Borrowed confirm channel: 3c573fdd-7b3f-4522-8e9f-e77c0be0cad9. { vhost: '/', mode: 'confirm', queue: 0, size: 1, available: 0, borrowed: 1, min: 1, max: 5 } +0ms
  rascal:Publication Publishing 140 bytes to confirm exchange: my-pipeline-exchange with routingKeys: user.1. +251ms
  rascal:Vhost Releasing confirm channel: 3c573fdd-7b3f-4522-8e9f-e77c0be0cad9. { vhost: '/', mode: 'confirm', queue: 0, size: 1, available: 0, borrowed: 1, min: 1, max: 5 } +353ms
info: Publishing message #458
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 0, size: 1, available: 1, borrowed: 0, min: 1, max: 5 } +1ms
Got publication
  rascal:Vhost Borrowed confirm channel: 3c573fdd-7b3f-4522-8e9f-e77c0be0cad9. { vhost: '/', mode: 'confirm', queue: 0, size: 1, available: 0, borrowed: 1, min: 1, max: 5 } +0ms
  rascal:Publication Publishing 140 bytes to confirm exchange: my-pipeline-exchange with routingKeys: user.1. +354ms
  rascal:Vhost Destroying confirm channel: 3c573fdd-7b3f-4522-8e9f-e77c0be0cad9. { vhost: '/', mode: 'confirm', queue: 0, size: 1, available: 0, borrowed: 1, min: 1, max: 5 } +306ms
  rascal:Vhost Destroying channel: 3c573fdd-7b3f-4522-8e9f-e77c0be0cad9 +0ms
  rascal:Vhost Creating pooled confirm channel for vhost: / +1ms
  rascal:Vhost Handling connection error: Connection closed: 320 (CONNECTION-FORCED) with message "CONNECTION_FORCED - broker forced connection closure with reason 'shutdown'" initially from connection: 53870274-af97-4698-b27d-84aae18fd6e7, amqps://myuser:***@b-b34f98b9-c9b8-45df-bb63-e5402bf50cc9.mq.us-east-1.amazonaws.com:12346?heartbeat=10&connection_timeout=10000&channelMax=100 +0ms
error: RabbitBroker (rascal) - error message=Connection closed: 320 (CONNECTION-FORCED) with message "CONNECTION_FORCED - broker forced connection closure with reason 'shutdown'", stack=Error: Connection closed: 320 (CONNECTION-FORCED) with message "CONNECTION_FORCED - broker forced connection closure with reason 'shutdown'"
    at Object.accept (/myproject/node_modules/amqplib/lib/connection.js:91:15)
    at Connection.mainAccept (/myproject/node_modules/amqplib/lib/connection.js:64:33)
    at TLSSocket.go (/myproject/node_modules/amqplib/lib/connection.js:478:48)
    at TLSSocket.emit (events.js:315:20)
    at emitReadable_ (internal/streams/readable.js:569:12)
    at processTicksAndRejections (internal/process/task_queues.js:79:21), code=320
  rascal:Vhost Initialising vhost: / +1ms
  rascal:tasks:createConnection Connecting to broker using url: amqps://myuser:***@b-b34f98b9-c9b8-45df-bb63-e5402bf50cc9.mq.us-east-1.amazonaws.com:12346?heartbeat=10&connection_timeout=10000&channelMax=100 +2m
error: Failed to publish notification to RMQ message=channel closed, stack=Error: channel closed
    at ConfirmChannel.<anonymous> (/myproject/node_modules/amqplib/lib/channel.js:39:18)
    at ConfirmChannel.emit (events.js:327:22)
    at ConfirmChannel.C.toClosed (/myproject/node_modules/amqplib/lib/channel.js:175:8)
    at Connection.C._closeChannels (/myproject/node_modules/amqplib/lib/connection.js:394:18)
    at Connection.C.toClosed (/myproject/node_modules/amqplib/lib/connection.js:401:8)
    at Object.accept (/myproject/node_modules/amqplib/lib/connection.js:96:18)
    at Connection.mainAccept (/myproject/node_modules/amqplib/lib/connection.js:64:33)
    at TLSSocket.go (/myproject/node_modules/amqplib/lib/connection.js:478:48)
    at TLSSocket.emit (events.js:315:20)
    at emitReadable_ (internal/streams/readable.js:569:12)
    at processTicksAndRejections (internal/process/task_queues.js:79:21)
info: Publishing message #459
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 0, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +2ms
Got publication
  rascal:Vhost Will attempt reconnection in in 852ms +270ms
  rascal:Vhost Handling connection error: Failed to connect to: amqps://myuser:***@b-b34f98b9-c9b8-45df-bb63-e5402bf50cc9.mq.us-east-1.amazonaws.com:12346?heartbeat=10&connection_timeout=10000&channelMax=100. Original message was: read ECONNRESET initially from connection: 53870274-af97-4698-b27d-84aae18fd6e7, amqps://myuser:***@b-b34f98b9-c9b8-45df-bb63-e5402bf50cc9.mq.us-east-1.amazonaws.com:12346?heartbeat=10&connection_timeout=10000&channelMax=100 +853ms
error: RabbitBroker (rascal) - error message=Failed to connect to: amqps://myuser:***@b-b34f98b9-c9b8-45df-bb63-e5402bf50cc9.mq.us-east-1.amazonaws.com:12346?heartbeat=10&connection_timeout=10000&channelMax=100. Original message was: read ECONNRESET, stack=Error: read ECONNRESET
    at TLSWrap.onStreamRead (internal/stream_base_commons.js:209:20), errno=-54, code=ECONNRESET, syscall=read
  rascal:Vhost Initialising vhost: / +0ms
  rascal:tasks:createConnection Connecting to broker using url: amqps://myuser:***@b-b34f98b9-c9b8-45df-bb63-e5402bf50cc9.mq.us-east-1.amazonaws.com:12346?heartbeat=10&connection_timeout=10000&channelMax=100 +1s
  rascal:tasks:createConnection Obtained connection: 71e04b7d-cd0a-41f4-844c-ce476de89a99 +2s
  rascal:tasks:createChannel Creating channel +2m
  rascal:tasks:assertExchanges Asserting exchange: my-pipeline-exchange +2m
  rascal:tasks:closeChannel Closing channel +2m
  rascal:Vhost vhost: / was initialised with connection: 71e04b7d-cd0a-41f4-844c-ce476de89a99 +2s
info: Publishing message #460
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 0, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +27s
Got publication
  rascal:Vhost Handling connection error: Connection closed: 320 (CONNECTION-FORCED) with message "CONNECTION_FORCED - broker forced connection closure with reason 'shutdown'" initially from connection: 71e04b7d-cd0a-41f4-844c-ce476de89a99, amqps://myuser:***@b-b34f98b9-c9b8-45df-bb63-e5402bf50cc9.mq.us-east-1.amazonaws.com:12346?heartbeat=10&connection_timeout=10000&channelMax=100 +22s
error: RabbitBroker (rascal) - error message=Connection closed: 320 (CONNECTION-FORCED) with message "CONNECTION_FORCED - broker forced connection closure with reason 'shutdown'", stack=Error: Connection closed: 320 (CONNECTION-FORCED) with message "CONNECTION_FORCED - broker forced connection closure with reason 'shutdown'"
    at Object.accept (/myproject/node_modules/amqplib/lib/connection.js:91:15)
    at Connection.mainAccept (/myproject/node_modules/amqplib/lib/connection.js:64:33)
    at TLSSocket.go (/myproject/node_modules/amqplib/lib/connection.js:478:48)
    at TLSSocket.emit (events.js:315:20)
    at emitReadable_ (internal/streams/readable.js:569:12)
    at processTicksAndRejections (internal/process/task_queues.js:79:21), code=320
  rascal:Vhost Initialising vhost: / +0ms
  rascal:tasks:createConnection Connecting to broker using url: amqps://myuser:***@b-b34f98b9-c9b8-45df-bb63-e5402bf50cc9.mq.us-east-1.amazonaws.com:12346?heartbeat=10&connection_timeout=10000&channelMax=100 +49s
  rascal:tasks:createConnection Obtained connection: 8678e2ee-aeb2-4165-9190-dfdda647960b +1s
  rascal:tasks:createChannel Creating channel +50s
  rascal:tasks:assertExchanges Asserting exchange: my-pipeline-exchange +50s
  rascal:tasks:closeChannel Closing channel +50s
  rascal:Vhost vhost: / was initialised with connection: 8678e2ee-aeb2-4165-9190-dfdda647960b +2s
info: Publishing message #461
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 1, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +6s
Got publication
info: Publishing message #462
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 2, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +30s
Got publication
info: Publishing message #463
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 3, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +30s
Got publication
^C
cressie176 commented 3 years ago

Hi @matej-prokop,

Thanks for the extra debug. It shows that the connection was successfully re-established, but that attempts to borrow channels from the pool never yield a result. You can see the queue growing (1, 2, 3) but never released - available is 0 and borrowed is zero.

It's possible that there's a bug somewhere and channel allocation, which is paused while the connection is unavailable, is somehow never resumed when it recovers. It could also be that the channel pool has somehow choked. I'll see if I can reproduce.

cressie176 commented 3 years ago

I'm able to reproduce. Looks like the queue is still paused. Will fix ASAP. Thanks for your help uncovering this.

cressie176 commented 3 years ago

Seems intermittent. I haven't been able to reproduce again since adding more debug.

matej-prokop commented 3 years ago

Interesting. On my end, it seems to be happening constantly. If you won't manage to reproduce again with additional debug logging. Let me know. I can add debug logging on my end or you can publish a beta version of rascal with additional debug logging. Then I can use it to provide you more context.

cressie176 commented 3 years ago

Thanks @matej-prokop.

It would be helpful to add paused to the channel pool stats function

    function stats() {
      return {
        vhost: self.name,
        mode,
        paused: poolQueue.paused,
        queue: poolQueue.length(),
        size: pool.size,
        available: pool.available,
        borrowed: pool.borrowed,
        min: pool.min,
        max: pool.max,
      };
    }

and also some extra logging here so we can see the channel allocation should be resumed. e.g.

  function resumeChannelAllocation() {
    debug('Resuming channel allocation on connection %s', connection._rascal_id);
    channelCreator.resume();
    regularChannelPool && regularChannelPool.resume();
    confirmChannelPool && confirmChannelPool.resume();
    debug('Confirm Channel Stats: %o', confirmChannelPool.stats());
    paused = false;
    self.emit('resumed', { vhost: self.name });
  }
matej-prokop commented 3 years ago

@cressie176 thank you for providing clear instructions!

Here is its output:

...

info: Publishing message #1061
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', paused: false, queue: 0, size: 1, available: 1, borrowed: 0, min: 1, max: 5 } +1ms
Got publication
  rascal:Vhost Borrowed confirm channel: c7222f81-12a5-4c7b-ac21-c241b4124f4e. { vhost: '/', mode: 'confirm', paused: false, queue: 0, size: 1, available: 0, borrowed: 1, min: 1, max: 5 } +1ms
  rascal:Publication Publishing 141 bytes to confirm exchange: my-pipeline-exchange with routingKeys: user.1. +306ms
  rascal:Vhost Destroying confirm channel: c7222f81-12a5-4c7b-ac21-c241b4124f4e. { vhost: '/', mode: 'confirm', paused: false, queue: 0, size: 1, available: 0, borrowed: 1, min: 1, max: 5 } +69ms
  rascal:Vhost Destroying channel: c7222f81-12a5-4c7b-ac21-c241b4124f4e +0ms
  rascal:Vhost Creating pooled confirm channel for vhost: / +1ms
  rascal:Vhost Handling connection error: Connection closed: 320 (CONNECTION-FORCED) with message "CONNECTION_FORCED - broker forced connection closure with reason 'shutdown'" initially from connection: 15e9a994-9457-448e-8120-d8a9558cc5a3, amqps://myuser:***@b-b34f98b9-c9b8-45df-bb63-e5402bf50cc9.mq.us-east-1.amazonaws.com:12346?heartbeat=10&connection_timeout=10000&channelMax=100 +1ms
error: RabbitBroker (rascal) - error message=Connection closed: 320 (CONNECTION-FORCED) with message "CONNECTION_FORCED - broker forced connection closure with reason 'shutdown'", stack=Error: Connection closed: 320 (CONNECTION-FORCED) with message "CONNECTION_FORCED - broker forced connection closure with reason 'shutdown'"
    at Object.accept (/myproject/node_modules/amqplib/lib/connection.js:91:15)
    at Connection.mainAccept (/myproject/node_modules/amqplib/lib/connection.js:64:33)
    at TLSSocket.go (/myproject/node_modules/amqplib/lib/connection.js:478:48)
    at TLSSocket.emit (events.js:315:20)
    at emitReadable_ (internal/streams/readable.js:569:12)
    at processTicksAndRejections (internal/process/task_queues.js:79:21), code=320
  rascal:Vhost Initialising vhost: / +1ms
  rascal:tasks:createConnection Connecting to broker using url: amqps://myuser:***@b-b34f98b9-c9b8-45df-bb63-e5402bf50cc9.mq.us-east-1.amazonaws.com:12346?heartbeat=10&connection_timeout=10000&channelMax=100 +5m
error: Failed to publish notification to RMQ message=channel closed, stack=Error: channel closed
    at ConfirmChannel.<anonymous> (/myproject/node_modules/amqplib/lib/channel.js:39:18)
    at ConfirmChannel.emit (events.js:327:22)
    at ConfirmChannel.C.toClosed (/myproject/node_modules/amqplib/lib/channel.js:175:8)
    at Connection.C._closeChannels (/myproject/node_modules/amqplib/lib/connection.js:394:18)
    at Connection.C.toClosed (/myproject/node_modules/amqplib/lib/connection.js:401:8)
    at Object.accept (/myproject/node_modules/amqplib/lib/connection.js:96:18)
    at Connection.mainAccept (/myproject/node_modules/amqplib/lib/connection.js:64:33)
    at TLSSocket.go (/myproject/node_modules/amqplib/lib/connection.js:478:48)
    at TLSSocket.emit (events.js:315:20)
    at emitReadable_ (internal/streams/readable.js:569:12)
    at processTicksAndRejections (internal/process/task_queues.js:79:21)
info: Publishing message #1062
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', paused: true, queue: 0, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +3ms
Got publication
  rascal:Vhost Will attempt reconnection in in 791ms +530ms
  rascal:Vhost Handling connection error: Failed to connect to: amqps://myuser:***@b-b34f98b9-c9b8-45df-bb63-e5402bf50cc9.mq.us-east-1.amazonaws.com:12346?heartbeat=10&connection_timeout=10000&channelMax=100. Original message was: read ECONNRESET initially from connection: 15e9a994-9457-448e-8120-d8a9558cc5a3, amqps://myuser:***@b-b34f98b9-c9b8-45df-bb63-e5402bf50cc9.mq.us-east-1.amazonaws.com:12346?heartbeat=10&connection_timeout=10000&channelMax=100 +794ms
error: RabbitBroker (rascal) - error message=Failed to connect to: amqps://myuser:***@b-b34f98b9-c9b8-45df-bb63-e5402bf50cc9.mq.us-east-1.amazonaws.com:12346?heartbeat=10&connection_timeout=10000&channelMax=100. Original message was: read ECONNRESET, stack=Error: read ECONNRESET
    at TLSWrap.onStreamRead (internal/stream_base_commons.js:209:20), errno=-54, code=ECONNRESET, syscall=read
  rascal:Vhost Initialising vhost: / +1ms
  rascal:tasks:createConnection Connecting to broker using url: amqps://myuser:***@b-b34f98b9-c9b8-45df-bb63-e5402bf50cc9.mq.us-east-1.amazonaws.com:12346?heartbeat=10&connection_timeout=10000&channelMax=100 +1s
  rascal:tasks:createConnection Obtained connection: ae199624-7b8f-4c4b-8486-cf9ad886b69e +1s
  rascal:tasks:createChannel Creating channel +5m
  rascal:tasks:assertExchanges Asserting exchange: my-pipeline-exchange +5m
  rascal:tasks:closeChannel Closing channel +5m
  rascal:Vhost Resuming channel allocation on connection ae199624-7b8f-4c4b-8486-cf9ad886b69e +2s
  rascal:Vhost Confirm Channel Stats: { vhost: '/', mode: 'confirm', paused: false, queue: 1, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +1ms
  rascal:Vhost vhost: / was initialised with connection: ae199624-7b8f-4c4b-8486-cf9ad886b69e +0ms
info: Publishing message #1063
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', paused: false, queue: 0, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +27s
Got publication
info: Publishing message #1064
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', paused: false, queue: 1, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +30s
Got publication
info: Publishing message #1065
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', paused: false, queue: 2, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +30s
Got publication
  rascal:Vhost Handling connection error: Unexpected close initially from connection: ae199624-7b8f-4c4b-8486-cf9ad886b69e, amqps://myuser:***@b-b34f98b9-c9b8-45df-bb63-e5402bf50cc9.mq.us-east-1.amazonaws.com:12346?heartbeat=10&connection_timeout=10000&channelMax=100 +13s
error: RabbitBroker (rascal) - error message=Unexpected close, stack=Error: Unexpected close
    at succeed (/myproject/node_modules/amqplib/lib/connection.js:272:13)
    at onOpenOk (/myproject/node_modules/amqplib/lib/connection.js:254:5)
    at /myproject/node_modules/amqplib/lib/connection.js:166:32
    at /myproject/node_modules/amqplib/lib/connection.js:160:12
    at TLSSocket.recv (/myproject/node_modules/amqplib/lib/connection.js:499:12)
    at Object.onceWrapper (events.js:421:28)
    at TLSSocket.emit (events.js:315:20)
    at emitReadable_ (internal/streams/readable.js:569:12)
    at processTicksAndRejections (internal/process/task_queues.js:79:21)
  rascal:Vhost Initialising vhost: / +1ms
  rascal:tasks:createConnection Connecting to broker using url: amqps://myuser:***@b-b34f98b9-c9b8-45df-bb63-e5402bf50cc9.mq.us-east-1.amazonaws.com:12346?heartbeat=10&connection_timeout=10000&channelMax=100 +2m
  rascal:tasks:createConnection Obtained connection: c5fdf669-c62b-4e9f-8873-780ff96247c5 +2s
  rascal:tasks:createChannel Creating channel +2m
  rascal:tasks:assertExchanges Asserting exchange: my-pipeline-exchange +2m
  rascal:tasks:closeChannel Closing channel +2m
  rascal:Vhost Resuming channel allocation on connection c5fdf669-c62b-4e9f-8873-780ff96247c5 +2s
  rascal:Vhost Confirm Channel Stats: { vhost: '/', mode: 'confirm', paused: false, queue: 3, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +0ms
  rascal:Vhost vhost: / was initialised with connection: c5fdf669-c62b-4e9f-8873-780ff96247c5 +0ms
info: Publishing message #1066
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', paused: false, queue: 3, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +15s
Got publication
info: Publishing message #1067
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', paused: false, queue: 4, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +30s
Got publication
info: Publishing message #1068
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', paused: false, queue: 5, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +30s
Got publication
info: Publishing message #1069
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', paused: false, queue: 6, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +30s
Got publication
info: Publishing message #1070
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', paused: false, queue: 7, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +30s
Got publication
info: Publishing message #1071
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', paused: false, queue: 8, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +30s
Got publication
^C
matej-prokop commented 3 years ago

But I can confirm that the issue doesn't happen in 100% cases. On my first attempt after adding additional debug logging, it reconnected and continued with publishing messages as expected. (If you will want I have output for that run saved and can share it here if that would help).

cressie176 commented 3 years ago

So that's very interesting. It says the queue isn't paused. Are you able to add some further debug where the queue task executes and [attempts to acquire a channel[(https://github.com/guidesmiths/rascal/blob/master/lib/amqp/Vhost.js#L306-L312) from the pool please?

    poolQueue = async.queue((__, next) => {
      debug('Attempting to acquire channel');
      pool.acquire().then((channel) => {
        debug('Channel acquired');
        setImmediate(() => {
          next(null, channel);
        });
      }).catch(next);
    }, 1);
cressie176 commented 3 years ago

You might also try adding a custom pool.use function and logging what happens there. What I find curious is that there are no available resources in the pool, even though the minimum size is 1. It suggests that even though we are requesting resources from the pool, that the pool isn't able to create them for some reason.

matej-prokop commented 3 years ago

Output when having Attempting to acquire channel and Channel acquired debug in place:

...
rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', paused: true, queue: 0, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +3ms
Got publication
  rascal:tasks:createConnection Obtained connection: bdc30721-4bbf-4a84-9236-2961f60e0560 +1s
  rascal:tasks:createChannel Creating channel +3m
  rascal:tasks:assertExchanges Asserting exchange: events-pipeline-exchange +3m
  rascal:tasks:closeChannel Closing channel +3m
  rascal:Vhost Resuming channel allocation on connection bdc30721-4bbf-4a84-9236-2961f60e0560 +2s
  rascal:Vhost Confirm Channel Stats: { vhost: '/', mode: 'confirm', paused: false, queue: 1, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +0ms
  rascal:Vhost vhost: / was initialised with connection: bdc30721-4bbf-4a84-9236-2961f60e0560 +0ms
  rascal:Vhost Attempting to acquire channel +1ms
info: Publishing message #776
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', paused: false, queue: 0, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +28s
Got publication
info: Publishing message #777
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', paused: false, queue: 1, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +30s
Got publication
info: Publishing message #778
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', paused: false, queue: 2, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +30s
Got publicationrascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', paused: true, queue: 0, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +3ms
Got publication
  rascal:tasks:createConnection Obtained connection: bdc30721-4bbf-4a84-9236-2961f60e0560 +1s
  rascal:tasks:createChannel Creating channel +3m
  rascal:tasks:assertExchanges Asserting exchange: events-pipeline-exchange +3m
  rascal:tasks:closeChannel Closing channel +3m
  rascal:Vhost Resuming channel allocation on connection bdc30721-4bbf-4a84-9236-2961f60e0560 +2s
  rascal:Vhost Confirm Channel Stats: { vhost: '/', mode: 'confirm', paused: false, queue: 1, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +0ms
  rascal:Vhost vhost: / was initialised with connection: bdc30721-4bbf-4a84-9236-2961f60e0560 +0ms
  rascal:Vhost Attempting to acquire channel +1ms
info: Publishing message #776
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', paused: false, queue: 0, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +28s
Got publication
info: Publishing message #777
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', paused: false, queue: 1, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +30s
Got publication
info: Publishing message #778
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', paused: false, queue: 2, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +30s
Got publication
matej-prokop commented 3 years ago

I am not sure how to use pool.use to get any useful information out of it. However, what I find interesting is that both available and borrowed stats for pool are 0. See { vhost: '/', mode: 'confirm', paused: false, queue: 2, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } within logs - because that means that pool isn't ready (according to this line). Therefore, I am assuming that call to acquire a resource is waiting for some available resource (but it is never created)

I just did a quick tour through code of rascal and node-pool and my guess is that pool isn't ready because channel isn't created. I added a few more debug lines and I see that method for creating channel is being called (here) but its callback is never executed (meaning this callback is not executed.

cressie176 commented 3 years ago

Ignore my suggestion about pool.use - looks like it's for something else. Sorry.

I also suspect it's because the channel isn't created. In my (working) output I see the output

rascal:Vhost vhost: / was initialised with connection: 9f014e1d-ea36-43b1-8bc2-8e0ea520a6de +0ms
rascal:Vhost Vhost: / was initialised. Resuming channel creation +0ms

The second of these is because when the connection failed initially, the pool destroyed all broken channels and attempted refill the pool when the connection recovers. The connection recovery is indicated by the vhost_initialised event. When this is emitted the channel will be created. I don't see this output in your logs, and suspect this relates to why the pool is never refilled.

cressie176 commented 3 years ago

Out of curiosity, what node version are you on and what steps are you following to reproduce?

matej-prokop commented 3 years ago

Nodejs: v14.16.0

Steps to reproduce: I have Amazon MQ cluster, then I start my script (I shared parts of it earlier here) that publishes messages (one by one) in loop and then I go to AWS console and use Amazon MQ > Actions > Restart broker. After waiting minute or two I run into the issue

matej-prokop commented 3 years ago

@cressie176 I can create publicly accessible Amazon MQ cluster and share cred with you. Using that you should be able to reproduce the issue on your end. Would it be OK if I share credentials with you via some direct message? (For instance, I can use LinkedIn)

cressie176 commented 3 years ago

Hi @matej-prokop,

That would be very helpful. Maybe use my public gpg key to encrypt the credentials and email them to me. Happy to share email over linkedin if it's not in my public contact info.

matej-prokop commented 3 years ago

👍 OK, you can find code that I use to reproduce the issue here: https://github.com/matej-prokop/rascal-156-repro

(Sending you password via LinkedIn)

matej-prokop commented 3 years ago

Hi @cressie176, updated README in https://github.com/matej-prokop/rascal-156-repro with instructions how to restart Amazon MQ cluster via AWS CLI. So you should be able to follow all steps to reproduce the issue now 🤞

(Sharing access secret for AWS CLI again via DM)

Let me know if that works for you. Thank you!

cressie176 commented 3 years ago

Hi @matej-prokop,

Thanks for your patience and help investigating this issue. I've been able to reproduce the problem using the AWS RabbitMQ broker and can see where the problem lies.

The issue is that the pooled channels are being destroyed before the connection error is handled. Consequently the pool is attempting to recreate them before the connection is set to undefined. As a result, the code attempts to create the channel using a duff connection and never continuing to register the vhost_initialised event listener which is create the channel once the connection has been restored.

I will take some more time to unpick, but thanks to your help at least we understand the cause.

cressie176 commented 3 years ago

Fix on the way. The issue was that when attempting to create the channel, amqplib emitted a close event. This meant the createChannel's callback was never called, and the pool was waiting forever.

I've changed the code to add both close and error event handlers to the connection just prior to requesting a channel. These ensure that the pool always gets a response, and therefore doesn't get stuck.

cressie176 commented 3 years ago

Published as rascal@13.0.4

matej-prokop commented 3 years ago

Big thank you for preparing the fix and publishing new version of rascal so quickly! I have already verified that the fix works for me so I am closing this issue.

BTW: Based on investigation of this issues, I am proposing one minor update: https://github.com/guidesmiths/rascal/pull/159 so please consider if it is something you want to have in rascal