onebeyond / rascal

A config driven wrapper for amqp.node supporting multi-host connections, automatic error recovery, redelivery flood protection, transparent encryption / decryption and channel pooling.
MIT License
451 stars 69 forks source link

Rascal stop sending messages - ResourceRequest timed out #199

Closed thaoula closed 2 years ago

thaoula commented 2 years ago

Hi Team,

Firstly, thanks so much for your library.

Just wondering if you can point me in the right direction with the following error -

TimeoutError: ResourceRequest timed out
    at ResourceRequest._fireTimeout (/home/app/node_modules/generic-pool/lib/ResourceRequest.js:62:17)
    at Timeout.bound (/home/app/node_modules/generic-pool/lib/ResourceRequest.js:8:15)
    at listOnTimeout (node:internal/timers:556:17)
    at processTimers (node:internal/timers:499:7)

We are using version 14.4.0 of Rascal to connect to RabbitMQ 3.10 on Kubernetes. RabbitMQ was deployed using a the RabbitMq Kubernetes operator.

We are currently experiencing a strange issue with one of our customers. Every 5-7 days their application container stops sending messages to all RabbitMQ queues until it is restarted. When this happens, the error above starts appearing in the logs. We have a number of different tenants connected to the same RabbitMQ cluster and they do not appear to be experiencing this issue.

At this stage we are not sure why Generic Pool is raising this error and how to reproduce it on demand.

All we know is that when this starts occurring. Rascal stops sending message to RabbitMQ until we restart the container. All pending message are lost.

Regards, Tarek

cressie176 commented 2 years ago

hi @thaoula,

It looks like this error is emitted by generic-pool when the underlying resource cannot be acquired. In Rascal's case the underlying resource is a channel, which it obtains from the connection to the RabbitMQ server.

There are a couple of things I can think of that might cause this for one client and not others.

  1. The client has configured rascal's pooling options with a timeout which is too low. As you can see from the following generic pool option, by default there is no timeout, however Rascal applies it's own default of 15s

acquireTimeoutMillis: max milliseconds an acquire call will wait for a resource before timing out. (default no limit), if supplied should non-zero positive integer.

  1. Maybe the client is using more channels that available to them. I'm not sure off the top of my head whether this would cause the create channel request to block or error (I guess the latter), and then not sure what generic pool would do with the error. Maybe it retries without starting the timeout.

One thing I did notice though was what looks like a bug in the Rascal code base. I'm pretty sure that the 5000 was supposed to be for a 5s timeout rather than a value passed to the next callback. I believe I needed to add this to workaround an issue around the pool not clearing on shutdown.

I'll try to take a look over the next few days and reproduce, but for now, check the acquireTimeoutMillis setting

cressie176 commented 2 years ago

I've removed the code that mistakenly tried to set the timeout if the vhost wasn't initialised when the pool attempted to create a channel. This just duplicated generic-pool's timeout so was pointless and confusing. However I do not believe this is likely to cause the problem you describe above. My best guess is either it's taking an unreasonable time to obtain a connect / channel, or the acquireTimeoutMillis has been set too low.

If it is taking an unreasonable time to obtain a connection or channel, reasons could be...

thaoula commented 2 years ago

Hi @cressie176,

Thanks for your responses.

As far as I can see we are not setting acquireTimeoutMillis in the code or any other pool related configuration. Other than bindings, publications, exchanges we are using withDefaultConfig

 const vhost = {
            assert: true,
            connectionStrategy: 'random',
            connections: getConnections(configuration),
            exchanges: exchanges,
            queues: queues,
            bindings: bindings,
        } as VhostConfig;

        config.vhosts[environment] = vhost;

        // Create configu with sensible defaults
        this.config = withDefaultConfig(config);

With regards to your points -

Network Infrastructure - Everything is in running within a single Kubernetes cluster and it appears other connected tenants are fine. Other networking functionality like client requests, mobile apps etc do not report any errors.

RabbitMQ - From what we can see the message counts are relatively low .. like less than 40 per hour.

The event loop being saturated - This might be something, the strange thing though, is that other aspects of the application continue working without slowness or error.. ie .. users can continue to create, edit and delete records .. The customer only notices after their customers complain that they are not receiving emails and notifications.

As mentioned earlier, the issue seems to be Tenant specific as we have not been required to restart the RabbitMQ or Kubernetes hosts. We only restart the tenant container? Do you know if there are specific events I should handle to try and diagnose this issue?

Regards, Tarek

cressie176 commented 2 years ago

I'm learning a bit more about generic-pool...

When generic-pool initialises it ignores the acquireTimeoutMillis setting and so will never emit the timeout when the pool populates for the first time. If the call to create the channel yields an error, generic-pool emits a factoryCreateError event which will be logged via debug by rascal. generic-pool never retries to create the channel. Consequently you could end up starting with an empty pool.

If the pool does start empty, when your application attempts to publish a message, generic-pool will attempt to create a channel then, and this is the code path which can emit timeout events. These are translated into promise rejections, and passed to rascal. Rascal emits them as error events via the publication, and I suspect that is how you are eventually seeing the TimeoutError: ResourceRequest timed out.

When the timeout occurs, this doesn't actually cancel the create channel request, so it may eventually complete, using up one of your RabbitMQ channel allocations. By default Rascal sets this to 100. Attempts to request more than this yield an immediate and obvious error, so I don't think you are hitting this.

Still none the wiser as to what exactly is happening in your environment though. You could try running with DEBUG enabled and see what light that sheds...

DEBUG='rascal:Vhost'

cressie176 commented 2 years ago

Thinking about the above and testing various scenarios, things seem to be working as I would expect.

generic-pool makes best efforts to fill the pool on startup, but doesn't block until the pool is full, and tolerates channel creation errors. When publishing a message, if there isn't an available channel, and if the pool has capacity, a new one is created. If the creation times out, the publish operation fails.

The only gotcha I can see with this is that the channel may be created after the timeout has been reported. Because the channel pool promise has already rejected, we could be left with an open, but unusable channel.

This still doesn't look like your issue though!

cressie176 commented 2 years ago

I've tested for the channel leak described above, and it doesn't happen. It looks like the create channel requests are queued, and even in the event of a timeout, when the create channel promise eventually resolves it is added to the pool.

From everything I can see, generic-pool and rascal are behaving as I would expect. In addition to debug, are you able to use something like Wireshark to debug the communication between the application and broker?

cressie176 commented 2 years ago

It might have been helpful if I started with this...

When Rascal publishes a message, it requests either a regular channel or a confirm channel from the relevant pool via an in memory queue so that when the vhost is initialising, recovering or terminating, channel allocation can be temporarily paused.

When rascal detects a connection error, it pauses the queue and reinitialises the vhost. Some connection errors are rude and obvious, but others are only detected because amqplib does not receive a heartbeat from the RabbitMQ broker within a defined time period (Rascal's default heartbeat of 10s means the timeout is ~30s). Between the connection dying and it being detected, there is a window where the queue will be running, but create channel requests will not yield a response.

If the pool decided to create a new channel during this window (e.g. because it had fewer than the minimum number of channels, or because there were non available and it has fewer than the maximum number of channels), then it will call createChannelWhenInitialised. If Rascal believes the connection is live, it will create a new channel, however if the connection was actually dead, the call will not yield until amqplib reports the dropped connection via a heartbeat timeout error

If the heartbeat timeout error takes longer than the pool's acquireTimeoutMillis, the pool will timeout frist by emitting a TimeoutError: ResourceRequest timed out. This manifests as a factoryCreateError and will be logged by Rascal as debug. Such errors do not break the pool, and after the connection recovers it will be able to create new channels again.

It is also possible that the pool may decide to create a channel just as the connection error is detected. In this case, Rascal defers the call to create the channel until after the vhost has finished re-initialising. If the vhost takes longer than the pool's acquireTimeoutMillis, it will again emit the timeout error (TimeoutError: ResourceRequest timed out), but recovers as soon as the connection has been re-established.

If tested all the above using the simple example and by pausing and unpausing the docker container.

One thought occurs - has your client disabled the heartbeats, or made them extremely long? This is configured using the connection parameters, and might explain the behaviour you are seeing.

cressie176 commented 2 years ago

I can reproduce using the simple example if I make the heartbeats 999999 instead of 10 and pause the docker container running RabbitMQ.

With a high heartbeat, amqplib does not detect the dropped connection and Rascal never pauses the pool queue. The pool continues to make new channel requests to the dead connection, which of course never responds. Consequently the pool times out.

FYI it is only possible to disable heartbeats directly if both the client and server value is set to 0. If only the client or server heartbeat is set to 0, the non-zero value will be used. If both client and server have a non-zero value, the lower value is used.

cressie176 commented 2 years ago

Any update @thaoula?

thaoula commented 2 years ago

Hi @cressie176,

Thanks for checking back.

It seems we are still having this issue with the same customer. It happened yesterday and we still have been unable to find the cause.

Prometheus monitoring of the container does not show any clear indication of networking or load issues.

I dot believe we have done anything with the heartbeat. Our connection config is as follows -


            return {
                hostname: hostname,
                port: port,
                user: username,
                password: password,
                options: {
                    heartbeat: 10,
                },
                retry: {
                    min: 1000,
                    max: 300000,
                    factor: 2,
                    strategy: "exponential"
                },
                socketOptions: {
                    clientProperties: {
                        connection_name: connectionName.toLowerCase()
                    }
                }
            } as ConnectionConfig;

What do you mean by the following -

Still none the wiser as to what exactly is happening in your environment though. You could try running with DEBUG enabled and see what light that sheds...

DEBUG='rascal:Vhost'

Is that debug an environment variable? Should i set it literally? or do i need to replace the Vhost with the name of the Vhost?

Regards, Tarek

cressie176 commented 2 years ago

Hi @thaoula,

Yes, DEBUG is an environment variable. It's used by the debug library to selectively enable low level logging. You can set it to rascal:* to turn on all rascal debug, or rascal:Vhost to enable just the debug for the Vhost module which manages the publisher connections. You can also use a comma separated list to selectively enable multiple modules using the selectors specified when debug is initialised, e.g. rascal:Vhost,rascal:tasks:createConnection

Obviously since the issue takes a few days to manifest, it will increase the amount of output the application produces, and it isn't formatted in a machine readable way like JSON. Here's an example trace with annotations

DEBUG='rascal:Vhost,rascal:tasks:createConnection' node index.js
  rascal:Vhost Initialising vhost: / +0ms
  rascal:tasks:createConnection Connecting to broker using url: amqp://guest:***@localhost:5672?heartbeat=10&connection_timeout=10000&channelMax=100 +0ms
  rascal:tasks:createConnection Obtained connection: 3fbb8010-3f59-4730-acba-8d2fcffdabdf +19ms
  rascal:Vhost Creating regular channel pool { autostart: false, max: 5, min: 1, evictionRunIntervalMillis: 10000, idleTimeoutMillis: 60000, rejectionDelayMillis: 1000, testOnBorrow: true, acquireTimeoutMillis: 15000, destroyTimeoutMillis: 1000 } +35ms
  rascal:Vhost Creating confirm channel pool { autostart: true, max: 5, min: 1, evictionRunIntervalMillis: 10000, idleTimeoutMillis: 60000, rejectionDelayMillis: 1000, testOnBorrow: true, acquireTimeoutMillis: 15000, destroyTimeoutMillis: 1000 } +2ms
  rascal:Vhost Creating pooled confirm channel for vhost: / +0ms
  rascal:Vhost vhost: / was initialised with connection: 3fbb8010-3f59-4730-acba-8d2fcffdabdf +1ms
  rascal:Vhost Requested channel. Outstanding channel requests: 1 +2ms
  rascal:Vhost Created confirm channel: 91f5b065-c986-49a9-b121-a43b93dc4cb4 from connection: 3fbb8010-3f59-4730-acba-8d2fcffdabdf +4ms
  rascal:Vhost Created confirm channel: 7a39ec29-4899-4439-9956-99c01b1eedac from connection: 3fbb8010-3f59-4730-acba-8d2fcffdabdf +0ms
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 0, size: 1, available: 1, borrowed: 0, min: 1, max: 5 } +999ms
  rascal:Vhost Borrowed confirm channel: 91f5b065-c986-49a9-b121-a43b93dc4cb4. { vhost: '/', mode: 'confirm', queue: 0, size: 1, available: 0, borrowed: 1, min: 1, max: 5 } +2ms
  rascal:Vhost Releasing confirm channel: 91f5b065-c986-49a9-b121-a43b93dc4cb4. { vhost: '/', mode: 'confirm', queue: 0, size: 1, available: 0, borrowed: 1, min: 1, max: 5 } +3ms
2022-05-28T04:51:51.993Z: hello world
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 0, size: 1, available: 1, borrowed: 0, min: 1, max: 5 } +996ms
  rascal:Vhost Borrowed confirm channel: 91f5b065-c986-49a9-b121-a43b93dc4cb4. { vhost: '/', mode: 'confirm', queue: 0, size: 1, available: 0, borrowed: 1, min: 1, max: 5 } +1ms
  rascal:Vhost Releasing confirm channel: 91f5b065-c986-49a9-b121-a43b93dc4cb4. { vhost: '/', mode: 'confirm', queue: 0, size: 1, available: 0, borrowed: 1, min: 1, max: 5 } +3ms
2022-05-28T04:51:52.995Z: hello world
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 0, size: 1, available: 1, borrowed: 0, min: 1, max: 5 } +1s
  rascal:Vhost Borrowed confirm channel: 91f5b065-c986-49a9-b121-a43b93dc4cb4. { vhost: '/', mode: 'confirm', queue: 0, size: 1, available: 0, borrowed: 1, min: 1, max: 5 } +1ms
2022-05-28T04:51:54.000Z: hello world
  rascal:Vhost Releasing confirm channel: 91f5b065-c986-49a9-b121-a43b93dc4cb4. { vhost: '/', mode: 'confirm', queue: 0, size: 1, available: 0, borrowed: 1, min: 1, max: 5 } +4ms
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 0, size: 1, available: 1, borrowed: 0, min: 1, max: 5 } +1s
  rascal:Vhost Borrowed confirm channel: 91f5b065-c986-49a9-b121-a43b93dc4cb4. { vhost: '/', mode: 'confirm', queue: 0, size: 1, available: 0, borrowed: 1, min: 1, max: 5 } +1ms
2022-05-28T04:51:55.005Z: hello world
  rascal:Vhost Releasing confirm channel: 91f5b065-c986-49a9-b121-a43b93dc4cb4. { vhost: '/', mode: 'confirm', queue: 0, size: 1, available: 0, borrowed: 1, min: 1, max: 5 } +2ms

Here's roughly when I paused the docker container

  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 0, size: 1, available: 1, borrowed: 0, min: 1, max: 5 } +1s
  rascal:Vhost Borrowed confirm channel: 91f5b065-c986-49a9-b121-a43b93dc4cb4. { vhost: '/', mode: 'confirm', queue: 0, size: 1, available: 0, borrowed: 1, min: 1, max: 5 } +0ms    
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 0, size: 1, available: 0, borrowed: 1, min: 1, max: 5 } +1s

The pool needs to grow because channel 91f5b065-c986-49a9-b121-a43b93dc4cb4 cannot be released until the previous publish is confirmed

  rascal:Vhost Creating pooled confirm channel for vhost: / +0ms

Subsequent requests to the pool are queued. See here

  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 0, size: 2, available: 0, borrowed: 1, min: 1, max: 5 } +1s
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 1, size: 2, available: 0, borrowed: 1, min: 1, max: 5 } +1s
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 2, size: 2, available: 0, borrowed: 1, min: 1, max: 5 } +1s
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 3, size: 2, available: 0, borrowed: 1, min: 1, max: 5 } +1s
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 4, size: 2, available: 0, borrowed: 1, min: 1, max: 5 } +1s
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 5, size: 2, available: 0, borrowed: 1, min: 1, max: 5 } +1s
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 6, size: 2, available: 0, borrowed: 1, min: 1, max: 5 } +1s
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 7, size: 2, available: 0, borrowed: 1, min: 1, max: 5 } +1s

The publisher has timed out after waiting for the confirmation and attempts to destroy channel 91f5b065-c986-49a9-b121-a43b93dc4cb4

  rascal:Vhost Destroying confirm channel: 91f5b065-c986-49a9-b121-a43b93dc4cb4. { vhost: '/', mode: 'confirm', queue: 8, size: 2, available: 0, borrowed: 1, min: 1, max: 5 } +991ms
  rascal:Vhost Destroying confirm channel: 91f5b065-c986-49a9-b121-a43b93dc4cb4 for vhost: / +0ms
Error: Timedout after 10000ms waiting for broker to confirm publication to: demo_ex
    at Timeout._onTimeout (/Users/steve/Development/guidesmiths/rascal/lib/amqp/Publication.js:216:10)
    at listOnTimeout (internal/timers.js:554:17)
    at processTimers (internal/timers.js:497:7) ddb80209-58fd-4b37-8d38-fef3b59ffd4b
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 8, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +12ms
  rascal:Vhost Destroy error emitted by confirm channel pool: Timeout after 1000ms closing confirm channel: 91f5b065-c986-49a9-b121-a43b93dc4cb4 for vhost: / +994ms
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 9, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +7ms
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 10, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +1s
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 11, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +1s
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 12, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +1s
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 13, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +1s

The channel pool has timedout after waiting for the channel to be created

TimeoutError: ResourceRequest timed out
    at ResourceRequest._fireTimeout (/Users/steve/Development/guidesmiths/rascal/node_modules/generic-pool/lib/ResourceRequest.js:62:17)
    at Timeout.bound (/Users/steve/Development/guidesmiths/rascal/node_modules/generic-pool/lib/ResourceRequest.js:8:15)
    at listOnTimeout (internal/timers.js:554:17)
    at processTimers (internal/timers.js:497:7) 439c932d-0fed-4d8c-854e-9968f2d9967a    
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 13, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +1s
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 14, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +1s
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 15, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +1s
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 16, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +1s
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 17, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +1s
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 18, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +1s
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 19, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +1s
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 20, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +1s
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 21, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +1s

amqplib has timed out after waiting for the heartbeat, triggering Rascal to attempt to reconnect

  rascal:Vhost Handling connection error: Heartbeat timeout initially from connection: 3fbb8010-3f59-4730-acba-8d2fcffdabdf, amqp://guest:***@localhost:5672?heartbeat=10&connection_timeout=10000&channelMax=100 +939ms
Error: Heartbeat timeout
    at Heart.<anonymous> (/Users/steve/Development/guidesmiths/rascal/node_modules/amqplib/lib/connection.js:435:19)
    at Heart.emit (events.js:314:20)
    at Heart.runHeartbeat (/Users/steve/Development/guidesmiths/rascal/node_modules/amqplib/lib/heartbeat.js:88:17)
    at listOnTimeout (internal/timers.js:554:17)
    at processTimers (internal/timers.js:497:7) {
  vhost: '/',
  connectionUrl: 'amqp://guest:***@localhost:5672?heartbeat=10&connection_timeout=10000&channelMax=100'
}
  rascal:Vhost Initialising vhost: / +1ms
  rascal:tasks:createConnection Connecting to broker using url: amqp://guest:***@localhost:5672?heartbeat=10&connection_timeout=10000&channelMax=100 +30s

Even though the pool had timed out waiting for the channel to be created, it handles eventual success or failure. It will need to create another channel.

Error: Heartbeat timeout
    at Heart.<anonymous> (/Users/steve/Development/guidesmiths/rascal/node_modules/amqplib/lib/connection.js:435:19)
    at Heart.emit (events.js:314:20)
    at Heart.runHeartbeat (/Users/steve/Development/guidesmiths/rascal/node_modules/amqplib/lib/heartbeat.js:88:17)
    at listOnTimeout (internal/timers.js:554:17)
    at processTimers (internal/timers.js:497:7) 
  rascal:Vhost Error creating channel: 78bc674f-1216-42fd-8dde-e6026ef08d77 from amqp://guest:***@localhost:5672?heartbeat=10&connection_timeout=10000&channelMax=100: Channel ended, no reply will be forthcoming +2ms
  rascal:Vhost Requested channel. Outstanding channel requests: 1 +0ms
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 22, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +58ms
  rascal:Vhost Create error emitted by confirm channel pool: Heartbeat timeout +941ms
  rascal:Vhost Creating pooled confirm channel for vhost: / +0ms

Since the vhost has not finished initialising the channel request will be deferred

  rascal:Vhost Vhost: / is not initialised. Deferring channel creation +0ms
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 23, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +61ms
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 24, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +1s
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 25, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +1s
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 26, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +999ms
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 27, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +1s

Again the pool has timed out waiting for the new channel to be created

TimeoutError: ResourceRequest timed out
    at ResourceRequest._fireTimeout (/Users/steve/Development/guidesmiths/rascal/node_modules/generic-pool/lib/ResourceRequest.js:62:17)
    at Timeout.bound (/Users/steve/Development/guidesmiths/rascal/node_modules/generic-pool/lib/ResourceRequest.js:8:15)
    at listOnTimeout (internal/timers.js:554:17)
    at processTimers (internal/timers.js:497:7) 0634e5ac-7338-40ad-bd78-00ac8cc12c52
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 28, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +1s
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 29, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +1s
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 30, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +1s
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 31, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +1s

The socket has timed out waiting for the reconnection. Rascal will attempt to reconnect after a short delay

  rascal:Vhost Will attempt reconnection in in 1448ms +933ms
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 32, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +68ms
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 33, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +1s
  rascal:Vhost Handling connection error: Failed to connect to: amqp://guest:***@localhost:5672?heartbeat=10&connection_timeout=10000&channelMax=100. Original message was: connect ETIMEDOUT initially from connection: 3fbb8010-3f59-4730-acba-8d2fcffdabdf, amqp://guest:***@localhost:5672?heartbeat=10&connection_timeout=10000&channelMax=100 +376ms
Error: Failed to connect to: amqp://guest:***@localhost:5672?heartbeat=10&connection_timeout=10000&channelMax=100. Original message was: connect ETIMEDOUT
    at Socket.<anonymous> (/Users/steve/Development/guidesmiths/rascal/node_modules/amqplib/lib/connect.js:178:20)
    at Object.onceWrapper (events.js:420:28)
    at Socket.emit (events.js:314:20)
    at Socket._onTimeout (net.js:483:8)
    at listOnTimeout (internal/timers.js:554:17)
    at processTimers (internal/timers.js:497:7) {
  vhost: '/',
  connectionUrl: 'amqp://guest:***@localhost:5672?heartbeat=10&connection_timeout=10000&channelMax=100'
}
  rascal:Vhost Initialising vhost: / +1ms
  rascal:tasks:createConnection Connecting to broker using url: amqp://guest:***@localhost:5672?heartbeat=10&connection_timeout=10000&channelMax=100 +11s
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 34, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +626ms
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 35, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +1s
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 36, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +1s
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 37, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +1s
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 38, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +1s
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 39, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +1s
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 40, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +1s
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 41, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +1s
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 42, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +1s
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 43, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +1s

This reconnection also fails due to a socket timeout

  rascal:Vhost Will attempt reconnection in in 3942ms +352ms
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 44, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +650ms
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 45, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +1s
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 46, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +1s
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 47, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +1s
  rascal:Vhost Handling connection error: Failed to connect to: amqp://guest:***@localhost:5672?heartbeat=10&connection_timeout=10000&channelMax=100. Original message was: connect ETIMEDOUT initially from connection: 3fbb8010-3f59-4730-acba-8d2fcffdabdf, amqp://guest:***@localhost:5672?heartbeat=10&connection_timeout=10000&channelMax=100 +292ms
Error: Failed to connect to: amqp://guest:***@localhost:5672?heartbeat=10&connection_timeout=10000&channelMax=100. Original message was: connect ETIMEDOUT
    at Socket.<anonymous> (/Users/steve/Development/guidesmiths/rascal/node_modules/amqplib/lib/connect.js:178:20)
    at Object.onceWrapper (events.js:420:28)
    at Socket.emit (events.js:314:20)
    at Socket._onTimeout (net.js:483:8)
    at listOnTimeout (internal/timers.js:554:17)
    at processTimers (internal/timers.js:497:7) {
  vhost: '/',
  connectionUrl: 'amqp://guest:***@localhost:5672?heartbeat=10&connection_timeout=10000&channelMax=100'
}

Here is where I unpaused the broker, allowing the reconnection to work. Because of the backlog of messages, the pool grows to its maximum size and messages start flowing again


  rascal:Vhost Initialising vhost: / +1ms
  rascal:tasks:createConnection Connecting to broker using url: amqp://guest:***@localhost:5672?heartbeat=10&connection_timeout=10000&channelMax=100 +14s
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 48, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +707ms
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 49, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +1s
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 50, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +1s
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 51, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +1s
  rascal:Vhost Requested confirm channel. { vhost: '/', mode: 'confirm', queue: 52, size: 1, available: 0, borrowed: 0, min: 1, max: 5 } +1s
  rascal:tasks:createConnection Obtained connection: 8ac5688c-fa04-4fb2-8feb-73b59a78444a +5s
  rascal:Vhost vhost: / was initialised with connection: 8ac5688c-fa04-4fb2-8feb-73b59a78444a +343ms
  rascal:Vhost Vhost: / was initialised. Resuming channel creation +0ms
  rascal:Vhost Created confirm channel: 4ff97b23-2cbb-4c9e-b483-d37496c448a0 from connection: 8ac5688c-fa04-4fb2-8feb-73b59a78444a +3ms
  rascal:Vhost Borrowed confirm channel: 4ff97b23-2cbb-4c9e-b483-d37496c448a0. { vhost: '/', mode: 'confirm', queue: 52, size: 1, available: 0, borrowed: 1, min: 1, max: 5 } +1ms
  rascal:Vhost Creating pooled confirm channel for vhost: / +0ms
  rascal:Vhost Created confirm channel: 973de147-d972-45e8-b57b-a4f5773ff699 from connection: 8ac5688c-fa04-4fb2-8feb-73b59a78444a +0ms
  rascal:Vhost Releasing confirm channel: 4ff97b23-2cbb-4c9e-b483-d37496c448a0. { vhost: '/', mode: 'confirm', queue: 51, size: 2, available: 0, borrowed: 1, min: 1, max: 5 } +2ms
  rascal:Vhost Borrowed confirm channel: 4ff97b23-2cbb-4c9e-b483-d37496c448a0. { vhost: '/', mode: 'confirm', queue: 51, size: 2, available: 0, borrowed: 1, min: 1, max: 5 } +0ms
  rascal:Vhost Created confirm channel: 363e196f-2ce9-4fae-8fcb-b7fc22b80ad1 from connection: 8ac5688c-fa04-4fb2-8feb-73b59a78444a +1ms
  rascal:Vhost Borrowed confirm channel: 363e196f-2ce9-4fae-8fcb-b7fc22b80ad1. { vhost: '/', mode: 'confirm', queue: 50, size: 2, available: 0, borrowed: 2, min: 1, max: 5 } +0ms
  rascal:Vhost Creating pooled confirm channel for vhost: / +0ms
  rascal:Vhost Releasing confirm channel: 4ff97b23-2cbb-4c9e-b483-d37496c448a0. { vhost: '/', mode: 'confirm', queue: 49, size: 3, available: 0, borrowed: 2, min: 1, max: 5 } +1ms
  rascal:Vhost Borrowed confirm channel: 4ff97b23-2cbb-4c9e-b483-d37496c448a0. { vhost: '/', mode: 'confirm', queue: 49, size: 3, available: 0, borrowed: 2, min: 1, max: 5 } +0ms
2022-05-28T04:51:56.009Z: hello world
2022-05-28T04:51:59.010Z: hello world
2022-05-28T04:52:00.012Z: hello world
  rascal:Vhost Releasing confirm channel: 363e196f-2ce9-4fae-8fcb-b7fc22b80ad1. { vhost: '/', mode: 'confirm', queue: 48, size: 3, available: 0, borrowed: 2, min: 1, max: 5 } +1ms
2022-05-28T04:52:01.011Z: hello world
  rascal:Vhost Borrowed confirm channel: 363e196f-2ce9-4fae-8fcb-b7fc22b80ad1. { vhost: '/', mode: 'confirm', queue: 48, size: 3, available: 0, borrowed: 2, min: 1, max: 5 } +1ms
  rascal:Vhost Releasing confirm channel: 4ff97b23-2cbb-4c9e-b483-d37496c448a0. { vhost: '/', mode: 'confirm', queue: 47, size: 3, available: 0, borrowed: 2, min: 1, max: 5 } +1ms
  rascal:Vhost Borrowed confirm channel: 4ff97b23-2cbb-4c9e-b483-d37496c448a0. { vhost: '/', mode: 'confirm', queue: 47, size: 3, available: 0, borrowed: 2, min: 1, max: 5 } +0ms
2022-05-28T04:52:02.012Z: hello world
  rascal:Vhost Created confirm channel: 08f16873-2f4a-41a3-b3f4-7c570f511e02 from connection: 8ac5688c-fa04-4fb2-8feb-73b59a78444a +1ms
  rascal:Vhost Releasing confirm channel: 363e196f-2ce9-4fae-8fcb-b7fc22b80ad1. { vhost: '/', mode: 'confirm', queue: 46, size: 3, available: 0, borrowed: 2, min: 1, max: 5 } +0ms
2022-05-28T04:52:03.012Z: hello world
  rascal:Vhost Borrowed confirm channel: 363e196f-2ce9-4fae-8fcb-b7fc22b80ad1. { vhost: '/', mode: 'confirm', queue: 46, size: 3, available: 1, borrowed: 2, min: 1, max: 5 } +0ms
  rascal:Vhost Borrowed confirm channel: 08f16873-2f4a-41a3-b3f4-7c570f511e02. { vhost: '/', mode: 'confirm', queue: 45, size: 3, available: 0, borrowed: 3, min: 1, max: 5 } +0ms
  rascal:Vhost Creating pooled confirm channel for vhost: / +1ms
  rascal:Vhost Releasing confirm channel: 4ff97b23-2cbb-4c9e-b483-d37496c448a0. { vhost: '/', mode: 'confirm', queue: 44, size: 4, available: 0, borrowed: 3, min: 1, max: 5 } +0ms
2022-05-28T04:52:04.017Z: hello world
2022-05-28T04:52:05.020Z: hello world
  rascal:Vhost Releasing confirm channel: 363e196f-2ce9-4fae-8fcb-b7fc22b80ad1. { vhost: '/', mode: 'confirm', queue: 44, size: 4, available: 0, borrowed: 2, min: 1, max: 5 } +1ms
  rascal:Vhost Borrowed confirm channel: 4ff97b23-2cbb-4c9e-b483-d37496c448a0. { vhost: '/', mode: 'confirm', queue: 44, size: 4, available: 1, borrowed: 2, min: 1, max: 5 } +0ms
  rascal:Vhost Releasing confirm channel: 08f16873-2f4a-41a3-b3f4-7c570f511e02. { vhost: '/', mode: 'confirm', queue: 43, size: 4, available: 0, borrowed: 3, min: 1, max: 5 } +1ms
2022-05-28T04:52:06.023Z: hello world
  rascal:Vhost Borrowed confirm channel: 363e196f-2ce9-4fae-8fcb-b7fc22b80ad1. { vhost: '/', mode: 'confirm', queue: 43, size: 4, available: 1, borrowed: 2, min: 1, max: 5 } +1ms
  rascal:Vhost Borrowed confirm channel: 08f16873-2f4a-41a3-b3f4-7c570f511e02. { vhost: '/', mode: 'confirm', queue: 42, size: 4, available: 0, borrowed: 3, min: 1, max: 5 } +0ms
  rascal:Vhost Created confirm channel: 219e90b1-b4dd-47a4-98f7-051f61982041 from connection: 8ac5688c-fa04-4fb2-8feb-73b59a78444a +1ms
  rascal:Vhost Releasing confirm channel: 4ff97b23-2cbb-4c9e-b483-d37496c448a0. { vhost: '/', mode: 'confirm', queue: 41, size: 4, available: 0, borrowed: 3, min: 1, max: 5 } +0ms
  rascal:Vhost Borrowed confirm channel: 4ff97b23-2cbb-4c9e-b483-d37496c448a0. { vhost: '/', mode: 'confirm', queue: 41, size: 4, available: 1, borrowed: 3, min: 1, max: 5 } +0ms
2022-05-28T04:52:07.023Z: hello world
  rascal:Vhost Releasing confirm channel: 363e196f-2ce9-4fae-8fcb-b7fc22b80ad1. { vhost: '/', mode: 'confirm', queue: 40, size: 4, available: 0, borrowed: 4, min: 1, max: 5 } +0ms
2022-05-28T04:52:08.025Z: hello world
2022-05-28T04:52:09.025Z: hello world
  rascal:Vhost Borrowed confirm channel: 219e90b1-b4dd-47a4-98f7-051f61982041. { vhost: '/', mode: 'confirm', queue: 40, size: 4, available: 1, borrowed: 3, min: 1, max: 5 } +1ms
  rascal:Vhost Releasing confirm channel: 08f16873-2f4a-41a3-b3f4-7c570f511e02. { vhost: '/', mode: 'confirm', queue: 39, size: 4, available: 0, borrowed: 4, min: 1, max: 5 } +0ms
  rascal:Vhost Borrowed confirm channel: 363e196f-2ce9-4fae-8fcb-b7fc22b80ad1. { vhost: '/', mode: 'confirm', queue: 39, size: 4, available: 1, borrowed: 3, min: 1, max: 5 } +0ms
  rascal:Vhost Borrowed confirm channel: 08f16873-2f4a-41a3-b3f4-7c570f511e02. { vhost: '/', mode: 'confirm', queue: 38, size: 4, available: 0, borrowed: 4, min: 1, max: 5 } +1ms
  rascal:Vhost Creating pooled confirm channel for vhost: / +0ms
  rascal:Vhost Releasing confirm channel: 4ff97b23-2cbb-4c9e-b483-d37496c448a0. { vhost: '/', mode: 'confirm', queue: 37, size: 5, available: 0, borrowed: 4, min: 1, max: 5 } +0ms
2022-05-28T04:52:10.025Z: hello world
  rascal:Vhost Borrowed confirm channel: 4ff97b23-2cbb-4c9e-b483-d37496c448a0. { vhost: '/', mode: 'confirm', queue: 37, size: 5, available: 0, borrowed: 4, min: 1, max: 5 } +1ms
2022-05-28T04:52:11.025Z: hello world
  rascal:Vhost Releasing confirm channel: 219e90b1-b4dd-47a4-98f7-051f61982041. { vhost: '/', mode: 'confirm', queue: 36, size: 5, available: 0, borrowed: 4, min: 1, max: 5 } +0ms
  rascal:Vhost Borrowed confirm channel: 219e90b1-b4dd-47a4-98f7-051f61982041. { vhost: '/', mode: 'confirm', queue: 36, size: 5, available: 0, borrowed: 4, min: 1, max: 5 } +0ms
  rascal:Vhost Releasing confirm channel: 363e196f-2ce9-4fae-8fcb-b7fc22b80ad1. { vhost: '/', mode: 'confirm', queue: 35, size: 5, available: 0, borrowed: 4, min: 1, max: 5 } +1ms
2022-05-28T04:52:12.026Z: hello world
  rascal:Vhost Borrowed confirm channel: 363e196f-2ce9-4fae-8fcb-b7fc22b80ad1. { vhost: '/', mode: 'confirm', queue: 35, size: 5, available: 0, borrowed: 4, min: 1, max: 5 } +0ms
  rascal:Vhost Releasing confirm channel: 08f16873-2f4a-41a3-b3f4-7c570f511e02. { vhost: '/', mode: 'confirm', queue: 34, size: 5, available: 0, borrowed: 4, min: 1, max: 5 } +0ms
2022-05-28T04:52:13.032Z: hello world
  rascal:Vhost Borrowed confirm channel: 08f16873-2f4a-41a3-b3f4-7c570f511e02. { vhost: '/', mode: 'confirm', queue: 34, size: 5, available: 0, borrowed: 4, min: 1, max: 5 } +1ms
  rascal:Vhost Releasing confirm channel: 4ff97b23-2cbb-4c9e-b483-d37496c448a0. { vhost: '/', mode: 'confirm', queue: 33, size: 5, available: 0, borrowed: 4, min: 1, max: 5 } +0ms
2022-05-28T04:52:14.034Z: hello world
  rascal:Vhost Borrowed confirm channel: 4ff97b23-2cbb-4c9e-b483-d37496c448a0. { vhost: '/', mode: 'confirm', queue: 33, size: 5, available: 0, borrowed: 4, min: 1, max: 5 } +0ms
2022-05-28T04:52:15.035Z: hello world
  rascal:Vhost Releasing confirm channel: 219e90b1-b4dd-47a4-98f7-051f61982041. { vhost: '/', mode: 'confirm', queue: 32, size: 5, available: 0, borrowed: 4, min: 1, max: 5 } +1ms
  rascal:Vhost Borrowed confirm channel: 219e90b1-b4dd-47a4-98f7-051f61982041. { vhost: '/', mode: 'confirm', queue: 32, size: 5, available: 0, borrowed: 4, min: 1, max: 5 } +0ms
  rascal:Vhost Created confirm channel: 1e7de314-603c-445d-9438-cddb73137fb2 from connection: 8ac5688c-fa04-4fb2-8feb-73b59a78444a +0ms
  rascal:Vhost Releasing confirm channel: 363e196f-2ce9-4fae-8fcb-b7fc22b80ad1. { vhost: '/', mode: 'confirm', queue: 31, size: 5, available: 0, borrowed: 4, min: 1, max: 5 } +0ms
2022-05-28T04:52:16.035Z: hello world
  rascal:Vhost Borrowed confirm channel: 363e196f-2ce9-4fae-8fcb-b7fc22b80ad1. { vhost: '/', mode: 'confirm', queue: 31, size: 5, available: 1, borrowed: 4, min: 1, max: 5 } +1ms
  rascal:Vhost Releasing confirm channel: 08f16873-2f4a-41a3-b3f4-7c570f511e02. { vhost: '/', mode: 'confirm', queue: 30, size: 5, available: 0, borrowed: 5, min: 1, max: 5 } +0ms
2022-05-28T04:52:17.036Z: hello world
  rascal:Vhost Borrowed confirm channel: 1e7de314-603c-445d-9438-cddb73137fb2. { vhost: '/', mode: 'confirm', queue: 30, size: 5, available: 1, borrowed: 4, min: 1, max: 5 } +0ms
  rascal:Vhost Releasing confirm channel: 4ff97b23-2cbb-4c9e-b483-d37496c448a0. { vhost: '/', mode: 'confirm', queue: 29, size: 5, available: 0, borrowed: 5, min: 1, max: 5 } +1ms
2022-05-28T04:52:18.037Z: hello world
  rascal:Vhost Borrowed confirm channel: 08f16873-2f4a-41a3-b3f4-7c570f511e02. { vhost: '/', mode: 'confirm', queue: 29, size: 5, available: 1, borrowed: 4, min: 1, max: 5 } +0ms
  rascal:Vhost Releasing confirm channel: 219e90b1-b4dd-47a4-98f7-051f61982041. { vhost: '/', mode: 'confirm', queue: 28, size: 5, available: 0, borrowed: 5, min: 1, max: 5 } +0ms
2022-05-28T04:52:19.037Z: hello world
  rascal:Vhost Releasing confirm channel: 363e196f-2ce9-4fae-8fcb-b7fc22b80ad1. { vhost: '/', mode: 'confirm', queue: 28, size: 5, available: 1, borrowed: 4, min: 1, max: 5 } +1ms
2022-05-28T04:52:20.037Z: hello world
  rascal:Vhost Borrowed confirm channel: 4ff97b23-2cbb-4c9e-b483-d37496c448a0. { vhost: '/', mode: 'confirm', queue: 28, size: 5, available: 2, borrowed: 3, min: 1, max: 5 } +0ms
  rascal:Vhost Releasing confirm channel: 1e7de314-603c-445d-9438-cddb73137fb2. { vhost: '/', mode: 'confirm', queue: 27, size: 5, available: 1, borrowed: 4, min: 1, max: 5 } +0ms
2022-05-28T04:52:21.037Z: hello world
  rascal:Vhost Borrowed confirm channel: 219e90b1-b4dd-47a4-98f7-051f61982041. { vhost: '/', mode: 'confirm', queue: 27, size: 5, available: 2, borrowed: 3, min: 1, max: 5 } +1ms
  rascal:Vhost Releasing confirm channel: 08f16873-2f4a-41a3-b3f4-7c570f511e02. { vhost: '/', mode: 'confirm', queue: 26, size: 5, available: 1, borrowed: 4, min: 1, max: 5 } +0ms
  rascal:Vhost Borrowed confirm channel: 363e196f-2ce9-4fae-8fcb-b7fc22b80ad1. { vhost: '/', mode: 'confirm', queue: 26, size: 5, available: 2, borrowed: 3, min: 1, max: 5 } +0ms
2022-05-28T04:52:22.039Z: hello world
  rascal:Vhost Borrowed confirm channel: 1e7de314-603c-445d-9438-cddb73137fb2. { vhost: '/', mode: 'confirm', queue: 25, size: 5, available: 1, borrowed: 4, min: 1, max: 5 } +1ms
  rascal:Vhost Releasing confirm channel: 4ff97b23-2cbb-4c9e-b483-d37496c448a0. { vhost: '/', mode: 'confirm', queue: 24, size: 5, available: 0, borrowed: 5, min: 1, max: 5 } +0ms
2022-05-28T04:52:23.040Z: hello world
  rascal:Vhost Borrowed confirm channel: 08f16873-2f4a-41a3-b3f4-7c570f511e02. { vhost: '/', mode: 'confirm', queue: 24, size: 5, available: 1, borrowed: 4, min: 1, max: 5 } +0ms
  rascal:Vhost Releasing confirm channel: 219e90b1-b4dd-47a4-98f7-051f61982041. { vhost: '/', mode: 'confirm', queue: 23, size: 5, available: 0, borrowed: 5, min: 1, max: 5 } +1ms
2022-05-28T04:52:24.041Z: hello world
  rascal:Vhost Releasing confirm channel: 363e196f-2ce9-4fae-8fcb-b7fc22b80ad1. { vhost: '/', mode: 'confirm', queue: 23, size: 5, available: 1, borrowed: 4, min: 1, max: 5 } +1ms
2022-05-28T04:52:25.041Z: hello world
  rascal:Vhost Borrowed confirm channel: 4ff97b23-2cbb-4c9e-b483-d37496c448a0. { vhost: '/', mode: 'confirm', queue: 23, size: 5, available: 2, borrowed: 3, min: 1, max: 5 } +0ms
  rascal:Vhost Releasing confirm channel: 1e7de314-603c-445d-9438-cddb73137fb2. { vhost: '/', mode: 'confirm', queue: 22, size: 5, available: 1, borrowed: 4, min: 1, max: 5 } +0ms
2022-05-28T04:52:26.042Z: hello world
  rascal:Vhost Borrowed confirm channel: 219e90b1-b4dd-47a4-98f7-051f61982041. { vhost: '/', mode: 'confirm', queue: 22, size: 5, available: 2, borrowed: 3, min: 1, max: 5 } +1ms
  rascal:Vhost Borrowed confirm channel: 363e196f-2ce9-4fae-8fcb-b7fc22b80ad1. { vhost: '/', mode: 'confirm', queue: 21, size: 5, available: 1, borrowed: 4, min: 1, max: 5 } +0ms
  rascal:Vhost Releasing confirm channel: 08f16873-2f4a-41a3-b3f4-7c570f511e02. { vhost: '/', mode: 'confirm', queue: 20, size: 5, available: 0, borrowed: 5, min: 1, max: 5 } +0ms
2022-05-28T04:52:27.043Z: hello world
  rascal:Vhost Borrowed confirm channel: 1e7de314-603c-445d-9438-cddb73137fb2. { vhost: '/', mode: 'confirm', queue: 20, size: 5, available: 1, borrowed: 4, min: 1, max: 5 } +1ms
2022-05-28T04:52:28.045Z: hello world
  rascal:Vhost Releasing confirm channel: 4ff97b23-2cbb-4c9e-b483-d37496c448a0. { vhost: '/', mode: 'confirm', queue: 19, size: 5, available: 0, borrowed: 5, min: 1, max: 5 } +1ms
  rascal:Vhost Borrowed confirm channel: 08f16873-2f4a-41a3-b3f4-7c570f511e02. { vhost: '/', mode: 'confirm', queue: 19, size: 5, available: 1, borrowed: 4, min: 1, max: 5 } +0ms
  rascal:Vhost Releasing confirm channel: 219e90b1-b4dd-47a4-98f7-051f61982041. { vhost: '/', mode: 'confirm', queue: 18, size: 5, available: 0, borrowed: 5, min: 1, max: 5 } +0ms
2022-05-28T04:52:29.050Z: hello world
  rascal:Vhost Borrowed confirm channel: 4ff97b23-2cbb-4c9e-b483-d37496c448a0. { vhost: '/', mode: 'confirm', queue: 18, size: 5, available: 1, borrowed: 4, min: 1, max: 5 } +1ms
  rascal:Vhost Releasing confirm channel: 363e196f-2ce9-4fae-8fcb-b7fc22b80ad1. { vhost: '/', mode: 'confirm', queue: 17, size: 5, available: 0, borrowed: 5, min: 1, max: 5 } +0ms
2022-05-28T04:52:30.052Z: hello world
  rascal:Vhost Borrowed confirm channel: 219e90b1-b4dd-47a4-98f7-051f61982041. { vhost: '/', mode: 'confirm', queue: 17, size: 5, available: 1, borrowed: 4, min: 1, max: 5 } +1ms
  rascal:Vhost Releasing confirm channel: 1e7de314-603c-445d-9438-cddb73137fb2. { vhost: '/', mode: 'confirm', queue: 16, size: 5, available: 0, borrowed: 5, min: 1, max: 5 } +0ms
2022-05-28T04:52:31.053Z: hello world
  rascal:Vhost Borrowed confirm channel: 363e196f-2ce9-4fae-8fcb-b7fc22b80ad1. { vhost: '/', mode: 'confirm', queue: 16, size: 5, available: 1, borrowed: 4, min: 1, max: 5 } +0ms
  rascal:Vhost Releasing confirm channel: 08f16873-2f4a-41a3-b3f4-7c570f511e02. { vhost: '/', mode: 'confirm', queue: 15, size: 5, available: 0, borrowed: 5, min: 1, max: 5 } +1ms
2022-05-28T04:52:32.058Z: hello world
  rascal:Vhost Borrowed confirm channel: 1e7de314-603c-445d-9438-cddb73137fb2. { vhost: '/', mode: 'confirm', queue: 15, size: 5, available: 1, borrowed: 4, min: 1, max: 5 } +0ms
  rascal:Vhost Releasing confirm channel: 4ff97b23-2cbb-4c9e-b483-d37496c448a0. { vhost: '/', mode: 'confirm', queue: 14, size: 5, available: 0, borrowed: 5, min: 1, max: 5 } +0ms
2022-05-28T04:52:33.060Z: hello world
  rascal:Vhost Borrowed confirm channel: 08f16873-2f4a-41a3-b3f4-7c570f511e02. { vhost: '/', mode: 'confirm', queue: 14, size: 5, available: 1, borrowed: 4, min: 1, max: 5 } +1ms
  rascal:Vhost Releasing confirm channel: 219e90b1-b4dd-47a4-98f7-051f61982041. { vhost: '/', mode: 'confirm', queue: 13, size: 5, available: 0, borrowed: 5, min: 1, max: 5 } +0ms
2022-05-28T04:52:34.064Z: hello world
  rascal:Vhost Borrowed confirm channel: 4ff97b23-2cbb-4c9e-b483-d37496c448a0. { vhost: '/', mode: 'confirm', queue: 13, size: 5, available: 1, borrowed: 4, min: 1, max: 5 } +0ms
  rascal:Vhost Releasing confirm channel: 363e196f-2ce9-4fae-8fcb-b7fc22b80ad1. { vhost: '/', mode: 'confirm', queue: 12, size: 5, available: 0, borrowed: 5, min: 1, max: 5 } +1ms
2022-05-28T04:52:35.067Z: hello world
  rascal:Vhost Borrowed confirm channel: 219e90b1-b4dd-47a4-98f7-051f61982041. { vhost: '/', mode: 'confirm', queue: 12, size: 5, available: 1, borrowed: 4, min: 1, max: 5 } +0ms
2022-05-28T04:52:36.069Z: hello world
  rascal:Vhost Releasing confirm channel: 1e7de314-603c-445d-9438-cddb73137fb2. { vhost: '/', mode: 'confirm', queue: 11, size: 5, available: 0, borrowed: 5, min: 1, max: 5 } +1ms
  rascal:Vhost Borrowed confirm channel: 363e196f-2ce9-4fae-8fcb-b7fc22b80ad1. { vhost: '/', mode: 'confirm', queue: 11, size: 5, available: 1, borrowed: 4, min: 1, max: 5 } +0ms
2022-05-28T04:52:37.070Z: hello world
cressie176 commented 2 years ago

If you haven't done so already, it's worth upgrading to rascal@14.4.3 which fixed the setTimeout bug I previously mentioned

thaoula commented 2 years ago

Hi @cressie176 ,

We plan to upgrade Rascal this week. We had to modify the helm chart to allow setting the debug flag.

With regards to the debug messages, is there any event we can handle to capture them so we can store them somewhere for review?

Do you see large message payloads potentially causing this issue ... 10-15mb? We have some jobs that are arriving via remote systems and they can contain some large payloads?

This issue also occurred on a different tenant last week.

Regards, Tarek

cressie176 commented 2 years ago

With regards to the debug messages, is there any event we can handle to capture them so we can store them somewhere for review?

No, unfortunately debug logs to stderr

Do you see large message payloads potentially causing this issue ... 10-15mb? We have some jobs that are arriving via remote systems and they can contain some large payloads?

One potential problem is if the large payload is handled in synchronous code. I once worked on a system that had an ~80MB xml payload and the xml parser we used was synchronous. As a result node spent several seconds processing the document, which meant that the kubernetes liveness endpoint timed out and kubernetes concluded the pod was dead and restarted it.

You could get a similar type of problem with heartbeats - if because your application is synchronously processing a large message, your application fails to send a heartbeat for several seconds, the broker may decide the connection has been lost, and close it. Rascal should automatically recover from this however.

cressie176 commented 2 years ago

Interestingly I just tried rascal's simple example with a 20mb file, and after 5-6 messages, do receive ResourceRequest time outs. I'll investigate

cressie176 commented 2 years ago

It looks like that when publishing a large message, amqplib signals that the channel is saturated. When this happens Rascal will defer returning the channel to the channel pool until amqplib emits a "drain" event. However the drain even is never forthcoming. Consequently the channel will never be returned.

If your pool has a maximum of 5 channels, the application will work normally until this happens 5 times, after which the pool will be exhausted, and block permanently.

I'll try to work out why amqplib never emits a drain event.

cressie176 commented 2 years ago

It actually looks like the drain event is emitted before the publish operation yields. Rascal only attaches the event handler after this, so is missing the drain event.

cressie176 commented 2 years ago

I think I have a fix. I'll test and assuming all good, release a new version of rascal today

cressie176 commented 2 years ago

Published in rascal@14.4.4

Please let me know when you've had a chance to update. I'll keep the issue open for a couple of weeks after that, but hopefully it will fix things.

thaoula commented 2 years ago

@cressie176 ,

Thanks so much for the fix and your debug work.

Just letting you know that we deployed 14.4.4 to production yesterday for tenants experiencing this issue.

We will keep you posted.

Regards, Tarek

cressie176 commented 2 years ago

My pleasure & thanks for letting me know. I also appreciate your patience and help diagnosing the problem.

cressie176 commented 2 years ago

@thaoula, assuming there have been no more issues do you think we've waited long enough to close?

thaoula commented 2 years ago

@cressie176 at the moment it has not reoccured. So let's close this issue. If it happens again we can re-open.

Once again, I really appreciate your assistance with this issue.

cressie176 commented 2 years ago

Thanks for getting back to me @thaoula