noodlefrenzy / node-amqp10

amqp10 is a promise-based, AMQP 1.0 compliant node.js client
MIT License
134 stars 56 forks source link

How to handle detached link state after re-connection #291

Open wingedfox opened 7 years ago

wingedfox commented 7 years ago

Hi,

I've met the following issue

2017-01-24T09:35:57.042Z - [mq-publisher] message has been sent

2017-01-24T10:28:26.568Z - [amqpConnection connection:closed
2017-01-24T10:28:26.500Z - [amqpConnection] connection:errorReceived AmqpProtocolError: amqp:resource-limit-exceeded:local-idle-timeout expired
2017-01-24T10:28:32.599Z - [amqpConnection] connection:disconnected

2017-01-24T10:29:22.727Z - [amqpSenderLink] detached event
2017-01-24T10:29:27.217Z - [amqpConnection] connection:opened

2017-01-24T10:29:30.616Z - [amqpSenderLink] message sent
2017-01-24T10:29:30.951Z - [Client] successful finish

2017-01-24T10:29:30.995Z - [mq-publisher] message has been sent

2017-01-25T06:05:00.876Z - [amqpConnection] connection:errorReceived AmqpProtocolError: amqp:resource-limit-exceeded:local-idle-timeout expired

The issue is that all messages sent between 'detached' and 'connection:errorReceived' events were reported as sent (promise resolved), but they did not came to a server.

The questing is: should I handle re-attach from the client code or library has a feature to reattach link? I use AMQP.Policy.ActiveMQ without any tweaks.

amqp10 version 3.2.5

mbroadst commented 7 years ago

@wingedfox the client should be able to handle re-attach events, could you possibly isolate the exact order of events that lead to this? We should be able to recreate your situation within the unit tests by following the remote frame order

wingedfox commented 7 years ago

@mbroadst I've updated the log in the first message. The very first message in the log is the last successful send, the one before the last is 1st lost message. All messages in between - ampq10 event handlers.

I'm not sure that can I add to log, because I haven't met the same issue. Our investigation shows that at the same time there was about 10-minutes long freeze of the VM, where client resides.

I could not reproduce this issue with another set of VMs, so I do not have amqp10 debug output.

As a short-term solution, should I check 'attached' state of sender link and re-attach it manually?

mbroadst commented 7 years ago

@wingedfox apologies. I'm traveling this week, I will update the ticket when I'm back.

mbroadst commented 7 years ago

@wingedfox the link should reattach automatically, but with an exponential backoff which might explain the wait times. I would not recommend reattaching it manually. Are you using the link cache plugin by any chance?

wingedfox commented 7 years ago

@mbroadst, the problem is that there were no link reattach for more than 12 hours and I was not able to track the issue because amqp10 told me 'everything is ok' by return of Promise.resolve but actually sent nothing.

Our dig to the issue shows that there were a VM suspend for several minutes. Probably client met a kind of window shift issue when the message was actually sent to an amqp server (ActiveMQ) but was not accepted/recorded in a queue for some reason?

wingedfox commented 7 years ago

Any ideas on fix or reproduction of this issue?