vcabbage / amqp

AMQP 1.0 client library for Go.
https://godoc.org/pack.ag/amqp
MIT License

Sender and Receiver amqp:decode-error ... #187

Open psparago opened 4 years ago

psparago commented 4 years ago

I have two components that use Amazon MQ as their common AMQP provider. The components run on separate Linux CentOS EC2 instances (however, I was able to reproduce this issue once using Apache ActiveMQ with all components on my local CentOS VM, i.e. no Amazon MQ).

One component is written in .NET Core using the amqpnetlite client library. The other is written in Go using vcabbage/amqp (latest). The components interact via AMQP queues: each one listens on its own receiver queue for messages sent by the other.

Both components work as expected at first, but after a seemingly random amount of time (11 minutes in my latest test, for example), the Go component begins logging both receive and send errors. Once this happens, the Go component must be restarted. The error has never occurred in the .NET Core component.

The errors I'm seeing look like this (for example on the sending side):

error sending message to:client-base-0, error: *Error{Condition: amqp:decode-error, Description: Could not decode AMQP frame: hex: 0000016a02000000005314d000000013000000045201522fa008000000000000000043005373d00000003e0000000ca12466303961313362612d333031662d343531612d383831642d6364636265393331616163394040a1076f6e652d77617940a100404040404043005375a0fc7b226f223a302c226d223a22636f6e74657874222c2274223a22636f6e74726f6c222c2270223a7b22706d223a2231222c227063223a2231222c227275223a2231222c227365727665725f76657273696f6e223a2239392e302e302e31353733303635303234227d2c227263223a22626173652d30222c22736964223a2272632d736572766572403137322e33312e33362e313736222c2269223a2266303961313362612d333031662d343531612d383831642d636463626539333161616339222c226f736964223a22222c22636964223a22222c227274223a226f6e652d776179222c22726b223a22222c226f77223a747275652c2276223a327d, Info: map[]}

The hex digits appear to be identical in every error.

This issue is causing an impediment to a very high priority project, so I would appreciate any assistance. I'm happy to post code etc. if that will help.

vcabbage commented 4 years ago

The error message is being produced by ActiveMQ. There's a good chance ActiveMQ is specifying it while closing the connection. Once that happens pretty much any action on the connections, sessions, and links will return the same error (perhaps the logic should be adjusted so it's clear the error is originating from a broker initiated close).

You should be able to mitigate the impact by recreating the connection from scratch when an error occurs. This is a good idea in general since there's no re-connection logic built in and something like a network interruption would also cause problems.

The hex in the error message appears to be a valid transfer frame.

Header: {Size:362 DataOffset:2 FrameType:0 Channel:0}
Body: Transfer{Handle: 1, DeliveryID: 47, DeliveryTag: "\x00\x00\x00\x00\x00\x00\x00\x00", MessageFormat: 0, Settled: false, More: false, ReceiverSettleMode: <nil>, State: <nil>, Resume: false, Aborted: false, Batchable: false, Payload [size]: 327}

Since the error isn't very specific this may be difficult to track down. I think ActiveMQ prints a stack trace when errors like this happen, that's likely the best bet for determining what about the frame it's having an issue with.

If you'd like to share the Go code I can take a look to see if there's any issue there, but I'm not optimistic it'll lead to a resolution.

psparago commented 4 years ago

Thank you very, very much for the speedy reply. It is very much appreciated!

I have some experience with AMQP 0.9 and ActiveMQ JMS, but I am brand new to AMQP 1.0, so if you wouldn't mind taking a look at my code, I'd be very grateful. There's not much to it, since I'm using AMQP as just a shared memory provider in an HA environment.

I've attached the code, sanitized for confidentiality reasons, so it won't build as-is. I have also not yet made the changes suggested in the response above.

Once again, I am very grateful for your help. Thank you!

amqp1sample.txt

vcabbage commented 4 years ago

You're welcome.

A couple potentially relevant notes:

psparago commented 4 years ago

Once again, thank you very much for your time. I will implement your suggestions.