houseofcat / turbocookedrabbit

A user friendly RabbitMQ library written in Golang.
MIT License

Messages were lost from the AutoPublisher. #4

Closed houseofcat closed 4 years ago

houseofcat commented 4 years ago

Summary

Messages were lost by the AutoPublisher despite being retried. This was a production outage in which we seemingly inexplicably lost connectivity and, eventually, our messages.

Details

When consumers went down, they retrieved a new AckChannel. On acquiring an AckChannel, we checked whether the ConnectionHost already had too many Channels created from it (per our configuration, not per server). If it did, we acquired a new Connection. Unfortunately, in a hotfix 4 months ago that stopped Channels from bunching up on a single Connection while recovering from an outage, I forgot to return the Connection to the ConnectionPool before acquiring another for AckChannels.

You would more than likely survive the first outage, the second outage, and so on, but eventually you would hit a transient or catastrophic outage with the ConnectionPool already accidentally depleted.

During AutoPublish, we get a Channel and publish. If that Channel is dead, we acquire a Connection from the ConnectionPool. If the pool has been drained of Connections, we pause indefinitely (by design), waiting for a Connection that will never come. This happened in the background on a separate goroutine, so it was simply stuck, polling indefinitely for an exit status that never came.

Solution

Identify all internal ConnectionPool draining scenarios that may have been overlooked, then patch in a return / flag call on every error or error condition.

houseofcat commented 4 years ago

Fixed by commit https://github.com/houseofcat/turbocookedrabbit/commit/4193eb5e0ffdd8c287b779e02718a8ed6e99dd13