We have a number of microservices using messaging for decoupled interactions. We use RabbitMQ as our messaging technology and the RabbitMQ New Relic plugin to report on queue sizes etc. This has been reliable for over a year and is a foundation of our DevOps experience.
We had a fan-out issue (starting at 14:15 on 03 July 2017) that resulted in hundreds of thousands of messages being produced. For hours we could see the queue sizes and the rate of consumption.
However, the plugin showed (and indeed still shows) that the queues collapsed from 16:35 on 03 July (269k messages) until 03:45 on 04 July, when messages appeared again.
We know from our logs that the messages were there and were being processed for many hours; however, the lack of any monitoring during that window has eroded some of our confidence in our DevOps tooling.
Any ideas or guidance as to the cause of the monitoring failure, or what we could do to prevent this in the future, would be much appreciated.
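For context, one option we are weighing as a cross-check is polling the RabbitMQ management HTTP API directly, independently of the plugin. A minimal sketch is below; the host, credentials, and alert threshold are placeholders, not values from our actual setup:

```python
# Minimal sketch: independently poll the RabbitMQ management HTTP API for
# queue depths, as a cross-check on what the New Relic plugin reports.
# Host, credentials, and threshold below are assumptions for illustration.
import requests

RABBITMQ_API = "http://rabbitmq.example.internal:15672/api/queues"  # assumed host
USER, PASSWORD = "monitor", "secret"          # assumed read-only monitoring account
ALERT_THRESHOLD = 10_000                      # assumed backlog level worth alerting on


def check_queue_depths():
    # /api/queues returns one JSON object per queue; the "messages" field
    # holds the total message count (ready + unacknowledged).
    response = requests.get(RABBITMQ_API, auth=(USER, PASSWORD), timeout=10)
    response.raise_for_status()
    for queue in response.json():
        depth = queue.get("messages", 0)
        print(f"{queue['vhost']}/{queue['name']}: {depth} messages")
        if depth > ALERT_THRESHOLD:
            # In practice this would raise an alert through a second channel
            # (email, pager, etc.) rather than just printing.
            print(f"ALERT: backlog on {queue['name']} ({depth} messages)")


if __name__ == "__main__":
    check_queue_depths()
```

The idea would be to run something like this on a schedule so that a plugin outage does not leave us completely blind to queue growth.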
Many thanks, Ian
From New Relic ticket