pardahlman / RawRabbit

A modern .NET framework for communication over RabbitMq
MIT License

Consumer Cancel Notification #364

Open RedOnePrime opened 6 years ago

RedOnePrime commented 6 years ago

We have noticed an issue with consumers getting disconnected from queues. We have not been able to narrow down what is going on, even with debug logging turned on. However, we did find this information: http://www.rabbitmq.com/consumer-cancel.html

It suggests that (as an example) a cancel notification is sent via the model and raised inside the DefaultBasicConsumer. The EventingBasicConsumer does override some of these events/methods and provides some more implementation; however, it does not do so for the ConsumerCancelled event.

We tested this by using one of the examples on the page that could trigger this event, which was deleting the queue the consumer is connected to. We also modified ConsumerFactory to attach to the event and log.

public async Task<IBasicConsumer> CreateConsumerAsync(IModel channel = null, CancellationToken token = default(CancellationToken))
{
    if (channel == null)
    {
        channel = await GetOrCreateChannelAsync(token);
    }

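    // Attach to the cancel notification so we can log when the broker cancels the consumer.
    var consumer = new EventingBasicConsumer(channel);
    consumer.ConsumerCancelled += Consumer_ConsumerCancelled;
    return consumer;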
}

private void Consumer_ConsumerCancelled(object sender, ConsumerEventArgs e)
{
    _logger.Error("Log here");
}

When the queue is deleted while the consumer is connected, the log statement executes. There are other examples on that page, and perhaps some triggers for this event that I do not know about. However, in our production scenario it is not a queue getting deleted but 'something else' we have not been able to identify. The symptoms are that, over time, the consumer count drops until it is 0 and then the queue starts to fill up. Once the application is restarted, the consumers all re-subscribe and the queue drains.

We would like to try using this event to resubscribe; however, we are not sure how best to implement it and where. Any help on a good method of doing so? In our use case we would want it to try to resubscribe forever, and it should (I assume) execute the whole subscription pipeline while doing so (so Polly can do its thing and retry, queue/exchange/bindings get recreated, etc.) rather than assume that it can just reconnect without issue and have it work the first time.
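To make the "resubscribe forever" idea concrete, here is a minimal sketch of the retry loop on its own. It is not RawRabbit or RabbitMQ.Client code: `Resubscriber`, `ResubscribeForeverAsync`, and the `subscribe` delegate are hypothetical names, and the delegate is assumed to run the whole subscription pipeline (declare queue/exchange/bindings, create the consumer, start consuming). The capped exponential backoff is also an assumption; Polly or any other policy could take its place.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper for illustration; not part of RawRabbit or RabbitMQ.Client.
public static class Resubscriber
{
    // Calls `subscribe` until it succeeds or `token` is cancelled, waiting
    // between failed attempts with capped exponential backoff.
    // Returns the number of attempts that were made.
    public static async Task<int> ResubscribeForeverAsync(
        Func<CancellationToken, Task> subscribe,
        TimeSpan initialDelay,
        TimeSpan maxDelay,
        CancellationToken token = default(CancellationToken))
    {
        var delay = initialDelay;
        var attempts = 0;

        while (true)
        {
            token.ThrowIfCancellationRequested();
            attempts++;
            try
            {
                // Assumed to re-run the full subscription pipeline.
                await subscribe(token).ConfigureAwait(false);
                return attempts;
            }
            catch (OperationCanceledException) when (token.IsCancellationRequested)
            {
                // Shutdown was requested; stop retrying.
                throw;
            }
            catch (Exception)
            {
                // A real implementation would log the failure here.
                await Task.Delay(delay, token).ConfigureAwait(false);

                // Double the delay, but never exceed maxDelay.
                var doubledTicks = Math.Min(delay.Ticks * 2, maxDelay.Ticks);
                delay = TimeSpan.FromTicks(doubledTicks);
            }
        }
    }
}
```

One possible wiring, under the same assumptions: the `Consumer_ConsumerCancelled` handler logs the event and then starts `ResubscribeForeverAsync` as a background task instead of awaiting it inline, so the handler returns quickly. Where that hook belongs inside RawRabbit's subscription pipeline is the open question in this issue.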

Any guidance here would be appreciated. Again, we would like to get something implemented (with some logging), toss it into the fire, and see if this event gets raised from the broker right before the consumer drops. Every other failure we have tried to simulate through docker (forcefully shutting down the broker, network segments/splits, etc.) simply results in a reconnect every time, which leads us to believe it is something else. After a month of investigation this is our only lead. It's worth a shot at this point.

Side note: we have 20+ queues across our 15 applications and it seems to happen randomly. For example, one application has two subscriptions with different queue names and different bindings, and it also publishes to one exchange. This application (many instances of it in production) has had one of those subscriptions reach a consumer count of 0 on its queue yet still has consumers on the other subscription/queue.

Thanks,

-Red

jmgarrett commented 5 years ago

Is there any update on this, @pardahlman? I am experiencing this issue: the RabbitMQ broker closes consumers, but the RawRabbit library does not react to the Consumer Cancel Notification. As a result, all of my subscriptions that relied on that consumer sit idle while the queues they were consuming from fill up, because there is no consumer anymore.

I am going through your library and trying to determine the best way to implement this, so that if a subscription's consumer is cancelled it can create another one. I was under the impression that this level of resiliency was already provided by RawRabbit, but it appears it isn't.

Do you have any suggestions on how to move forward?