rebus-org / Rebus.RabbitMq

:bus: RabbitMQ transport for Rebus
https://mookid.dk/category/rebus

ChannelClosedException when receiving next message due to DNS error #99

Closed Hugzy closed 1 year ago

Hugzy commented 1 year ago

I've started to encounter an issue where Rebus fails to receive the next message from a queue because the channel has been closed.

My setup is a web server that also runs RabbitMQ, plus batch-processing software on a separate server that consumes messages from RabbitMQ and processes the requests accordingly. Sometimes, however, it stops dequeuing messages completely, and the service has to be restarted before it receives messages again.

The error message:

An error occurred when attempting to receive the next message: Rebus.Exceptions.RebusApplicationException: Unexpected exception thrown while trying to dequeue a message from rabbitmq, queue address: DigiBatch
 ---> System.Threading.Channels.ChannelClosedException: The channel has been closed.
   at Rebus.RabbitMq.RabbitMqTransport.Receive(ITransactionContext context, CancellationToken cancellationToken)
   --- End of inner exception stack trace ---
   at Rebus.RabbitMq.RabbitMqTransport.Receive(ITransactionContext context, CancellationToken cancellationToken)
   at Rebus.Workers.ThreadPoolBased.ThreadPoolWorker.ReceiveTransportMessage(CancellationToken token, ITransactionContext context)

I suspect it is related to a DNS timeout that shows up in the Windows system logs just before this exception appears in our own logs. (The servers run in Azure and communicate through an Azure DNS.)

I've tracked the exception to this particular line of code in the Rebus codebase: https://github.com/rebus-org/Rebus.RabbitMq/blob/44363284c11b97f63b89c4f2d928db9593275008/Rebus.RabbitMq/RabbitMq/RabbitMqTransport.cs#L523

If the DNS times out but comes back after a while, shouldn't Rebus be able to reestablish the connection and continue to consume messages, or is there something that needs tweaking in this case?

mookid8000 commented 1 year ago

> If the DNS times out but comes back on after a while, shouldn't rebus be able to reestablish the connection and continue to consume messages, or is there something that needs tweaking in this case?

I don't know, actually. Rebus doesn't really do anything with the RabbitMQ connection strings besides passing them to the RabbitMQ driver's connection factory, setting AutomaticRecoveryEnabled=true. If that's not enough, then I don't know what else Rebus could do to "survive" DNS timeouts... could it be a case of DNS-to-IP mappings being cached for a while or something like that?
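For readers unfamiliar with the driver setting mentioned above, here is a minimal sketch of what "passing the connection string to the connection factory with AutomaticRecoveryEnabled=true" can look like when using RabbitMQ.Client directly. This is not Rebus's actual code. It assumes the RabbitMQ.Client 6.x synchronous API (`CreateConnection`); the host name, credentials, and the 10-second recovery interval are hypothetical values chosen for illustration.

```csharp
using System;
using RabbitMQ.Client;

// Hypothetical connection string; replace with the real broker address.
var factory = new ConnectionFactory
{
    Uri = new Uri("amqp://guest:guest@rabbit.example.internal:5672"),

    // The setting the comment above refers to: let the driver try to
    // reconnect on its own after a network failure.
    AutomaticRecoveryEnabled = true,

    // Assumed value: how long the driver waits between recovery attempts.
    NetworkRecoveryInterval = TimeSpan.FromSeconds(10),
};

// Assumes RabbitMQ.Client 6.x; newer major versions expose an async variant instead.
using var connection = factory.CreateConnection();

// Logging shutdowns makes it easier to correlate connection loss with
// other events, such as a DNS timeout in the system logs.
connection.ConnectionShutdown += (sender, args) =>
    Console.WriteLine($"Connection shut down: {args.ReplyText}");

Console.WriteLine("Connected. Press Enter to exit.");
Console.ReadLine();
```

Whether automatic recovery picks up a changed or temporarily unresolvable DNS name depends on the driver and the operating system's resolver cache, so this sketch only shows where the setting lives, not a confirmed fix for the issue.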

mookid8000 commented 1 year ago

Hi @Hugzy, I'll close this one for now, assuming you fixed your issue. Let me know if that isn't the case.