mosquito / aio-pika

AMQP 0.9 client designed for asyncio and humans.
https://aio-pika.readthedocs.org/
Apache License 2.0

Issues with HA RabbitMQ cluster #567

Closed awoimbee closed 11 months ago

awoimbee commented 11 months ago

Hi, I tried writing a detailed issue, but I don't know my way around aio-pika and aiormq well enough, so I'll keep it simple.

Setup: I run RabbitMQ in a cluster and I use quorum queues. One RabbitMQ instance might become unavailable at any time. I expect RobustConnection to handle reconnects to any RabbitMQ replica automatically (my setup follows the "Asynchronous message processing" example).

It seems that aiormq keeps a persistent connection open to a single RabbitMQ replica. When this replica goes down, I get these issues:

  1. An exception is created each time, sending many false alerts my way: https://github.com/mosquito/aiormq/blob/f04f0cf7f1972e14abd4fefc90218770f335e239/aiormq/connection.py#L558
  2. Occasional "Task was destroyed but it is pending!" errors (#432) seem to happen when reading a message during RabbitMQ shutdown (they appear before "Unexpected connection close" is printed).
    Task was destroyed but it is pending!
    task: <Task pending name='Task-31' coro=<Connection._on_reader_done.<locals>.close_writer_task() running at /usr/local/lib/python3.10/site-packages/aiormq/connection.py:506> wait_for=<Future finished result=None>>
    task: <Task pending name='Task-38' coro=<OneShotCallback.__task_inner() running at /usr/local/lib/python3.10/site-packages/aio_pika/tools.py:230>>
  3. Occasional DeliveryError from exchange.publish(message, MY_ROUTING_KEY), which seems to happen in tandem with "Task was destroyed but it is pending!" (before "Unexpected connection close").
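
For the third point, one possible mitigation on the application side is to retry a failed publish a few times with a backoff, so that a publish racing with a broker shutdown gets another chance once the connection is restored. The following is a minimal sketch, not aio-pika's own mechanism: `publish_with_retry`, its parameters, and the delays are hypothetical names chosen for illustration, and the helper is deliberately library-agnostic.

```python
import asyncio
from typing import Awaitable, Callable, Tuple, Type


async def publish_with_retry(
    publish: Callable[[], Awaitable[object]],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 5,
    base_delay: float = 0.2,
) -> object:
    """Call `publish()` until it succeeds or `attempts` is exhausted.

    `publish` is a zero-argument callable returning an awaitable (hypothetical
    helper signature). Only exceptions listed in `retry_on` trigger a retry;
    anything else propagates immediately.
    """
    for attempt in range(1, attempts + 1):
        try:
            return await publish()
        except retry_on:
            if attempt == attempts:
                # Out of attempts: let the caller see the last error.
                raise
            # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))
```

With aio-pika, `publish` could plausibly be `lambda: exchange.publish(message, routing_key=MY_ROUTING_KEY)` and `retry_on` could be `(aio_pika.exceptions.DeliveryError,)`; whether other exception types should be included depends on what the application actually observes. Retrying a publish can deliver the same message twice, so consumers would need to tolerate duplicates.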

Is there a magic bullet to solve all of this? Otherwise, how can I help solve these issues?

mosquito commented 11 months ago

@awoimbee if I guess your HA cluster topology correctly and the connection endpoint is a set of DNS records with the same name, then this article is a possible solution for you.

awoimbee commented 11 months ago

Hi, everything is inside a Kubernetes cluster. I have a Service (a DNS record) that points to the RabbitMQ pods. The RabbitMQ containers are shut down gracefully. I don't think happy_eyeballs will help: my issue is not with the reconnect but with the disconnects.
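
As a stopgap for the false alerts caused by expected disconnects, the log record could be downgraded before it reaches the alerting handler. This is a sketch under assumptions: it assumes the record is emitted at ERROR level by a logger named `aiormq.connection` and that its message contains "Unexpected connection close"; the class name `DowngradeExpectedDisconnects` is made up for illustration. It only changes how the event is reported, not the disconnect behavior itself.

```python
import logging


class DowngradeExpectedDisconnects(logging.Filter):
    """Lower the level of matching records instead of dropping them."""

    def __init__(self, needle: str = "Unexpected connection close",
                 level: int = logging.WARNING) -> None:
        super().__init__()
        self.needle = needle
        self.level = level

    def filter(self, record: logging.LogRecord) -> bool:
        if record.levelno > self.level and self.needle in record.getMessage():
            record.levelno = self.level
            record.levelname = logging.getLevelName(self.level)
            # Drop the attached traceback so the record reads as a warning.
            record.exc_info = None
            record.exc_text = None
        return True  # Keep every record; only the level changes.


# Assumed logger name; adjust if the message comes from a different logger.
logging.getLogger("aiormq.connection").addFilter(DowngradeExpectedDisconnects())
```

A filter attached to a logger only sees records logged directly on that logger, so it has to be attached to the logger that emits the message (or to the handler that feeds the alerts) rather than to a parent logger.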