Open DriverX opened 5 years ago
Duplicate of #202.
I updated the issue with detailed reproduction steps.
Duplicate of #202.
This is not a duplicate.
I tried the reproduction code with `aiormq==2.7.2`, as recommended here: https://github.com/mosquito/aio-pika/issues/112#issuecomment-519597738. It now fails with another exception.
Consumer:

```
Traceback (most recent call last):
  File "/home/artimi/.virtualenvs/rabbit-messaging/lib/python3.6/site-packages/aiormq/base.py", line 25, in __inner
    return await self.task
concurrent.futures._base.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "example/messaging_consumer.py", line 50, in <module>
    asyncio.get_event_loop().run_until_complete(main())
  File "/usr/lib/python3.6/asyncio/base_events.py", line 484, in run_until_complete
    return future.result()
  File "example/messaging_consumer.py", line 39, in main
    async for message in consumer:
  File "/home/artimi/wood/rabbit-messaging/rabbit_messaging/consumer.py", line 193, in __anext__
    return await self.get()
  File "/home/artimi/wood/rabbit-messaging/rabbit_messaging/consumer.py", line 128, in get
    message = await self._get()
  File "/home/artimi/wood/rabbit-messaging/rabbit_messaging/consumer.py", line 118, in _get
    message = await asyncio.shield(self._get_future)
  File "/home/artimi/.virtualenvs/rabbit-messaging/lib/python3.6/site-packages/aiormq/channel.py", line 298, in basic_get
    await self.rpc(spec.Basic.Get(queue=queue, no_ack=no_ack))
  File "/home/artimi/.virtualenvs/rabbit-messaging/lib/python3.6/site-packages/aiormq/base.py", line 171, in wrap
    return await self.create_task(func(self, *args, **kwargs))
  File "/home/artimi/.virtualenvs/rabbit-messaging/lib/python3.6/site-packages/aiormq/base.py", line 27, in __inner
    raise self.exception from e
  File "/usr/lib/python3.6/asyncio/tasks.py", line 537, in _wrap_awaitable
    return (yield from awaitable.__await__())
  File "/home/artimi/.virtualenvs/rabbit-messaging/lib/python3.6/site-packages/aiormq/base.py", line 27, in __inner
    raise self.exception from e
  File "/usr/lib/python3.6/asyncio/tasks.py", line 537, in _wrap_awaitable
    return (yield from awaitable.__await__())
  File "/home/artimi/.virtualenvs/rabbit-messaging/lib/python3.6/site-packages/aiormq/base.py", line 27, in __inner
    raise self.exception from e
aiormq.exceptions.ConnectionClosed: (320, "CONNECTION_FORCED - broker forced connection closure with reason 'shutdown'")
```
Producer:

```
Traceback (most recent call last):
  File "/home/sebekpet/.virtualenvs/rabbit-messaging/lib/python3.6/site-packages/aiormq/base.py", line 25, in __inner
    return await self.task
concurrent.futures._base.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "example/messaging_producer.py", line 48, in <module>
    asyncio.get_event_loop().run_until_complete(main())
  File "/usr/lib/python3.6/asyncio/base_events.py", line 484, in run_until_complete
    return future.result()
  File "/usr/lib/python3.6/asyncio/tasks.py", line 537, in _wrap_awaitable
    return (yield from awaitable.__await__())
  File "/home/sebekpet/.virtualenvs/rabbit-messaging/lib/python3.6/site-packages/aiormq/base.py", line 27, in __inner
    raise self.exception from e
  File "example/messaging_producer.py", line 35, in main
    await producer.publish('common.TestSchema.abc', data)
  File "/home/sebekpet/wood/rabbit-messaging/rabbit_messaging/producer.py", line 117, in publish
    await publish_coroutine
  File "/home/sebekpet/.virtualenvs/rabbit-messaging/lib/python3.6/site-packages/aio_pika/exchange.py", line 202, in publish
    loop=self.loop, timeout=timeout
  File "/usr/lib/python3.6/asyncio/tasks.py", line 339, in wait_for
    return (yield from fut)
  File "/home/sebekpet/.virtualenvs/rabbit-messaging/lib/python3.6/site-packages/aiormq/channel.py", line 439, in basic_publish
    return await confirmation
aiormq.exceptions.ConnectionClosed: (320, "CONNECTION_FORCED - broker forced connection closure with reason 'shutdown'")
```
I see that you added the `help wanted` tag to this, @mosquito. How could I help with this?
With `aio-pika==6.1.3` I am able to achieve a robust connection by catching the connection exception raised from `publish` and by using long-lived consumers (started with `consume`, not with `get`). I tested it against a RabbitMQ instance that is reachable at the start and gets restarted during producing/consuming. This is how my wrapped publish method looks now:
```python
async def _publish(
    self,
    schema: str,
    message: aio_pika.Message,
    routing_key: str
) -> Optional[aiormq.types.ConfirmationFrameType]:
    '''
    Publish coroutine in a special function so we can catch and ignore connection exceptions
    '''
    iterator = itertools.count() if self._retry_count is None else range(self._retry_count)
    for attempt in iterator:
        try:
            return await self._exchanges[schema].publish(message, routing_key)
        except aio_pika.exceptions.CONNECTION_EXCEPTIONS as exc:
            if attempt + 1 == self._retry_count:
                self._logger.error(
                    'Message with schema = {}, routing key = {}, data = {} could not be sent after = {} retries. Skipping.',
                    schema,
                    routing_key,
                    message.body,
                    self._retry_count
                )
                raise
            else:
                self._logger.warning('Connection closed while publishing with exception = {}. Will retry.', exc)
                await asyncio.sleep(self.RETRY_WAIT)
    return None
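The retry loop above can be exercised without a broker or aio-pika at all. Below is a minimal, self-contained sketch of the same pattern, with a hypothetical flaky coroutine standing in for the exchange; the names `publish_with_retry`, `make_flaky_publish`, and the use of `ConnectionError` are illustrative, not aio-pika API:

```python
import asyncio
import itertools
from typing import Optional

RETRY_WAIT = 0.01  # pause between attempts; illustrative value


async def publish_with_retry(publish, retry_count: Optional[int]) -> str:
    # None means "retry forever", mirroring the itertools.count() branch above;
    # an int bounds the number of attempts via range().
    iterator = itertools.count() if retry_count is None else range(retry_count)
    for attempt in iterator:
        try:
            return await publish()
        except ConnectionError:
            if attempt + 1 == retry_count:
                raise  # out of attempts: re-raise the last connection error
            await asyncio.sleep(RETRY_WAIT)
    raise RuntimeError("retry_count was 0")


def make_flaky_publish(failures: int):
    # Fails `failures` times with ConnectionError, then succeeds,
    # simulating a broker that comes back after a restart.
    state = {"left": failures}

    async def flaky_publish() -> str:
        if state["left"] > 0:
            state["left"] -= 1
            raise ConnectionError("broker restarting")
        return "confirmed"

    return flaky_publish


print(asyncio.run(publish_with_retry(make_flaky_publish(2), retry_count=5)))  # -> confirmed
```

With `retry_count=5` and two simulated failures, the third attempt succeeds; with fewer attempts than failures, the last `ConnectionError` propagates, matching the `raise` branch in the wrapped method.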
Today I tried to reproduce it, and the consumer successfully reconnected after the RabbitMQ container restart. Is this still relevant?
BTW: running RabbitMQ in a docker image is a bad idea for this test. The docker container might change its internal IP address; please use port forwarding or a TCP proxy to reproduce this behaviour.
@mosquito yes it is
As far as I can see, reconnection is completely fixed in aio-pika>=7 (RobustConnection).
This is still happening in v9.4.0 @mosquito
It can be reproduced with https://aio-pika.readthedocs.io/en/latest/patterns.html
@gaby what exactly is happening?
When using `connect_robust()`, if the connection to RabbitMQ is lost (not closed), the logs show the consumer trying to reconnect every 5 seconds. Once RabbitMQ becomes reachable again, the logs stop and no new messages show up (which I would assume means it reconnected). However, the consumer no longer consumes from RabbitMQ; in the management UI, RabbitMQ shows 0 consumers for that queue.
The log message I see during reconnect is this line (it stops when RabbitMQ comes back online, but the consumer no longer gets messages): https://github.com/mosquito/aio-pika/blob/master/aio_pika/robust_connection.py#L83
It's the same behavior posted in https://github.com/mosquito/aio-pika/issues/231
I'm using aio-pika 9.4.0 with Python 3.10.
@gaby How can I reproduce the “connection lost” part?
Please try 9.4.1 (which includes #622), which removed a deadlock. Zero consumers on the queue is consistent with you having run into that deadlock.
@mosquito @Darsstar The issue still happens in v9.4.1.
I can post a full example in 1-2 hrs, but basically just restarting the RabbitMQ container causes this. It's a connection-lost event, since the broker is not shut down gracefully.
@mosquito @Darsstar
worker.py:

```python
import asyncio

from aio_pika import connect_robust
from aio_pika.patterns import Master, NackMessage, RejectMessage


async def worker(*, task_id: int) -> None:
    if task_id % 2 == 0:
        raise RejectMessage(requeue=False)
    if task_id % 2 == 1:
        raise NackMessage(requeue=False)
    print(task_id)


async def main() -> None:
    connection = await connect_robust(
        "amqp://guest:guest@127.0.0.1/?name=aio-pika%20worker",
    )

    # Creating channel
    channel = await connection.channel()

    # Initializing Master with channel
    master = Master(channel)
    await master.create_worker("my_task_name", worker, auto_delete=False, durable=True)

    try:
        await asyncio.Future()
    finally:
        await connection.close()


if __name__ == "__main__":
    asyncio.run(main())
```
RabbitMQ:

```shell
mkdir data
docker run -d --hostname rabbit --name rabbit -p 5672:5672 -p 15672:15672 -v $PWD/data:/var/lib/rabbitmq rabbitmq:3-management
```
Log into the UI and click on Queues and Streams -> `my_task_name`; you will see 1 consumer. Now run this:

```shell
docker kill rabbit; docker start rabbit; docker logs -f rabbit
```

You will see the worker's logs print this and then stop:

```
Connection attempt to "amqp://guest:******@127.0.0.1/?name=aio-pika%20worker" failed: Server connection reset: ConnectionResetError(104, 'Connection reset by peer'). Reconnecting after 5 seconds.
```
You will also see the following in the RabbitMQ logs:

```
2024-03-27 01:49:40.521168+00:00 [info] <0.693.0> accepting AMQP connection <0.693.0> (172.17.0.1:42788 -> 172.17.0.2:5672)
2024-03-27 01:49:40.522729+00:00 [info] <0.693.0> connection <0.693.0> (172.17.0.1:42788 -> 172.17.0.2:5672) has a client-provided name: aio-pika worker
2024-03-27 01:49:40.523354+00:00 [info] <0.693.0> connection <0.693.0> (172.17.0.1:42788 -> 172.17.0.2:5672 - aio-pika worker): user 'guest' authenticated and granted access to vhost '/'
```
But if you now click on Queues and Streams -> `my_task_name`, there are 0 consumers.
Note, I'm using `auto_delete=False, durable=True`, so the queue gets recreated after the restart; otherwise I wouldn't get any queue at all, since the worker does not declare it again either.
Edit: Here are the logs when running in debug mode and triggering the `docker kill/start/logs` sequence:
```
DEBUG:aiormq.connection:Reader exited for <Connection: "amqp://guest:******@127.0.0.1:5672/?name=aio-pika%20worker" at 0x704c0d04fce0>
DEBUG:aiormq.connection:Cancelling cause reader exited abnormally
DEBUG:aiormq.connection:Sending <Connection.Close object at 0x704c0d086e80> to <Connection: "amqp://guest:******@127.0.0.1:5672/?name=aio-pika%20worker" at 0x704c0d04fce0>
DEBUG:aiormq.connection:Writer on connection amqp://guest:******@127.0.0.1:5672/?name=aio-pika%20worker closed
DEBUG:aiormq.connection:Writer exited for <Connection: "amqp://guest:******@127.0.0.1:5672/?name=aio-pika%20worker" at 0x704c0d04fce0>
DEBUG:aiormq.connection:Closing connection <Connection: "amqp://guest:******@127.0.0.1:5672/?name=aio-pika%20worker" at 0x704c0d04fce0> cause: <AMQPConnectionError: ('Server connection unexpectedly closed. Read 0 bytes but 1 bytes expected',)>
DEBUG:aio_pika.channel:Start reopening channel <aio_pika.robust_channel.RobustChannel object at 0x704c0d075ea0>
INFO:aio_pika.robust_connection:Connection to amqp://guest:******@127.0.0.1/?name=aio-pika%20worker closed. Reconnecting after 5 seconds.
DEBUG:aio_pika.robust_connection:Connection attempt for <RobustConnection: "amqp://guest:******@127.0.0.1/?name=aio-pika%20worker" 1 channels>
DEBUG:aiormq.connection:Connecting to: amqp://guest:******@127.0.0.1:5672/?name=aio-pika%20worker
INFO:aio_pika.robust_connection:Connection to amqp://guest:******@127.0.0.1/?name=aio-pika%20worker closed. Reconnecting after 5 seconds.
WARNING:aio_pika.robust_connection:Connection attempt to "amqp://guest:******@127.0.0.1/?name=aio-pika%20worker" failed: Server connection reset: ConnectionResetError(104, 'Connection reset by peer'). Reconnecting after 5 seconds.
DEBUG:aio_pika.robust_connection:Waiting for connection close event for <RobustConnection: "amqp://guest:******@127.0.0.1/?name=aio-pika%20worker" 1 channels>
DEBUG:aio_pika.robust_connection:Connection attempt for <RobustConnection: "amqp://guest:******@127.0.0.1/?name=aio-pika%20worker" 1 channels>
DEBUG:aiormq.connection:Connecting to: amqp://guest:******@127.0.0.1:5672/?name=aio-pika%20worker
INFO:aio_pika.robust_connection:Connection to amqp://guest:******@127.0.0.1/?name=aio-pika%20worker closed. Reconnecting after 5 seconds.
WARNING:aio_pika.robust_connection:Connection attempt to "amqp://guest:******@127.0.0.1/?name=aio-pika%20worker" failed: Server connection reset: ConnectionResetError(104, 'Connection reset by peer'). Reconnecting after 5 seconds.
DEBUG:aio_pika.robust_connection:Waiting for connection close event for <RobustConnection: "amqp://guest:******@127.0.0.1/?name=aio-pika%20worker" 1 channels>
DEBUG:aio_pika.robust_connection:Connection attempt for <RobustConnection: "amqp://guest:******@127.0.0.1/?name=aio-pika%20worker" 1 channels>
DEBUG:aiormq.connection:Connecting to: amqp://guest:******@127.0.0.1:5672/?name=aio-pika%20worker
DEBUG:aio_pika.channel:Start reopening channel <aio_pika.robust_channel.RobustChannel object at 0x704c0d075ea0>
DEBUG:aiormq.connection:Prepare to send ChannelFrame(payload=b'\x01\x00\x01\x00\x00\x00\x06\x00\x14\x00\n\x010\xce', should_close=False, drain_future=None)
DEBUG:aiormq.connection:Received frame <Channel.OpenOk object at 0x704c0d0ac400> in channel #1 weight=16 on <Connection: "amqp://guest:******@127.0.0.1:5672/?name=aio-pika%20worker" at 0x704c0d087600>
DEBUG:aiormq.connection:Prepare to send ChannelFrame(payload=b'\x01\x00\x01\x00\x00\x00\x05\x00U\x00\n\x00\xce', should_close=False, drain_future=None)
DEBUG:aiormq.connection:Received frame <Confirm.SelectOk object at 0x704c0d0bc790> in channel #1 weight=12 on <Connection: "amqp://guest:******@127.0.0.1:5672/?name=aio-pika%20worker" at 0x704c0d087600>
DEBUG:aiormq.connection:Prepare to send ChannelFrame(payload=b'\x01\x00\x01\x00\x00\x00\x0b\x00<\x00\n\x00\x00\x00\x00\x00\x00\x00\xce', should_close=False, drain_future=None)
DEBUG:aiormq.connection:Received frame <Basic.QosOk object at 0x704c0d0bc7f0> in channel #1 weight=12 on <Connection: "amqp://guest:******@127.0.0.1:5672/?name=aio-pika%20worker" at 0x704c0d087600>
DEBUG:aio_pika.robust_connection:Connection made on <RobustConnection: "amqp://guest:******@127.0.0.1/?name=aio-pika%20worker" 1 channels>
DEBUG:aio_pika.robust_connection:Waiting for connection close event for <RobustConnection: "amqp://guest:******@127.0.0.1/?name=aio-pika%20worker" 1 channels>
```
There are no logs after that. It seems the connection and channel both get re-established, but the `aio_pika.queue` log lines for declaring the queue and starting to consume never appear.
@mosquito @Darsstar What I'm getting from this issue is that the connection and channel are both re-created and established, but the queue is not redeclared and the consumer is not re-created.
Is there a way in aio-pika to add a callback to `connect_robust()` to perform those actions until a fix is in place?
There is a slight possibility that what comes next is only true in combination with #615, since I reproduced your issue on that branch.
You should store the result of `master.create_worker()`, since `WeakSet`s are involved. The `Worker` instance has a strong reference to the `RobustQueue`, which will keep the weak reference the `RobustConnection` holds alive.
With the changes in this snippet it started working:

```python
client = await master.create_worker("my_task_name", worker, auto_delete=False, durable=True)

try:
    await asyncio.Future()
finally:
    await client.close()
    await connection.close()
```
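The diagnosis above can be demonstrated without a broker: an object tracked only through a `weakref.WeakSet` disappears as soon as its last strong reference is dropped, which is why discarding the result of `create_worker()` can leave the connection with nothing to restore. A minimal sketch of the pitfall; the `Registry` and `Consumer` names are illustrative, not aio-pika classes:

```python
import gc
import weakref


class Registry:
    """Tracks consumers without keeping them alive, like a WeakSet-based connection registry."""

    def __init__(self):
        self._items = weakref.WeakSet()

    def register(self, item):
        self._items.add(item)

    def restorable(self):
        # Only objects that are still strongly referenced elsewhere remain in the set.
        return len(self._items)


class Consumer:
    pass


registry = Registry()

kept = Consumer()
registry.register(kept)        # caller keeps a strong reference

registry.register(Consumer())  # result discarded, like ignoring create_worker()
gc.collect()                   # the unreferenced consumer is collected

print(registry.restorable())   # -> 1: only the consumer we kept could be restored
```

This mirrors the fix: binding `create_worker()`'s return value to `client` keeps the whole chain of strong references alive, so the reconnect logic still has a queue and consumer to restore.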
@Darsstar That fixed the issue! It now redeclares the queue and starts consuming again.
The documentation needs to be updated if weak references are going to be involved.
Try yourself.
The reconnect logic in RobustConnection has been completely broken since aio-pika >= 5. (It is broken in aio-pika 4 too, but not so dramatically.)

1. Start the rabbitmq docker container
2. Create `test_queue` in the rabbitmq management UI
3. Start the consumer in docker
4. Start the publisher in docker
5. Restart the rabbitmq container

As a result, both the consumer and publisher processes exit.
Consumer log
Publisher log