mosquito / aio-pika

AMQP 0.9 client designed for asyncio and humans.
https://aio-pika.readthedocs.org/
Apache License 2.0
1.18k stars 186 forks source link

stall with RuntimeWarning: coroutine 'Connection.connect' was never awaited #192

Open adamhooper opened 5 years ago

adamhooper commented 5 years ago

Steps to reproduce:

  1. Run any example in the documentation
  2. Start RabbitMQ after the example has started running

Expected results: connect_robust() will retry until connected (logging 'Connection Refused' errors for each failed attempt), after which the example will work as expected Actual results:

INFO 2019-02-01 17:41:29,963 aio_pika.pika.adapters.base_connection 139682006103808 Connecting to 192.168.64.3:5672
WARNING 2019-02-01 17:41:29,964 aio_pika.pika.adapters.base_connection 139682006103808 Connection to 192.168.64.3:5672 failed: [Errno 111] Connection refused
WARNING 2019-02-01 17:41:29,969 aio_pika.pika.connection 139681997711104 Could not connect, 99 attempts left
INFO 2019-02-01 17:41:29,970 aio_pika.pika.connection 139681997711104 Retrying in 2 seconds
/usr/local/lib/python3.6/asyncio/events.py:145: RuntimeWarning: coroutine 'Connection.connect' was never awaited
  self._callback(*self._args)

... And the connect_robust() task stalls forever.

Is commit e395b638b25aad66f4e9b6051c169230cd2545f9 to blame, perhaps?

adamhooper commented 5 years ago

Seems to come from #186

dequis commented 5 years ago

Might also be #191

OrangeTux commented 5 years ago

@adamhooper I cannot reproduce the problem you describe, can you show me snippet of what you tried, together with version numbers of Python and aio-pika?

I tried to reproduce the problem using the 'Simple Consumer' and 'Simple Producer' using Python 3.7.0 and aio-pika 4.9.3.

But after I start RMQ, the scripts connect to RMQ without problem.

adamhooper commented 5 years ago

@OrangeTux Sorry -- I probably should have spent more time working on this bug report before I filed it. I was in a hurry.

The code that didn't work was https://github.com/CJWorkbench/channels_rabbitmq/commit/ed7a6fddf81267c690a9953517414f3341026a0e -- run pytest with those versions and you'll find it won't reconnect. I jumped to conclusions about the source of the bug; all I know is that running pytest without RabbitMQ running would stall forever on 4.9.3 (even after starting RabbitMQ); it would reconnect gracefully when pinned to 4.9.1.

I've since moved away from aio_pika. I decided I shouldn't use RobustConnection (because it has changed behavior between minor versions of aio_pika; I didn't find any documents explaining its intended behavior in edge cases; and even if I did, I doubt its intended behavior matches our needs exactly); and I learned aio_pika uses a modified version of pika instead of the official upstream one. I flirted with the idea of using aiormq, then I settled on aioamqp because it's older and its warts are well-documented.

All this side-story to say: I won't be investing much time in this bug report. I'm sorry for wasting your time.

OrangeTux commented 5 years ago

Ok, no problem.

mosquito commented 5 years ago

@adamhooper I'm really sorry for changing connect_robust behaviour. This was extreamly untestable implementation when using pika. Could you please describe expected behaviour for connect_robust? I tried to do it as native as possible but no one never hint me how to do it correct. A few words in defense of aiormq it's just an implementation AMQP 0.9. I hope you try it and your feedback will be so valueble for the project.

@dequis @OrangeTux today was released aio-pika==5.1.0. I finished aiormq integration and add real integration test for connect_robust. Now RobustConnection working through a simple TCP proxy and you could add some cases for reproduce some bad cases from past.

adamhooper commented 5 years ago

@mosquito No worries -- to be clear, I think aio_pika is fantastic and it served us well for a long time.

It's not clear to me what a "correct" connect_robust() should do -- each use case is different. In my case, the trickiest problem came up during unit-testing: there's no way to say connection = await connect_robust(...) and then connection.close() before the connect succeeds. So it's not possible to unit-test what happens when RabbitMQ isn't present, because the test can't clean up after itself.

Somewhat related: this being Python, there needs to be clearly-defined behavior regarding cancellation: task = loop.create_task(connect_robust(...)); task.cancel(). When I dived into aio_pika I saw suppress(Exception) and winced. asyncio.CancelledError is an exception, and it must be re-raised. This seems like an error in aio_pika/robust_connection.py.

Finally, the "re-bind queues" stuff is tricky. We previously used a single RabbitMQ instance, and connect_robust() would try reconnecting after RabbitMQ went offline. But non-durable queues don't come back online in that case, and RobustConnection has no way of knowing that; so connect_robust() would try to resume consuming queues that didn't exist any more. The correct behavior in our case: declare all queues before consuming them.

... When we switched to aioamqp, I found all these tasks (reconnecting on failure, cancelling connection, logging connection failures, recreating state after reconnect) were easy to solve through custom code. I'm not convinced my design decisions (reconnect-forever-and-log-errors, recreate-exclusive-queues) are universal. I concluded that connect_robust() is a vague idea; I don't think there's a one-size-fits-all solution.

All that to say: I don't know what connect_robust() should do. I don't know what the best practices are. I can't think of any design that would have saved me time. But certainly, documenting all the edge cases (e.g., "what happens if a message callback is cancelled mid-send") would help a lot.

coltenkrauter commented 3 years ago

Has this issue been resolved?

I am still seeing it happen,

lib/python3.7/site-packages/socketio/server.py:295: RuntimeWarning: coroutine 'AsyncPubSubManager.emit' was never awaited
  skip_sid=skip_sid, callback=callback, **kwargs)
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
mosquito commented 2 years ago

@adamhooper a lot has already changed since then, I think in version 7.0.1 it should work correctly.