mosquito / aiormq

Pure python AMQP 0.9.1 asynchronous client library

Suboptimal consume performance #37

Closed: tilsche closed this issue 5 years ago

tilsche commented 5 years ago

Recently, we ran into severe performance limitations with a consumer implemented with aio-pika. I am opening this issue in this repository because I believe that, since aio-pika 4, performance is mostly limited by aiormq. Publish performance in aio-pika has been discussed in https://github.com/mosquito/aio-pika/issues/107.

So I made some simple benchmarks. While aio-pika 2.8.3 was on par with pika, the consumption rate with the current master is less than half of that, and pure aiormq still reaches only about half of pika's rate.

Looking at the profiles, it seems that a main weakness of aiormq is the large number of tasks created while consuming data. In particular, the @task-decorated Connection.__receive_frame creates three tasks for reading the frames of each message. My rough understanding is that this is supposed to allow bulk cancellation of all things asynchronous within aiormq.
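To make the per-task cost concrete, here is a minimal sketch that compares awaiting a coroutine directly with wrapping every call in its own task. It does not use aiormq; `read_frame`, `consume_direct`, `consume_task_per_frame` and `timed` are hypothetical names made up for this illustration, and the absolute numbers will depend on the machine and the event loop.

```python
import asyncio
import time


async def read_frame(i: int) -> int:
    # Stand-in for reading one frame; a real reader would await the socket.
    return i


async def consume_direct(n: int) -> int:
    # One long-lived coroutine awaits each frame read directly.
    total = 0
    for i in range(n):
        total += await read_frame(i)
    return total


async def consume_task_per_frame(n: int) -> int:
    # Every frame read is wrapped in its own Task, which adds an object
    # allocation and extra event-loop scheduling per frame.
    total = 0
    for i in range(n):
        total += await asyncio.create_task(read_frame(i))
    return total


def timed(coro_fn, n: int):
    # Run one variant in a fresh event loop and measure wall-clock time.
    start = time.perf_counter()
    result = asyncio.run(coro_fn(n))
    return result, time.perf_counter() - start


if __name__ == "__main__":
    n = 100_000
    for fn in (consume_direct, consume_task_per_frame):
        _, elapsed = timed(fn, n)
        print(f"{fn.__name__}: {n / elapsed:,.0f} frames/s")
```

Both variants compute the same result; only the scheduling differs, so any gap between the two rates is an estimate of the overhead added by creating a task per frame.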

For testing, I made some hacky changes: moving the @task decorator from __receive_frame to the outer __reader and __rpc. I don't claim to have a good understanding of the impact such a change would have outside of error-free execution.
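On the cancellation side, plain asyncio already propagates cancellation of an outer task into whatever coroutine it is currently awaiting. The sketch below shows only that general behaviour; `receive_frame`, `reader` and `main` are hypothetical stand-ins, not aiormq's code, and it says nothing about how aiormq's @task bookkeeping reacts to the change.

```python
import asyncio


async def receive_frame(log: list) -> None:
    # Hypothetical frame read that never completes, so cancellation is
    # the only way out of this coroutine.
    try:
        await asyncio.Event().wait()
    except asyncio.CancelledError:
        log.append("receive_frame cancelled")
        raise


async def reader(log: list) -> None:
    # Plain await: no per-frame task is created here.
    while True:
        await receive_frame(log)


async def main() -> list:
    log = []
    reader_task = asyncio.create_task(reader(log))  # the only task
    await asyncio.sleep(0)  # let the reader start and block
    reader_task.cancel()
    try:
        await reader_task
    except asyncio.CancelledError:
        log.append("reader task cancelled")
    return log


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Cancelling the single outer task raises CancelledError inside the innermost awaited coroutine first, so cleanup code in a frame reader would still run without that reader being a task of its own.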

Another minor change was to consolidate two readexactly calls. Again, I'm not sure whether that would decrease resilience.
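As an illustration of what consolidating reads can look like, here is a sketch that reads an AMQP 0-9-1 frame with two `readexactly` calls in total: one for the 7-byte header and one for the payload together with the trailing frame-end octet. Which two calls were actually merged in the experiment is not stated above, so treating them as the payload and frame-end reads is an assumption; `build_frame`, `read_frame` and `demo` are hypothetical helpers.

```python
import asyncio
import struct

FRAME_HEADER = struct.Struct("!BHI")  # frame type, channel, payload size
FRAME_END = 0xCE  # frame-end octet defined by AMQP 0-9-1


def build_frame(frame_type: int, channel: int, payload: bytes) -> bytes:
    # Helper for the demo: serialize one frame.
    header = FRAME_HEADER.pack(frame_type, channel, len(payload))
    return header + payload + bytes([FRAME_END])


async def read_frame(reader: asyncio.StreamReader):
    header = await reader.readexactly(FRAME_HEADER.size)
    frame_type, channel, size = FRAME_HEADER.unpack(header)
    # Assumption: payload and frame-end octet are fetched in one call
    # instead of two separate readexactly() calls.
    body = await reader.readexactly(size + 1)
    if body[-1] != FRAME_END:
        raise ValueError("missing frame-end octet")
    return frame_type, channel, body[:-1]


async def demo():
    # Feed a single body frame (type 3) into an in-memory stream.
    reader = asyncio.StreamReader()
    reader.feed_data(build_frame(3, 1, b"hello"))
    reader.feed_eof()
    return await read_frame(reader)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The frame-end check is kept, so a corrupted frame is still detected; the difference is only that the error surfaces after the combined read.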

In the benchmarks, the @task change brought a large performance increase for both pure aiormq and aio-pika. There is still a gap between aio-pika and aiormq and also between pure aiormq and pika.

[image: aiormq-performance]

mosquito commented 5 years ago

I am back from my vacation and am now returning to the accumulated requests and issues. Thanks for your contribution. I will return with an answer as soon as possible.

Could you please run your benchmark with uvloop? In theory, the native uvloop.Task should be faster.
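For reference, one way to run the same benchmark coroutine on either event loop is sketched below. It assumes the third-party uvloop package may or may not be installed and falls back to the default asyncio loop; `run` is a hypothetical helper name.

```python
import asyncio

try:
    import uvloop  # third-party, optional
except ImportError:
    uvloop = None


def run(coro):
    # Use a uvloop event loop when available, otherwise the default one.
    if uvloop is not None:
        loop = uvloop.new_event_loop()
    else:
        loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(coro)
    finally:
        loop.close()


if __name__ == "__main__":
    print(run(asyncio.sleep(0, result="done")))
```

Keeping the loop choice in one helper makes it easy to run the benchmark twice and compare the results under otherwise identical conditions.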

tilsche commented 5 years ago

Yes, uvloop is faster, but not by the amount that the suggested @task change brings.

Combining the @task change with uvloop, aiormq reaches a performance comparable to pika / aio-pika-2.8. aio-pika is still significantly slower.

[image: aiormq-performance]

mosquito commented 5 years ago

I guess that's because pamqp creates Python objects for each frame. Your suggestions work well. I got a flame graph: [image]

Looks like the last cpu intensive part is the pamqp.frame.unmarshal() call.
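A rough way to confirm this kind of hotspot without a flame graph is the standard-library profiler. The sketch below profiles a hypothetical pure-Python decoder, `unmarshal_stub`, which is only a stand-in that builds one object per frame; it is not pamqp's implementation, and `Frame`, `decode_many` and `profile_decode` are likewise made up for the example.

```python
import cProfile
import io
import pstats
import struct

HEADER = struct.Struct("!BHI")  # frame type, channel, payload size


class Frame:
    # One Python object per decoded frame, as in a pure-Python decoder.
    __slots__ = ("frame_type", "channel", "payload")

    def __init__(self, frame_type, channel, payload):
        self.frame_type = frame_type
        self.channel = channel
        self.payload = payload


def unmarshal_stub(data: bytes) -> Frame:
    # Hypothetical stand-in for a frame decoder; not pamqp's code.
    frame_type, channel, size = HEADER.unpack_from(data)
    return Frame(frame_type, channel, data[HEADER.size:HEADER.size + size])


def decode_many(data: bytes, n: int) -> list:
    return [unmarshal_stub(data) for _ in range(n)]


def profile_decode(n: int = 50_000) -> str:
    data = HEADER.pack(3, 1, 5) + b"hello" + b"\xce"
    profiler = cProfile.Profile()
    profiler.runcall(decode_many, data, n)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(5)
    return out.getvalue()


if __name__ == "__main__":
    print(profile_decode())
```

Sorting by cumulative time lists the decoder and the object construction it triggers near the top, which is the same information a flame graph shows as the widest bars.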

I should try rewriting pamqp in Cython, but first I need to investigate this.

Anyway thank you very much for your investigation. Please feel free to report similar issues.

visobet commented 8 months ago

Hi @mosquito

Did you have the chance to investigate the idea of rewriting pamqp.frame.unmarshal() in Cython?

Do you still think it is a good idea?

mosquito commented 8 months ago

@visobet yep, if you could generate Cython code for AMQP with a compatible interface, I guess it would be amazing.