rigtorp / MPMCQueue

A bounded multi-producer multi-consumer concurrent queue written in C++11
MIT License
1.15k stars, 159 forks

Congestion issues with pop and push #18

Closed gronron closed 4 years ago

gronron commented 4 years ago

Hello,

I was benchmarking your MPMCQueue with https://gist.github.com/TurpentineDistillery/cba204646e631a3eeda5b06cac595fde on my Ryzen 9 3900X (Windows 10), and I noticed that MPMCQueue struggles with the last benchmark (128 producers and 128 consumers). I also tested with 24/24 and it is still slow.

So in this benchmark I tried using:

    template <typename T>
    void push(rigtorp::MPMCQueue<T>& q, T value) {
        while (!q.try_push(value)) {
            mt::min_sleep();
        }
    }

    template <typename T>
    T pop(rigtorp::MPMCQueue<T>& q) {
        T item{};
        while (!q.try_pop(item)) {
            mt::min_sleep();
        }
        return item;
    }

And the performance on 24/24 and 96/96 was good. I don't understand why the original push and pop perform so badly when there are far more threads than my CPU can handle.

Thank you.

rigtorp commented 4 years ago

Sorry for the late reply.

Your CPU has 24 hardware threads. When you go beyond 24 threads with this queue, you will run into problems with blocking push or pop, since threads that are blocked busy-wait and do not give up the CPU to the scheduler. This is the same issue as with spinlocks. Using futex directly, or the new C++20 atomic wait API (https://en.cppreference.com/w/cpp/atomic/atomic_wait), the queue could be modified to wait efficiently during blocking operations. I haven't implemented this since it's not the behavior I want in my use cases.
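
For readers who prefer to stay with the try_push/try_pop wrappers rather than modify the queue, here is a minimal sketch of a spin-then-yield-then-sleep backoff. It is not the futex or atomic-wait change described above, only a wrapper-level variation of the min_sleep workaround from the report. `ToyQueue`, `retry_with_backoff`, `push_backoff`, `pop_backoff`, and the tuning constants are hypothetical names and values chosen for illustration. `ToyQueue` is a mutex-based stand-in so the block compiles without the library, and the wrappers only assume that the queue exposes `try_push` and `try_pop` as used in the report.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>

// Hypothetical stand-in with the same try_push/try_pop shape as the queue
// in the report, so this sketch compiles without the real library.
template <typename T>
class ToyQueue {
public:
  explicit ToyQueue(std::size_t capacity) : capacity_(capacity) {
    assert(capacity > 0);
  }
  bool try_push(const T& v) {
    std::lock_guard<std::mutex> lock(m_);
    if (items_.size() == capacity_) return false;  // full
    items_.push_back(v);
    return true;
  }
  bool try_pop(T& out) {
    std::lock_guard<std::mutex> lock(m_);
    if (items_.empty()) return false;  // empty
    out = items_.front();
    items_.pop_front();
    return true;
  }

private:
  std::mutex m_;
  std::deque<T> items_;
  std::size_t capacity_;
};

// Calls `attempt` until it returns true. Failed attempts first busy-wait,
// then yield, then sleep, so a thread that cannot make progress eventually
// hands its core back to the scheduler.
template <typename Attempt>
void retry_with_backoff(Attempt attempt) {
  const int kSpinLimit = 64;    // assumed value: failures before yielding
  const int kYieldLimit = 256;  // assumed value: failures before sleeping
  int failures = 0;
  while (!attempt()) {
    if (failures < kSpinLimit) {
      ++failures;  // short wait expected: retry immediately
    } else if (failures < kYieldLimit) {
      ++failures;
      std::this_thread::yield();  // let another runnable thread go first
    } else {
      std::this_thread::sleep_for(std::chrono::microseconds(50));
    }
  }
}

template <typename Queue, typename T>
void push_backoff(Queue& q, const T& value) {
  retry_with_backoff([&] { return q.try_push(value); });
}

template <typename T, typename Queue>
T pop_backoff(Queue& q) {
  T item{};
  retry_with_backoff([&] { return q.try_pop(item); });
  return item;
}
```

With the real queue, the same wrappers would presumably be called as `push_backoff(q, value)` and `pop_backoff<T>(q)` on a `rigtorp::MPMCQueue<T>`. The spin phase keeps latency low when the wait is short, while the yield and sleep phases matter once there are more threads than hardware threads.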

(In reply to the report sent on Fri, Jan 24, 2020 at 12:26 PM by Geoffrey TOURON.)