rigtorp / MPMCQueue

A bounded multi-producer multi-consumer concurrent queue written in C++11
MIT License

Is ARM supported? #43

Open pierricgimmig opened 8 months ago

pierricgimmig commented 8 months ago

Has the queue been tested on ARM? Thanks!

rigtorp commented 5 months ago

No, but it's implemented using the C++11 memory model and should work on all CPU architectures.

On Sat, Nov 11, 2023 at 2:55 PM Pierric Gimmig @.***> wrote:

Has the queue been tested on ARM? Thanks!

Philippe91 commented 4 months ago

The queue works fine on Apple Silicon M2 (ARM64). However, when I benchmark it and compare it to an Intel CPU, I get surprising results:

With no contention at all, the queue runs twice as fast on the ARM as on the Intel CPU. Under high contention the situation reverses, and by a much wider margin: the ARM is almost 10 times slower. I have no explanation for this.

I even tried replacing compare_exchange_strong with compare_exchange_weak (which I think is sufficient, given that the call already sits inside a retry loop), but that did not help.

Using memory_order_acq_rel instead of the default memory_order_seq_cst for the CAS does not improve performance either. (By the way, I wonder whether leaving the CAS at the default memory_order_seq_cst is intentional.)
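For readers who want to try the same two changes, here is a minimal sketch of a retry loop that uses compare_exchange_weak with explicit memory orders. It is not taken from MPMCQueue: the helper name try_claim and its parameters head, limit and ticket are made up for illustration, and the orderings shown are only one plausible choice for this standalone example.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical helper for illustration only; not part of MPMCQueue.
// Tries to claim the next index below `limit` from a shared counter.
// On success, stores the claimed index in `ticket` and returns true.
inline bool try_claim(std::atomic<std::size_t>& head, std::size_t limit,
                      std::size_t& ticket) {
    assert(limit > 0);
    std::size_t cur = head.load(std::memory_order_acquire);
    while (cur < limit) {
        // The weak form may fail spuriously; the surrounding loop retries.
        // On any failure, `cur` is refreshed with the value now in `head`.
        if (head.compare_exchange_weak(cur, cur + 1,
                                       std::memory_order_acq_rel,   // success
                                       std::memory_order_acquire)) { // failure
            ticket = cur;
            return true;
        }
    }
    return false;  // all indices below `limit` are taken
}
```

On x86 the strong and weak forms typically compile to the same lock cmpxchg instruction, so a difference, if any, would be expected only on targets where CAS is built from a load-linked/store-conditional pair.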

It would be great if other people could do the same kind of test.
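As a starting point for such a test, the sketch below measures the cost of a CAS increment on a single shared counter, which isolates the contention effect from the rest of the queue. It does not use MPMCQueue at all; the function name ns_per_op and its parameters are hypothetical, and the numbers it produces depend heavily on the machine, compiler flags and thread placement.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical micro-benchmark; names are illustrative, not from MPMCQueue.
// Runs `threads` threads that each perform `ops_per_thread` CAS increments
// on one shared counter. Returns the average wall-clock nanoseconds per
// increment and optionally reports the final counter value via `total_out`.
inline double ns_per_op(unsigned threads, std::uint64_t ops_per_thread,
                        std::uint64_t* total_out = nullptr) {
    assert(threads > 0 && ops_per_thread > 0);
    std::atomic<std::uint64_t> counter{0};
    std::atomic<bool> go{false};
    std::vector<std::thread> pool;

    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            // Wait for the start signal so all threads contend together.
            while (!go.load(std::memory_order_acquire)) {
                std::this_thread::yield();
            }
            for (std::uint64_t i = 0; i < ops_per_thread; ++i) {
                std::uint64_t cur = counter.load(std::memory_order_relaxed);
                // Retry until this thread's increment lands.
                while (!counter.compare_exchange_weak(
                    cur, cur + 1, std::memory_order_acq_rel,
                    std::memory_order_relaxed)) {
                }
            }
        });
    }

    auto start = std::chrono::steady_clock::now();
    go.store(true, std::memory_order_release);
    for (auto& th : pool) {
        th.join();
    }
    auto stop = std::chrono::steady_clock::now();

    if (total_out != nullptr) {
        *total_out = counter.load();
    }
    std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() /
           (static_cast<double>(threads) * static_cast<double>(ops_per_thread));
}
```

Calling ns_per_op(1, n) gives an uncontended baseline, and calling it with a thread count equal to the number of cores gives the contended case; comparing the ratio of the two on ARM and on Intel is one way to check whether the slowdown described above reproduces.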

rpopescu commented 4 months ago

hi @Philippe91 ! would you like to share your test program? thanks!

Philippe91 commented 4 months ago

would you like to share your test program?

That's not possible, because the test depends on various parts of my framework. However, you should get results similar to the ones I see on my three computers. What would be interesting, on the other hand, is to see different benchmarks.