rigtorp / MPMCQueue

A bounded multi-producer multi-consumer concurrent queue written in C++11
MIT License

How is this queue lock-free? #9

Closed WonderfulVoid closed 4 years ago

WonderfulVoid commented 5 years ago

A thread that blocks after the call to fetch_add() will block later producers and consumers that use the same slot. As objects are enqueued (and dequeued), the ring will wrap around and come back to earlier slots. For enqueue, such an event can be pushed far into the future since the queue capacity can be made arbitrarily large (as memory allows), but not for dequeue. Possibly blocking could be avoided if producers and consumers synchronise with each other (e.g. use CAS to update the slot and restart the operation (allocate a new index) if the slot doesn't have the expected value).

rigtorp commented 5 years ago

It is lock-free (some thread can always make progress) but it's not wait-free (a thread might need to wait for other threads).

WonderfulVoid commented 5 years ago

Other threads can make progress until they allocate an index that aliases with the blocked slot. So one by one, the other threads will eventually block waiting for that first blocked thread. Eventually the whole system will deadlock, with all but one thread spinning in enqueue or dequeue, waiting for the slot ticket to change to the value each thread expects. So it is not even obstruction-free.

It's a nifty algorithm that is relatively robust to delays in individual threads and the "risk" of blocking can be controlled by modifying the ring size.

I am working on a lock-free ring buffer: https://github.com/ARM-software/progress64/blob/master/src/p64_lfring.c The design seems to work well with a well-behaved stress test with 8 threads (ARM Cortex-A53). It also performs better (with increasing number of threads) than the "standard" blocking MPMC ring buffer. However I want to do some more verification (e.g. stress test with randomised misbehaviour in individual threads). I also have a "non-blocking" ring buffer design that will be published soon. It will only require a 64-bit (single-word) CAS unlike the lock-free ring buffer above.

rigtorp commented 5 years ago

If you use blocking pop then consumers will indeed block until the producer has produced.

With this queue acquiring a producer slot (fetch_add) is the FIFO barrier. A blocking consumer will block until the next producer has produced.

If you want to reduce the FIFO guarantees you could do that by using a SPSC queue per thread.

But in the end it is lockfree since the suspended producer can make progress independently of any other thread. It just needs to be scheduled. There is no deadlock possible.

So again: it is lock-free but not wait-free.

WonderfulVoid commented 5 years ago

"But in the end it is lockfree since the suspended producer can make progress independently of any other thread" But other threads cannot indefinitely make progress if there is one blocking (producer or consumer) thread. Do we agree on this?

rigtorp commented 5 years ago

What do you mean by a blocking thread?

WonderfulVoid commented 5 years ago

If a thread doesn't make the required progress (e.g. setting a flag, incrementing a counter) for whatever reason, it blocks progress for other threads when you use blocking algorithms. There are many ways a thread can block progress, temporarily or permanently: it can get interrupted or preempted, crash or get killed, hang (loop forever), wait for other threads, block in a system call, or execute a signal handler. If this happens in the wrong place, I (informally) call this the blocking thread.

The way to avoid such problems is to use non-blocking (e.g. obstruction-free, lock-free, wait-free) algorithms. But if the progress of one thread depends on the progress of other threads, the algorithm is not non-blocking; it is a blocking algorithm. If you depend on the scheduler to let a thread run and make the required progress in order for other threads to make progress, you only have a blocking algorithm. Think of how you make progress with a spin lock.

rigtorp commented 5 years ago

Yes, that's why I don't claim it is wait-free.

In any case, I use this successfully in production daily. If you can accept less strict FIFO guarantees, you can also use one SPSC queue per thread (https://github.com/rigtorp/SPSCQueue).

WonderfulVoid commented 5 years ago

https://stackoverflow.com/questions/45907210/lock-free-progress-guarantees

rigtorp commented 5 years ago

That's a great discussion! So this algorithm has lock-free enqueue and non-blocking dequeue.