Question: memory order in fetch_add()

anaanimous commented 1 year ago

In the push and pop functions, the memory order of fetch_add is set to memory_order_acquire if total order is not required. In that case, would it be wrong to set the memory order to relaxed? My reasoning is that fetch_add is not synchronized with any store, and also there is a data dependency between this evaluation and the do_push()/do_pop() that follows in the next line, so I don't see a risk of reading the wrong 'head' due to reordering of some read operation after fetch_add. I may be wrong and lacking some fundamental understanding of memory order, but I can’t seem to come up with any scenario where using relaxed memory order leads to error.

max0x7ba commented 1 year ago

Push must have release memory order store, so that no preceding non-atomic store gets reordered past push. Use case: fill a structure with non-atomic stores and push a pointer to it to another thread - non-atomic stores into the structure must not be reordered past the push of the pointer to it.

Pop must have acquire memory order load to synchronise with the push release store, and no subsequent non-atomic memory load gets reordered prior the acquire load in pop. The latter is unlikely to happen in the aforementioned use case because subsequent loads depending on the popped value cannot possibly be reordered prior the value load.

In other words, push release / pop acquire memory orders are there to order user's preceding non-atomic stores / subsequent loads relative to push/pop respectively, rather than maintain queue's integrity.

That's exactly the memory order semantics of the standard Posix Threads mutex. In fact, acquire/release memory order semantics originated from Posix Threads mutex acquire/release operations, because total sequential memory order for these was too expensive and unnecessary.

IIRC, C++ verbiage have changed whether its only preceding stores or both loads and stores what don't get reordered past the release store. As well as whether its only loads or both loads and stores what don't get reordered prior to acquire load. Regardless, the queue doesn't assume or require anything beyond the weakest forms that only stores don't get reordered past the release store, and only loads don't get reordered prior to acquire load.

anaanimous commented 1 year ago

Thank you for your quick response. Let me try to rephrase my question into few statements that are easier to refute. Please note that my question is regarding multiple-producer multiple consumer queues with non-atomic elements (AtomicQueue2). I would appreciate if you could comment on each statement separately.

head and tail are used to assign a slot to a producer/consumer, but the actual integrity of the queue is maintained by the states_ variable. In other words, even if a slot is randomly assigned to a producer/consumer it wouldn't lead to a race condition.
Therefore, in lines 277, 296, 312, and 325 we can use memory_order_relaxed instead of memory_orderacquire. I argue that these lines only provide a slot (index) to the provider/consumer, but as stated above, the integrity of the slot is tied to the states variable. I am assuming that the reads of the obtained index (i.e. head and tail) that follow the RMW operation cannot be reordered prior to that operation, regardless of the memory order, due to data dependency.
As a side note, the intermediate states STORING and LOADING are not needed if the number of producers and consumers is smaller than the queue capacity, respectively. This is because STORING is necessary only if more than one producer attempts to write to a free slot at the same time, and given that the number of producers is less than the capacity this cannot happen because each producer is assigned a different index upon calling push/try_push. However, I am not sure if index shuffling can refute this assumption. A similar reasoning applies to LOADING. In other words, it is possible to replace the code block on lines 249-260 with something simpler like the following:
```
while(state.load(A) != EMPTY)) {
  spin_loop_pause();
}
q_element = std::forward<U>(element);
state.store(STORED, R);
```

anaanimous commented 1 year ago

I realized that the third point is wrong. A counter example is when one producer is much slower than the rest in which case two producers may end up writing to the same slot. Therefore, STORING (and similarly LOADING) is necessary.

max0x7ba commented 1 year ago

Thank you for your quick response. Let me try to rephrase my question into few statements that are easier to refute. Please note that my question is regarding multiple-producer multiple consumer queues with non-atomic elements (AtomicQueue2). I would appreciate if you could comment on each statement separately.

head and tail are used to assign a slot to a producer/consumer, but the actual integrity of the queue is maintained by the states_ variable. In other words, even if a slot is randomly assigned to a producer/consumer it wouldn't lead to a race condition.

Therefore, in lines 277, 296, 312, and 325 we can use memory_order_relaxed instead of memory_orderacquire. I argue that these lines only provide a slot (index) to the provider/consumer, but as stated above, the integrity of the slot is tied to the states variable. I am assuming that the reads of the obtained index (i.e. head and tail) that follow the RMW operation cannot be reordered prior to that operation, regardless of the memory order, due to data dependency.

You are quite right, thank you for your code review.

Commit 6b65e8f6bdd7b627bf68e6e525a31a4c256a951e implements your findings.

kaynarov commented 1 year ago

@max0x7ba, These release-stores in atomic-spsc branches look suspicious because there are no acquire-loads paired with them.

kaynarov commented 1 year ago

@max0x7ba, and here, I think.

max0x7ba commented 1 year ago

@max0x7ba, These release-stores in atomic-spsc branches look suspicious because there are no acquire-loads paired with them.

Astutely spotted.

Commit 2bc627d7a452efaefbb49547e06b6b52881b4304 fixes this bug.

I would like you guys to know that I appreciate nothing more than you taking the time to look into the code carefully and mindfully as you do, and to report your findings.

max0x7ba commented 1 year ago

@max0x7ba, and here, I think.

pop must have acquire memory order, lack of which you kindly reported earlier.

push must have release memory order, as it does, no issue here, if I am not mistaken.

kaynarov commented 1 year ago

@max0x7ba, honestly, it seems to me that we don't need any order guarantee in this simple case (atomic+spsc), NIL-protection is enough. Can you give me a counterexample?

max0x7ba commented 1 year ago

@max0x7ba, honestly, it seems to me that we don't need any order guarantee in this simple case (atomic+spsc), NIL-protection is enough. Can you give me a counterexample?

I dare not question your honesty.

Doesn't my first comment in this thread explain exactly why? If not, please do elaborate.

kaynarov commented 1 year ago

Oh, I see. I think it could be argued that the order of operations when passing a pointer to a queue should be guaranteed from the outside. But, of course, it's not permissible to weaken guarantees in a library anyway, so you're right. And I see that you have already changed the R to X when storing NIL, so there are no more questions. Thank you!

max0x7ba / atomic_queue

Question: memory order in fetch_add() #51