Closed taiki-e closed 2 years ago
At least crossbeam and event-listener also have the same issue, but fixing them is probably more complex...
I feel a bit uncomfortable with this commit.
Admittedly, I don't know what exactly is the role of the fence here. This fence does not exist in Dmitry Vyukov's original implementation of the queue, so I guess it was added as part of the modifications that ensure that this queue is linearisable (unlike the original queue).
That being said, if the cross-platform solution is indeed to place the load before the fence (this, I do not know) then I am pretty sure that the intel specialization that uses a lock
operation instead of an mfence
should also keep the load before.
I did look at https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html but could not see where it states that lock + mov
(in this order) is equivalent to mov + mfence
. In fact, the latest GCC does use the lock
optimization and definitely preserves the order, i.e. mov + lock
(see this godbolt: https://godbolt.org/z/o3rYdTvYv).
That being said, if the cross-platform solution is indeed to place the load before the fence (this, I do not know) then I am pretty sure that the intel specialization that uses a lock operation instead of an mfence should also keep the load before.
I would usually expect that to be the case -- a relaxed load followed by an acquire-or-stronger fence can induce a synchronization edge. But I don't know the context for this particular code.
Does something break, or perf go down badly, if the fence is moved after the load?
The problem seems to be that the original author of this code confused fence in the x86 hardware memory model with atomic fence in the C++ memory model. (On x86,
lock cmpxchg; mov (load from memory)
is fine. See also https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html. On C++ memory model and many architectures, fence for load should beload; fence
)Fixes https://github.com/bevyengine/bevy/issues/5164 FYI @cbeuw