devreal closed this 10 months ago.
The issue should not manifest with `print X` because `X` is actually used. If you substitute `Y = X` with no use of `Y`, then the architecture may be lazy about completing the load, such that the value is not read from memory until after the `MPI_SEND` and races with `X = ...` at Process A. The addition of another `MPI_WIN_SYNC` should include a memory fence that forces pending load/store operations to complete and ensures the desired ordering.
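The ordering argument can be modeled with C11 atomics. This is only an analogy under stated assumptions, not MPI code: two POSIX threads stand in for Processes A and B, a relaxed flag stands in for the `MPI_SEND`/`MPI_RECV` message that B uses to signal A, and a release fence on B paired with an acquire fence on A stands in for the additional `MPI_WIN_SYNC` calls. It makes no claim about how `MPI_WIN_SYNC` is actually implemented, and the names `process_a`, `process_b`, and `run_iteration` are hypothetical. Because the fences order B's load of `x` before its flag store, and A's flag load before its store of 2, the C11 memory model guarantees that B reads 1.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Stand-in for X in the shared-memory window (hypothetical model). */
static atomic_int x;
/* Stand-in for the message B sends to A with MPI_SEND. */
static atomic_int msg;
/* Value returned by B's load of X in the current iteration. */
static int observed;

/* Process B: load X, fence, then signal A. */
static void *process_b(void *arg) {
    (void)arg;
    /* The load of X ("print X" or "Y = X"). */
    observed = atomic_load_explicit(&x, memory_order_relaxed);
    /* Models the added MPI_WIN_SYNC on B: the release fence keeps the
       load above from being reordered after the store below. */
    atomic_thread_fence(memory_order_release);
    /* Models MPI_SEND. */
    atomic_store_explicit(&msg, 1, memory_order_relaxed);
    return NULL;
}

/* Process A: wait for the signal, fence, then overwrite X. */
static void *process_a(void *arg) {
    (void)arg;
    /* Models MPI_RECV: spin until B's signal arrives. */
    while (atomic_load_explicit(&msg, memory_order_relaxed) == 0) {
    }
    /* Models the matching MPI_WIN_SYNC on A: pairs with B's release
       fence, so B's load happens before the store below. */
    atomic_thread_fence(memory_order_acquire);
    /* The second assignment to X. */
    atomic_store_explicit(&x, 2, memory_order_relaxed);
    return NULL;
}

/* Runs one unrolled iteration and returns the value B observed. */
int run_iteration(void) {
    pthread_t a, b;
    atomic_store(&x, 1);   /* first assignment X = 1, visible to both threads */
    atomic_store(&msg, 0);
    int rc = pthread_create(&a, NULL, process_a, NULL);
    assert(rc == 0);
    rc = pthread_create(&b, NULL, process_b, NULL);
    assert(rc == 0);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return observed;
}
```

Calling `run_iteration()` repeatedly (compiled with `-pthread`) should always return 1. If both fences are removed, the C11 model no longer rules out a return value of 2 (the load-before-signal reordering described above), although strongly ordered hardware such as x86 does not exhibit that reordering.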
This has been resolved in MPI 4.0.
If we unroll the `DO` loop in Example 12.21, we get the following code sequence:
Rolf pointed out that the second assignment to `X` can potentially overwrite `X` before the load on B becomes visible, yielding the value 2 on the first load. Any loads and stores done in `MPI_SEND` and `MPI_RECV` would be independent of the load of `X`.

On systems with LoadStore ordering this is no problem (the store in `MPI_SEND` cannot be reordered with the load of `X`). However, if loads and stores are not ordered, the signal sent by `MPI_SEND` might be delivered before the load in B is complete. To avoid this, a second set of `MPI_WIN_SYNC` calls must be added to the example:

Thoughts?