Missing calls to MPI_WIN_SYNC in Example 12.21

devreal commented 2 years ago

If we unroll the DO loop in Example 12.21 we get the following code sequence:

  A:                                            B:
  X=... // 1
  MPI_F_SYNC_REG(X)
  MPI_WIN_SYNC(win)
  MPI_SEND                                      MPI_RECV
                                                MPI_WIN_SYNC(win)
                                                MPI_F_SYNC_REG(X)
                                                Y = X
                                                MPI_F_SYNC_REG(X)
                                                MPI_SEND
                                                print Y

  MPI_RECV
  MPI_F_SYNC_REG(X)
  /* X might be overwritten before X is loaded on B*/
  X=... // 2
  MPI_F_SYNC_REG(X)
  MPI_WIN_SYNC(win)
  MPI_SEND                                      MPI_RECV
                                                ....

Rolf pointed out that the second assignment to X can potentially overwrite X before the load on B has completed, so that the first load yields the value 2. Any loads and stores done inside MPI_SEND and MPI_RECV are independent of the load of X, so they do not order it.

On systems with LoadStore ordering this is not a problem (the store in MPI_SEND cannot be reordered with the load of X). However, if loads and stores are not ordered, the signal sent by MPI_SEND might be delivered before the load on B has completed. To avoid this, a second set of MPI_WIN_SYNC calls must be added to the example:

  A:                                            B:
  X=... // 1
  MPI_F_SYNC_REG(X)
  MPI_WIN_SYNC(win)
  MPI_SEND                                      MPI_RECV
                                                MPI_WIN_SYNC(win)
                                                MPI_F_SYNC_REG(X)
                                                print X // OR: Y = X
                                                MPI_F_SYNC_REG(X)
                                                MPI_WIN_SYNC(win) // make sure the load is complete before sending the signal
                                                MPI_SEND

  MPI_RECV
  MPI_WIN_SYNC(win) // make sure the assignment to X does not become visible before the recv has completed
  MPI_F_SYNC_REG(X)
  /* with the added MPI_WIN_SYNC calls, X is no longer overwritten before it is loaded on B */
  X=... // 2
  MPI_F_SYNC_REG(X)
  MPI_WIN_SYNC(win)
  MPI_SEND                                      MPI_RECV
                                                ....
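
Rolled back into a loop, the placement of the calls could be modeled roughly as follows. This is only a structural sketch in Python, not MPI code: two threads stand in for processes A and B, `queue.Queue` stands in for MPI_SEND/MPI_RECV, and `win_sync` is a hypothetical placeholder that merely logs where MPI_WIN_SYNC would be called. MPI_F_SYNC_REG is left out because it has no Python counterpart. Python's queues already order the accesses, so the sketch shows where the calls go, not the hazard itself.

```python
import queue
import threading


def run_pingpong(iterations):
    """Model the A/B handshake; return (values B observed, sync-call log)."""
    shared = {"X": None}      # stands in for the shared-memory window
    a_to_b = queue.Queue()    # models MPI_SEND on A / MPI_RECV on B
    b_to_a = queue.Queue()    # models MPI_SEND on B / MPI_RECV on A
    observed = []
    log = []

    def win_sync(who, where):
        # Hypothetical placeholder: only records where MPI_WIN_SYNC would go.
        log.append((who, where))

    def proc_a():
        for i in range(1, iterations + 1):
            if i > 1:
                b_to_a.get()                          # MPI_RECV
                win_sync("A", "after recv, before store")
            shared["X"] = i                           # X = ...
            win_sync("A", "after store, before send")
            a_to_b.put(None)                          # MPI_SEND

    def proc_b():
        for _ in range(iterations):
            a_to_b.get()                              # MPI_RECV
            win_sync("B", "after recv, before load")
            observed.append(shared["X"])              # print X
            win_sync("B", "after load, before send")  # the added call
            b_to_a.put(None)                          # MPI_SEND

    threads = [threading.Thread(target=proc_a), threading.Thread(target=proc_b)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return observed, log
```

Calling `run_pingpong(3)` returns the values B read in each iteration together with the order in which the placeholder sync calls were made.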

Thoughts?

jdinan commented 2 years ago

The issue should not manifest with print X because X is actually used. If you substitute Y = X with no use of Y then the architecture may be lazy about completing the load, such that the value is not read from memory until after the MPI_SEND and races with X = ... at Process A. The addition of another MPI_WIN_SYNC should include a memory fence that causes pending load/store operations to be completed and ensures the desired ordering.
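
To make the lazy-load scenario concrete, here is a deliberately simplified toy model in Python. It is not how any real architecture or MPI library is implemented: `ToyCore`, `fence`, `use`, and `run` are hypothetical names, and the model simply assumes that a load may be deferred until either a fence or the first use of the value, and that MPI_WIN_SYNC contains such a fence.

```python
class ToyCore:
    """Toy core that defers loads until a fence or the first use of a value."""

    def __init__(self, memory):
        self.memory = memory
        self.pending = []     # loads that were issued but have not read memory
        self.registers = {}

    def load(self, name):
        # Issue the load without reading memory yet.
        self.pending.append(name)

    def fence(self):
        # Models the fence assumed to be inside MPI_WIN_SYNC:
        # every pending load reads memory now.
        for name in self.pending:
            self.registers[name] = self.memory[name]
        self.pending.clear()

    def use(self, name):
        # Consuming the value forces any still-pending load to complete.
        self.fence()
        return self.registers[name]


def run(sync_before_send):
    memory = {"X": 1}         # A: X = 1, MPI_WIN_SYNC, MPI_SEND
    b = ToyCore(memory)
    b.load("X")               # B: Y = X, with Y not used yet
    if sync_before_send:
        b.fence()             # B: the added MPI_WIN_SYNC before MPI_SEND
    # B: MPI_SEND, A: MPI_RECV, then A performs its second store.
    memory["X"] = 2
    return b.use("X")         # B finally consumes Y
```

In this model `run(False)` returns 2, because the deferred load only reads memory after A's second store, while `run(True)` returns 1, because the fence completes the load before the signal is sent. Using the value before the send (the `print X` case) would complete the load in the same way.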

devreal commented 10 months ago

This has been resolved in MPI 4.0.

mpiwg-rma / rma-issues

Missing calls to MPI_WIN_SYNC in Example 12.21 #20