jessr92 closed this issue 9 years ago
Looks like the non-blocking and wait code for the halo exchange is a bit off: some processes finish the exchange for, say, array v and move on to array w. Array w is a slightly different size, so the messages get mixed up and messages intended for w go to processes still waiting on v, or something like that. Seems quite odd since it doesn't happen all of the time...
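One common cause of this kind of cross-array mix-up is reusing the same MPI tag for every exchange and not completing all requests before starting the next array. A minimal sketch of a safer pattern, assuming a 1-D column decomposition; the names (`halo_exchange`, `tag`, `reqs`) are illustrative and not from the project:

```fortran
! Hedged sketch: give each array's halo exchange its own tag and
! complete ALL requests before starting the next array's exchange.
subroutine halo_exchange(a, n, up, down, tag, comm)
  use mpi
  implicit none
  integer, intent(in) :: n, up, down, tag, comm
  real, intent(inout) :: a(0:n+1)
  integer :: reqs(4), ierr
  ! Post receives first, then sends, all non-blocking
  call MPI_Irecv(a(0),   1, MPI_REAL, up,   tag, comm, reqs(1), ierr)
  call MPI_Irecv(a(n+1), 1, MPI_REAL, down, tag, comm, reqs(2), ierr)
  call MPI_Isend(a(1),   1, MPI_REAL, up,   tag, comm, reqs(3), ierr)
  call MPI_Isend(a(n),   1, MPI_REAL, down, tag, comm, reqs(4), ierr)
  ! Waiting on all four requests here prevents this array's messages
  ! from being matched against the next array's exchange.
  call MPI_Waitall(4, reqs, MPI_STATUSES_IGNORE, ierr)
end subroutine halo_exchange
```

Calling this with a distinct `tag` per array (e.g. one value for v, another for w) keeps MPI's (source, tag, communicator) matching from pairing a w message with a pending v receive even when the exchanges overlap.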
If the topSend/bottomRecv and topRecv/bottomSend code is commented out, then the code works for PROC_PER_COL > 3 (tried 4, with PROC_PER_ROW=2).
Renamed, since I've found how to trigger it with any values of PROC_PER_COL and PROC_PER_ROW.
Seems to have been fixed... investigating...
Closing for now... row/col combinations 4/1, 1/4, and 2/2 have no send/recv issues. Will test with larger process counts once togian becomes free.
Reopening since there seem to be other issues. Ranks 0 to 5 seem to be doing halos for one array (say p in boundp2), whereas ranks 6 and 7 seem to be doing halos for f, which doesn't make sense given the MPI_Barrier() calls...
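Worth noting: MPI_Barrier synchronizes processes, but it neither completes nor orders outstanding point-to-point messages, so it cannot prevent this. A hedged two-rank sketch of the hazard (not code from the project; `p_halo`/`f_halo` are illustrative names):

```fortran
! Sketch: a send posted before MPI_Barrier can still be matched by a
! receive posted after it, if (source, tag, communicator) agree.
program barrier_hazard
  use mpi
  implicit none
  integer :: rank, ierr, req
  real :: p_halo(4), f_halo(4)
  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  if (rank == 0) then
     p_halo = 1.0
     ! Isend for the p exchange, deliberately not yet waited on
     call MPI_Isend(p_halo, 4, MPI_REAL, 1, 0, MPI_COMM_WORLD, req, ierr)
  end if
  call MPI_Barrier(MPI_COMM_WORLD, ierr)  ! does NOT flush the Isend
  if (rank == 1) then
     ! This receive, intended for the f exchange, matches the stale
     ! p message because it uses the same source and tag.
     call MPI_Recv(f_halo, 4, MPI_REAL, 0, 0, MPI_COMM_WORLD, &
                   MPI_STATUS_IGNORE, ierr)
  end if
  if (rank == 0) call MPI_Wait(req, MPI_STATUS_IGNORE, ierr)
  call MPI_Finalize(ierr)
end program barrier_hazard
```

Distinct tags per array, or waiting on all requests before the next exchange, would close this window; the barrier alone does not.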
Related to the logic (one if statement and the do loops) in press.f95 that calls the boundary subroutines. The boundary routines in press are disabled for now.
The errors are
on a Mac, and, on Linux: