MPI_PUT|GET may have different internal implementations for latency-preferred versus message-rate-preferred programs. E.g., if MPI knows the user program uses every MPI_PUT|GET in a blocking manner, such as the blocking RMA in SHMEM, then it may want to use a blocking data transfer internally to save some software overhead; if the user program uses MPI_PUT|GET in a nonblocking manner (e.g., issuing multiple shmem_putmem_nbi calls followed by a single shmem_quiet), then MPI may want to decouple data transfer and completion checking to maximize overlap.
Such a preference can be detected from the SHMEM APIs being used (see the above example), but that information is lost when calling MPI RMA. Thus, we have to set a hint at window creation time (i.e., at OSHMPI space creation time).
TODO: also add a similar hint for default windows. May need an environment variable.