ofi-cray / libfabric-cray

Open Fabric Interfaces
http://ofiwg.github.io/libfabric/

problem with fi_inject and friends for certain communication patterns #559

Closed hppritcha closed 8 years ago

hppritcha commented 8 years ago

It turns out that fi_inject and friends can cause problems for the GNI provider with certain communication patterns. The osu_barrier test run with 4 endpoints (MPI processes) illustrates this well. The barrier timing part works fine; the problem is with the MPI_Reduce used to get the timing statistics. With an unmodified GNI provider, which has an inject size of 64 bytes, the test hangs. With an inject size of 0, the test passes.

The problem has to do with the fact that, at this job size, the communication pattern of the reduce differs from that of the barrier. Note that the reduce is a one-way operation for some ranks: they just send to a parent. In the case of the Open MPI OFI MTL, fi_tinject is used for these sends. If the VC connection from the sender to the receiver endpoint is not yet set up, the sender ranks end up in an out-of-band barrier in MPI_Finalize. The VC is never moved to the completed state.

If the GNI provider is modified to advertise an inject size of 0, the test passes.
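To illustrate why advertising an inject size of 0 sidesteps the hang, here is a minimal sketch of the kind of length check a caller might perform before choosing between an inject call such as fi_tinject and a regular tagged send. The names `send_path` and `choose_send_path` are hypothetical and are not taken from Open MPI or libfabric; the assumption is only that the caller compares the message length against the provider's advertised inject size.

```c
#include <stddef.h>

/* Hypothetical labels for the two send paths discussed in this issue. */
enum send_path {
    SEND_PATH_INJECT,   /* e.g. fi_tinject: no completion is reported to the caller */
    SEND_PATH_REGULAR   /* e.g. a regular tagged send that generates a completion */
};

/*
 * Hypothetical helper: pick a send path from the message length and the
 * inject size advertised by the provider. An advertised inject size of 0
 * means the inject path is never selected, so every send takes the
 * regular path.
 */
static enum send_path choose_send_path(size_t msg_len, size_t inject_size)
{
    if (inject_size > 0 && msg_len <= inject_size)
        return SEND_PATH_INJECT;
    return SEND_PATH_REGULAR;
}
```

Under this assumption, an 8-byte reduce contribution would take the inject path with the unmodified 64-byte setting, and the regular path once the provider advertises 0.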

There may be other osu collectives that have a similar behavior pattern.

hppritcha commented 8 years ago

This may be fixed but needs verification.

hppritcha commented 8 years ago

No, the osu_barrier test still fails for 4 ranks when using Open MPI.