open-mpi / ompi

Open MPI main development repository
https://www.open-mpi.org

BTL/OFI: retry posting receive buffer #12634

Closed hppritcha closed 2 weeks ago

hppritcha commented 2 weeks ago

Under heavy load (at least with the HPE CXI provider), an attempt to post a receive buffer can return -FI_EAGAIN.

This PR uses the OFI_RETRY_UNTIL_DONE macro to retry posting the receive buffer when the fi_recv call returns -FI_EAGAIN.
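
For readers unfamiliar with the pattern, here is a minimal, self-contained sketch of retrying a post while it reports "try again". It is not the Open MPI macro or the libfabric API: `SKETCH_EAGAIN`, `SKETCH_RETRY_UNTIL_DONE`, `mock_endpoint_t`, `mock_post_recv`, and `mock_progress` are hypothetical stand-ins for `FI_EAGAIN`, `OFI_RETRY_UNTIL_DONE`, the endpoint, `fi_recv`, and a progress call. Driving progress between attempts is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for libfabric's FI_EAGAIN; the value is arbitrary in this sketch. */
#define SKETCH_EAGAIN 11

/* Hypothetical retry macro: evaluate FUNC repeatedly while it returns
 * -SKETCH_EAGAIN, running PROGRESS between attempts. The final return
 * code (0 or any other error) is left in RET. */
#define SKETCH_RETRY_UNTIL_DONE(FUNC, PROGRESS, RET) \
    do {                                             \
        (RET) = (FUNC);                              \
        if (-SKETCH_EAGAIN == (RET)) {               \
            PROGRESS;                                \
        }                                            \
    } while (-SKETCH_EAGAIN == (RET))

/* Mock endpoint: posts fail until progress has been driven busy_rounds times. */
typedef struct {
    int busy_rounds; /* progress calls still needed before a post succeeds */
    int attempts;    /* number of post attempts made so far */
} mock_endpoint_t;

/* Mock of posting a receive buffer: returns -SKETCH_EAGAIN while busy. */
static int mock_post_recv(mock_endpoint_t *ep)
{
    assert(ep != NULL);
    ep->attempts++;
    return (ep->busy_rounds > 0) ? -SKETCH_EAGAIN : 0;
}

/* Mock progress: each call frees up a little of the pretend provider's backlog. */
static void mock_progress(mock_endpoint_t *ep)
{
    if (ep->busy_rounds > 0) {
        ep->busy_rounds--;
    }
}

/* Post a receive buffer, retrying for as long as the mock reports EAGAIN. */
static int post_recv_with_retry(mock_endpoint_t *ep)
{
    int rc;
    SKETCH_RETRY_UNTIL_DONE(mock_post_recv(ep), mock_progress(ep), rc);
    return rc;
}
```

Calling `post_recv_with_retry` on an endpoint initialized with `busy_rounds = 3` makes four attempts in total: three that return `-SKETCH_EAGAIN` and a fourth that succeeds. Any error other than `-SKETCH_EAGAIN` ends the loop immediately and is returned to the caller unchanged.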

Signed-off-by: Howard Pritchard <hppritcha@gmail.com>
(cherry picked from commit c522de11d61540b263f1fe0f1770828b7bd8688c)