ulfm-devel / ompi

Open MPI main development repository
https://www.open-mpi.org

Optimize number of atomics in error cases during SYNC_WAIT rearming #53

Status: open. abouteiller opened this issue 4 years ago.

abouteiller commented 4 years ago

Original report by Aurelien Bouteiller (Bitbucket: abouteiller, GitHub: abouteiller).


While fixing some of the erroneous behavior in WAIT_SYNC (PR #18), we discussed the following optimization, which could reduce the number of atomic operations in the error case.

Idea:

Rearm the sync as-is, without first detaching all requests. Spurious wakeups can still occur, but each one costs fewer atomic operations.

  1. This change entails that sync_update(status=err) no longer sets sync->count to 0.
  2. It also entails that the sync remains in a ‘signaling’ state after it has been triggered in error, so that we do not mistakenly erase that state in error cases.
  3. Do we also need a way to safely update the target count after the sync is attached to active requests (which we do not have now)?

    1. When a request completes (in error or otherwise), sync_wait_updated has already decremented the target count on the sync, whether the completion happens during or outside the WAIT_SYNC period of the wait operation. As long as we do not reset the target count on error, the count therefore remains correct.
    2. If a request is in error, we need to complete the wait now; there is no need to rearm the sync in this case.
    3. So it appears we don’t need such an update mechanism, which saves us from the associated thread races.