ulfm-devel / ompi

Open MPI main development repository
https://www.open-mpi.org

coll_comm request cancellation takes a recursive mutex #24

Closed abouteiller closed 6 years ago

abouteiller commented 6 years ago

Original report by Aurelien Bouteiller (Bitbucket: abouteiller, GitHub: abouteiller).


This is an upstream defect, but it affects only us:

coll_comm requests are placeholders for non-blocking collectives performed during next-cid and friends.

Cancelling that request cancels, in turn, each of the components of the non-blocking collective request (i.e. a form of generalized request). The top-level cancel takes the comm_request_lock, and each component request's cancel then tries to take the same lock again while it is still held, which deadlocks or aborts.
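The locking pattern described above can be reproduced outside Open MPI with a minimal sketch. Everything here is a hypothetical stand-in: `composite_request_t`, `sub_request_t`, and the function names are invented for illustration and are not the Open MPI structures or API. The sketch uses a `PTHREAD_MUTEX_ERRORCHECK` mutex so that the nested lock attempt returns `EDEADLK` instead of hanging. The "flat" variant is just one possible way to avoid the nested lock, shown for contrast; it is not a description of how the issue was handled.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* for PTHREAD_MUTEX_ERRORCHECK under strict -std modes */
#endif
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins; NOT the Open MPI data structures. */
typedef struct {
    int cancelled;
} sub_request_t;

typedef struct {
    pthread_mutex_t lock;    /* plays the role of comm_request_lock */
    sub_request_t subs[2];   /* components of the composite request */
    size_t nsubs;
} composite_request_t;

/* Initialise the composite with an error-checking (non-recursive) mutex. */
int composite_init(composite_request_t *req)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0) {
        return rc;
    }
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (rc == 0) {
        rc = pthread_mutex_init(&req->lock, &attr);
    }
    pthread_mutexattr_destroy(&attr);
    req->nsubs = 2;
    for (size_t i = 0; i < req->nsubs; i++) {
        req->subs[i].cancelled = 0;
    }
    return rc;
}

/* Component cancel that takes the shared lock itself. */
int sub_cancel_locking(composite_request_t *owner, sub_request_t *sub)
{
    int rc = pthread_mutex_lock(&owner->lock);
    if (rc != 0) {
        return rc;           /* EDEADLK when this thread already holds the lock */
    }
    sub->cancelled = 1;
    pthread_mutex_unlock(&owner->lock);
    return 0;
}

/* Problematic pattern: top-level cancel holds the lock while calling
 * component cancels that try to take the same lock. */
int composite_cancel_relocking(composite_request_t *req)
{
    int rc = pthread_mutex_lock(&req->lock);
    if (rc != 0) {
        return rc;
    }
    for (size_t i = 0; i < req->nsubs && rc == 0; i++) {
        rc = sub_cancel_locking(req, &req->subs[i]);
    }
    pthread_mutex_unlock(&req->lock);
    return rc;
}

/* Contrast: the top-level cancel marks the components directly while
 * holding the lock once, so no nested lock attempt occurs. */
int composite_cancel_flat(composite_request_t *req)
{
    int rc = pthread_mutex_lock(&req->lock);
    if (rc != 0) {
        return rc;
    }
    for (size_t i = 0; i < req->nsubs; i++) {
        req->subs[i].cancelled = 1;
    }
    pthread_mutex_unlock(&req->lock);
    return 0;
}
```

With a default (non-error-checking) mutex, the nested `pthread_mutex_lock` call in `sub_cancel_locking` would typically block forever rather than return an error, which corresponds to the deadlock case in the report.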

abouteiller commented 6 years ago

Original comment by Aurelien Bouteiller (Bitbucket: abouteiller, GitHub: abouteiller).


Demoted to minor. Since we now automatically revoke/cancel only PML requests, this bug is of lower importance.

abouteiller commented 6 years ago

Original comment by Aurelien Bouteiller (Bitbucket: abouteiller, GitHub: abouteiller).


Per the MPI spec, it is invalid to cancel a coll/comm request, and we have stopped doing it, so we are fine.