pmodels / mpich

Official MPICH Repository
http://www.mpich.org

segfaults when waiting for completion of a generalized request #164

Closed mpichbot closed 7 years ago

mpichbot commented 7 years ago

Originally by "Lisandro Dalcin" dalcinl@gmail.com on 2008-09-23 13:26:52 -0500


Short story: Calling MPI_Wait() on a generalized request segfaults if MPI_Grequest_complete() has not yet been called when MPI_Wait() is entered. I have attached a small example, implemented with pthreads and sleep(), that exhibits the issue.

Long story: I've tracked down the problem. In MPICH2, 'standard' generalized requests do not have a 'poll_fn()' callback, and MPIR_Grequest_progress_poke() does not take this into account at line 546 of 'src/mpi/pt2pt/mpir_request.c'.

The attached one-line patch seems to fix the problem for MPI_Wait() (and possibly MPI_Waitsome()), but MPI_Waitall() still segfaults, so I believe the other completion/test calls need to be carefully reviewed. Unfortunately, I do not know the MPICH2 code base well enough to provide working patches for them.
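For readers without the attachments, the failure mode described above can be modeled outside MPICH. The sketch below is not the attached patch and not MPICH's code: `toy_grequest`, `toy_progress_poke()` and `toy_count_polls()` are hypothetical names invented for illustration. It only shows the general idea of skipping requests whose optional poll callback is NULL instead of calling through the pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type; stands in for an optional poll function. */
typedef int (*toy_poll_fn)(void *extra_state);

/* Hypothetical stand-in for a generalized request (not MPICH's struct). */
typedef struct {
    toy_poll_fn poll_fn;   /* NULL for a "standard" generalized request */
    void *extra_state;     /* opaque pointer handed back to the callback */
} toy_grequest;

/*
 * Poke every request once. Requests without a poll callback are skipped;
 * calling req->poll_fn unconditionally would jump through a NULL pointer,
 * which is the kind of crash the report describes.
 * Returns 0 on success, or the first nonzero code returned by a callback.
 */
static int toy_progress_poke(toy_grequest *reqs[], size_t count)
{
    assert(reqs != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        toy_grequest *req = reqs[i];
        if (req == NULL || req->poll_fn == NULL)
            continue;      /* nothing to poll for this request */
        int rc = req->poll_fn(req->extra_state);
        if (rc != 0)
            return rc;
    }
    return 0;
}

/* Example callback: counts how many times it has been invoked. */
static int toy_count_polls(void *extra_state)
{
    int *calls = extra_state;
    (*calls)++;
    return 0;
}
```

Passing an array that mixes a request with `poll_fn == NULL` and one using `toy_count_polls` should poll only the latter. Whether the real fix needs more than such a guard in each wait/test path is exactly what the review requested above would have to establish.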

Lisandro Dalcín

Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594

mpichbot commented 7 years ago

Originally by Lisandro Dalcin on 2008-09-23 13:26:52 -0500


This message has 2 attachment(s)

mpichbot commented 7 years ago

Originally by Lisandro Dalcin on 2008-09-23 13:26:52 -0500


Attachment added: mpir_request.diff (0.6 KiB) Added by email2trac

mpichbot commented 7 years ago

Originally by Lisandro Dalcin on 2008-09-23 13:26:52 -0500


Attachment added: test_greq.c (1.1 KiB) Added by email2trac

mpichbot commented 7 years ago

Originally by goodell on 2008-10-10 13:43:31 -0500


Wait, Waitsome, and Waitall are now fixed in the trunk by [f6178344be03295508b1adefb32a710601f917e0] and regression tests were added in [840b3b6a0cbeada8d45c1e72271fd17efd62800f]+[6da647bc0c9c808df901e70cd88a5f201bb7226f]. This is [3291] in the mpich2-1.0 branch.

Test, Testsome, and Testall are now tested as well, although they appear to have been working fine from the start.

Lisandro, thanks for the bug report and patch. I have taken your sample code and incorporated it into the mpich2 test suite. Bugs like this are always so much easier to work on when you give us sample code to start with.

-Dave