pmodels / mpich

Official MPICH Repository
http://www.mpich.org

PATCH: Fixes for MPI_Comm_dup and MPI_Comm_split (intercommunicator case) #30

Closed mpichbot closed 7 years ago

mpichbot commented 7 years ago

Originally by "Lisandro Dalcin" dalcinl@gmail.com on 2008-08-01 14:39:19 -0500


Hi all,

Some intercommunicator collectives rely on the 'is_low_group' field of the MPID_Comm structure. This field is not filled in correctly when MPI_Comm_dup() or MPI_Comm_split() is called on an intercommunicator, and as a result MPI_Barrier(), MPI_Allgather() and MPI_Allgatherv() deadlock (probably MPI_Reduce_scatter() as well, though I have not tried it).
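The failure mode described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not MPICH code: `toy_intercomm`, `toy_dup_buggy`, `toy_dup_fixed`, `toy_exchange_can_complete` and `toy_demo` are hypothetical names, and the value the flag ends up with in the buggy case (false on both sides) is assumed for illustration. The model also assumes that a collective uses the flag to decide which group goes first in a two-phase exchange, so the exchange can only complete when the two groups hold opposite values.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for one group's view of an intercommunicator.
 * Only the flag discussed above is modeled; this is not MPID_Comm. */
typedef struct {
    bool is_low_group;
} toy_intercomm;

/* Buggy duplicate: the flag is not carried over from the parent.
 * The resulting value (false) is an assumption for illustration. */
toy_intercomm toy_dup_buggy(const toy_intercomm *parent)
{
    toy_intercomm child = { false };
    (void)parent; /* the parent's flag is ignored on purpose */
    return child;
}

/* Fixed duplicate: the flag is inherited from the parent. */
toy_intercomm toy_dup_fixed(const toy_intercomm *parent)
{
    toy_intercomm child = { parent->is_low_group };
    return child;
}

/* Assumed model of a two-phase exchange: exactly one side must go first.
 * If both sides hold the same flag, both wait for the other to start. */
bool toy_exchange_can_complete(const toy_intercomm *a, const toy_intercomm *b)
{
    return a->is_low_group != b->is_low_group;
}

/* Walks through the three situations; returns 0 if all checks pass. */
int toy_demo(void)
{
    toy_intercomm low = { true };
    toy_intercomm high = { false };

    /* Parent intercommunicator: the two groups disagree, so it works. */
    assert(toy_exchange_can_complete(&low, &high));

    /* Buggy duplicates: both sides end up with the same flag. */
    toy_intercomm bad_low = toy_dup_buggy(&low);
    toy_intercomm bad_high = toy_dup_buggy(&high);
    assert(!toy_exchange_can_complete(&bad_low, &bad_high));

    /* Fixed duplicates: the ordering of the parent is preserved. */
    toy_intercomm good_low = toy_dup_fixed(&low);
    toy_intercomm good_high = toy_dup_fixed(&high);
    assert(toy_exchange_can_complete(&good_low, &good_high));

    return 0;
}
```

Calling `toy_demo()` from a `main` function exercises all three cases. The same inheritance argument would apply to a split, provided each new intercommunicator keeps the relative ordering of its parent's two groups.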

I have attached a tentative patch (against SVN trunk) that fixes this issue.

I've tested it for the MPI_Comm_dup() case, but not for the MPI_Comm_split() case. It seems that the low-group flag just needs to be inherited from the parent intercommunicator, but perhaps I'm missing something, so please review this case with care.

BTW, can you say which version (1.1.0 or perhaps 1.0.7p1) is likely to include the fix?

Regards,

Lisandro Dalcín

Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594

mpichbot commented 7 years ago

Originally by Lisandro Dalcin on 2008-08-01 14:39:19 -0500


Attachment added: src_mpi_comm.diff (1.2 KiB) Added by email2trac

mpichbot commented 7 years ago

Originally by goodell on 2008-08-01 14:59:21 -0500


Lisandro, this change looks sensible at first glance. However, I'll need to write up a test or two to prove that it currently is failing, that this patch fixes it, and to prevent regressions in the future.

As for what release this will go into, it's hard to say. We'd like to wrap up 1.1.0a1 today or Monday, so this is probably just barely too late for that. However, 1.0.7p1 is a possibility (if we have one). Assuming it's a good change, it will definitely make it into 1.0.8 (end of August-ish).

Thanks for the bug report and patch. We'll take a closer look at it as soon as we can.

-Dave

mpichbot commented 7 years ago

Originally by goodell on 2008-08-04 20:44:46 -0500


Fixed with regression tests in [824ca2660258cd7c717ca9ee5b43719cbf0d1ebf], so this should make it into 1.1.0a1. Thanks again for the patch, Lisandro.

resolving