Closed mpichbot closed 8 years ago
Originally by michael on 2009-02-15 13:32:34 -0600
This message has 0 attachment(s)
Originally by Rajeev Thakur on 2009-02-15 18:37:37 -0600
I believe you are using Bull MPI not stock MPICH2. At least if you use the
MPICH2 from Argonne, your program should work. You may want to contact the
support folks for Bull MPI to see why it doesn't work with that MPI.
Rajeev
> -----Original Message-----
> From: mpich2-bugs-bounces@mcs.anl.gov
> [mailto:mpich2-bugs-bounces@mcs.anl.gov] On Behalf Of mpich2
> Sent: Sunday, February 15, 2009 1:33 PM
> To: undisclosed-recipients:
> Subject: [mpich2-maint] #418: question about mpich2 implementation
>
> -----------------------------------------------------+--------
> ----------
> -----------------------------------------------------+----
> Reporter: michael <michael.bane@manchester.ac.uk> |
> Type: bug
> Status: new |
> Priority: major
> Milestone: |
> Component: mpich2
> -----------------------------------------------------+--------
> ----------
> -----------------------------------------------------+----
>
>
> {{{
>
> I believe the attached code should work correctly but I find
> that using the mpich2 implementation on one particular box
> it hangs for odd numbers of processors (not every time but
> frequently), whereas this code runs fine on another box I've
> tried (albeit with OpenMPI)
>
> Details are below and I'd welcome suggestions as to the
> cause of the problem. Note that if I add WRITE statements or
> use the debugger the problem appears to go away. Adding a
> FLUSH and BARRIER immediately after the WRITE stmt makes no
> difference.
>
> To confuse myself further, if I replace the MPI_Send() by
> MPI_SSend(), ie synchronous, sometimes the code completes
> whereas other times it appears to hang (see sync.out example at end)
>
> Thanks, Michael
>
> ~/RCS/myCourses/Intro_to_MPI/MPI_Intro_exercises$ cat
> mkb_ring_solution_send_and_recv_portable.f90
> PROGRAM ring
> ! this program will work on all MPI implementations
> USE MPI
> IMPLICIT NONE
>
> ! since we're only sending a single message between any
> src/dest pair we can use a single tag
> INTEGER, PARAMETER :: myTag=101
>
> INTEGER :: ierror, inputRank, myRank, size
> INTEGER :: sendTo, recvFrom
> INTEGER :: recv_status(MPI_STATUS_SIZE)
>
>
> ! initialise MPI
> CALL MPI_INIT(ierror)
>
> ! determine my rank and total size
> CALL MPI_COMM_RANK(MPI_COMM_WORLD, myRank, ierror)
> CALL MPI_COMM_SIZE(MPI_COMM_WORLD, size, ierror)
>
> ! set up which process rank is to my right (ie clockwise) for sending
> sendTo = myRank + 1
> IF (sendTo =# size) sendTo0
> recvFrom = myRank - 1
> IF (recvFrom =# -1) recvFromsize-1
>
> ! send my rank clockwise (from recvFrom to sendTo)
>
> ! to ensure nobody everybody is sending (and possibly
> waiting) at the same time, we split into even (send then
> recv) ! and odd (recv then send)
>
> if (mod(myRank,2)==0) then
> call mpi_send(myRank,1,MPI_INTEGER,sendTo, myTag, &
> MPI_COMM_WORLD,ierror)
> call mpi_recv(inputRank,1,MPI_INTEGER,recvFrom,myTag, &
> MPI_COMM_WORLD,recv_status,ierror)
> else
> call mpi_recv(inputRank,1,MPI_INTEGER,recvFrom,myTag, &
> MPI_COMM_WORLD,recv_status,ierror)
> call mpi_send(myRank,1,MPI_INTEGER,sendTo, myTag, &
> MPI_COMM_WORLD,ierror)
> endif
>
> write(*,*) 'i am #',myRank,' and I received a new
> rank=',inputRank
>
> CALL MPI_FINALIZE(ierror)
>
> END PROGRAM ring
>
> ~/RCS/myCourses/Intro_to_MPI/MPI_Intro_exercises$ mpif90 -O0
> mkb_ring_solution_send_and_recv_portable.f90;mpif90 -show
> ifort: Command line warning: overriding '-O3' with '-O0'
> ifort -O3 -I/opt/mpi/mpibull2-0.9.7-2.t_RC4v4.3/include
> -L/home/horace/mccssmb2/.mpibull2/lib
> -L/opt/mpi/mpibull2-0.9.7-2.t_RC4v4.3/lib -lmpidev -lmpi
> -lrt -ldl -lelan -lelanctrl -lcpuset
>
> ~/RCS/myCourses/Intro_to_MPI/MPI_Intro_exercises$ date;prun
> -O -n 5 -p login ./a.out Sun Feb 15 19:06:56 GMT 2009
> Using qxelan driver, build for MPIBull2 0.9.7-t (Ishtar) 20060726-1607
> i am # 1 and I received a new rank= 0
> i am # 2 and I received a new rank= 1
> i am # 3 and I received a new rank= 2
> i am # 4 and I received a new rank= 3
>
> (CNTL-C)
> ~/RCS/myCourses/Intro_to_MPI/MPI_Intro_exercises$ date Sun
> Feb 15 19:23:47 GMT 2009
>
> ~/RCS/myCourses/Intro_to_MPI/MPI_Intro_exercises$ prun -I -n
> 5 -p login ./sync.out Using qxelan driver, build for
> MPIBull2 0.9.7-t (Ishtar) 20060726-1607
> i am # 1 and I received a new rank= 0
> i am # 2 and I received a new rank= 1
>
> [prun -O means over-commit if required, -I means no
> over-commit but fail without running if insufficient resources]
>
> ```
>
>
> --
> Ticket URL: <https://trac.mcs.anl.gov/projects/mpich2/ticket/418>
>
}}}
Originally by michael bane on 2009-02-15 18:58:52 -0600
Yes, it is a Bull supplied MPI (I thought just their mpich2 build but
can check in the morning).
As much as anything I wanted to double check that the code *should*
work...
ta, M
> -----------------------------------------------------
> +----------------------
> Reporter: michael <michael.bane@manchester.ac.uk> | Owner:
> Type: bug | Status:
> new
> Priority: major | Milestone:
> Component: mpich2 | Resolution:
> Keywords: |
> -----------------------------------------------------
> +----------------------
>
>
> Comment (by Rajeev Thakur):
>
> {{{
>
> I believe you are using Bull MPI not stock MPICH2. At least if you
> use the
> MPICH2 from Argonne, your program should work. You may want to
> contact the
> support folks for Bull MPI to see why it doesn't work with that MPI.
>
> Rajeev
>
>
>> -----Original Message-----
>> From: mpich2-bugs-bounces@mcs.anl.gov
>> [mailto:mpich2-bugs-bounces@mcs.anl.gov] On Behalf Of mpich2
>> Sent: Sunday, February 15, 2009 1:33 PM
>> To: undisclosed-recipients:
>> Subject: [mpich2-maint] #418: question about mpich2 implementation
>>
>> -----------------------------------------------------+--------
>> ----------
>> -----------------------------------------------------+----
>> Reporter: michael <michael.bane@manchester.ac.uk> |
>> Type: bug
>> Status: new |
>> Priority: major
>> Milestone: |
>> Component: mpich2
>> -----------------------------------------------------+--------
>> ----------
>> -----------------------------------------------------+----
>>
>>
>> {{{
>>
>> I believe the attached code should work correctly but I find
>> that using the mpich2 implementation on one particular box
>> it hangs for odd numbers of processors (not every time but
>> frequently), whereas this code runs fine on another box I've
>> tried (albeit with OpenMPI)
>>
>> Details are below and I'd welcome suggestions as to the
>> cause of the problem. Note that if I add WRITE statements or
>> use the debugger the problem appears to go away. Adding a
>> FLUSH and BARRIER immediately after the WRITE stmt makes no
>> difference.
>>
>> To confuse myself further, if I replace the MPI_Send() by
>> MPI_SSend(), ie synchronous, sometimes the code completes
>> whereas other times it appears to hang (see sync.out example at end)
>>
>> Thanks, Michael
>>
>> ~/RCS/myCourses/Intro_to_MPI/MPI_Intro_exercises$ cat
>> mkb_ring_solution_send_and_recv_portable.f90
>> PROGRAM ring
>> ! this program will work on all MPI implementations
>> USE MPI
>> IMPLICIT NONE
>>
>> ! since we're only sending a single message between any
>> src/dest pair we can use a single tag
>> INTEGER, PARAMETER :: myTag=101
>>
>> INTEGER :: ierror, inputRank, myRank, size
>> INTEGER :: sendTo, recvFrom
>> INTEGER :: recv_status(MPI_STATUS_SIZE)
>>
>>
>> ! initialise MPI
>> CALL MPI_INIT(ierror)
>>
>> ! determine my rank and total size
>> CALL MPI_COMM_RANK(MPI_COMM_WORLD, myRank, ierror)
>> CALL MPI_COMM_SIZE(MPI_COMM_WORLD, size, ierror)
>>
>> ! set up which process rank is to my right (ie clockwise) for sending
>> sendTo = myRank + 1
>> IF (sendTo =# size) sendTo0
>> recvFrom = myRank - 1
>> IF (recvFrom =# -1) recvFromsize-1
>>
>> ! send my rank clockwise (from recvFrom to sendTo)
>>
>> ! to ensure nobody everybody is sending (and possibly
>> waiting) at the same time, we split into even (send then
>> recv) ! and odd (recv then send)
>>
>> if (mod(myRank,2)==0) then
>> call mpi_send(myRank,1,MPI_INTEGER,sendTo, myTag, &
>> MPI_COMM_WORLD,ierror)
>> call mpi_recv(inputRank,1,MPI_INTEGER,recvFrom,myTag, &
>> MPI_COMM_WORLD,recv_status,ierror)
>> else
>> call mpi_recv(inputRank,1,MPI_INTEGER,recvFrom,myTag, &
>> MPI_COMM_WORLD,recv_status,ierror)
>> call mpi_send(myRank,1,MPI_INTEGER,sendTo, myTag, &
>> MPI_COMM_WORLD,ierror)
>> endif
>>
>> write(*,*) 'i am #',myRank,' and I received a new
>> rank=',inputRank
>>
>> CALL MPI_FINALIZE(ierror)
>>
>> END PROGRAM ring
>>
>> ~/RCS/myCourses/Intro_to_MPI/MPI_Intro_exercises$ mpif90 -O0
>> mkb_ring_solution_send_and_recv_portable.f90;mpif90 -show
>> ifort: Command line warning: overriding '-O3' with '-O0'
>> ifort -O3 -I/opt/mpi/mpibull2-0.9.7-2.t_RC4v4.3/include
>> -L/home/horace/mccssmb2/.mpibull2/lib
>> -L/opt/mpi/mpibull2-0.9.7-2.t_RC4v4.3/lib -lmpidev -lmpi
>> -lrt -ldl -lelan -lelanctrl -lcpuset
>>
>> ~/RCS/myCourses/Intro_to_MPI/MPI_Intro_exercises$ date;prun
>> -O -n 5 -p login ./a.out Sun Feb 15 19:06:56 GMT 2009
>> Using qxelan driver, build for MPIBull2 0.9.7-t (Ishtar)
>> 20060726-1607
>> i am # 1 and I received a new rank= 0
>> i am # 2 and I received a new rank= 1
>> i am # 3 and I received a new rank= 2
>> i am # 4 and I received a new rank= 3
>>
>> (CNTL-C)
>> ~/RCS/myCourses/Intro_to_MPI/MPI_Intro_exercises$ date Sun
>> Feb 15 19:23:47 GMT 2009
>>
>> ~/RCS/myCourses/Intro_to_MPI/MPI_Intro_exercises$ prun -I -n
>> 5 -p login ./sync.out Using qxelan driver, build for
>> MPIBull2 0.9.7-t (Ishtar) 20060726-1607
>> i am # 1 and I received a new rank= 0
>> i am # 2 and I received a new rank= 1
>>
>> [prun -O means over-commit if required, -I means no
>> over-commit but fail without running if insufficient resources]
>>
>> ```
>>
>>
>> --
>> Ticket URL: <https://trac.mcs.anl.gov/projects/mpich2/ticket/418>
>>
>
> }}}
>
> --
> Ticket URL: <https://trac.mcs.anl.gov/projects/mpich2/ticket/418#comment:
> >
}}}
Originally by michael michael.bane@manchester.ac.uk on 2009-02-15 13:32:34 -0600