pmodels / mpich

Official MPICH Repository
http://www.mpich.org

Mpi_allreduce doesn't work correctly under Windows x64 & Intel Compiler #1076

Closed mpichbot closed 8 years ago

mpichbot commented 8 years ago

Originally by tmishima@jcity.maeda.co.jp on 2010-08-19 23:03:42 -0500


When the real data to exchange is very small, such as 1.4012985E-45, mpi_allreduce doesn't return the correct result under Windows x64 with the Intel compiler. With Linux, the PGI compiler & openmpi, and with Windows XP 32, the Intel compiler & mpich2-1.2.1p1, it's all right as far as I checked.

1. OS: Windows Vista 64

2. Compiler: Intel(R) Visual Fortran Compiler for applications running on Intel(R) 64, Version 10.1 Build 20070913 Package ID: w_fc_p_10.1.011j

3. MPI: mpich2-1.2.1p1-win-x86-64.msi

4. Sample program

      DATA IN /3,7/
C
      INCLUDE 'mpif.h'
C
      CALL MPI_INIT(IER)
      CALL MPI_COMM_RANK( MPI_COMM_WORLD, MYID, IER )
      CALL MPI_COMM_SIZE( MPI_COMM_WORLD, NUMPROCS, IER )
C
C     ZERO CLEAR
      DO I=1,M
        A(I)=0.0
        A_LOC(I)=0.0
      END DO
C
C     JUST MAKEUP DATA
      DO I=1,M
        IF(MOD(I,IN(MYID+1)).EQ.0) A_LOC(I)=TRANSFER(1,0.0)
      END DO
C
C     THIS IS THE BODY OF TESTING.
      CALL MPI_ALLREDUCE(A_LOC,A,M,MPI_REAL,MPI_MAX,

5. Corresponding results

    C:\Users\K830429\Documents\DuCOM>mpiexec -np 2 g:TEST64.exe
    0.0000000E+00 0.0000000E+00 1.4012985E-45 0.0000000E+00 0.0000000E+00
    1.4012985E-45 0.0000000E+00 0.0000000E+00 1.4012985E-45 0.0000000E+00
    0.0000000E+00 1.4012985E-45 0.0000000E+00 0.0000000E+00 1.4012985E-45
    0.0000000E+00 0.0000000E+00 1.4012985E-45 0.0000000E+00 0.0000000E+00
    1.4012985E-45 0.0000000E+00 0.0000000E+00 1.4012985E-45 0.0000000E+00
    1 237
    0.0000000E+00 0.0000000E+00 1.4012985E-45 0.0000000E+00 0.0000000E+00
    1.4012985E-45 0.0000000E+00 0.0000000E+00 1.4012985E-45 0.0000000E+00
    0.0000000E+00 1.4012985E-45 0.0000000E+00 0.0000000E+00 1.4012985E-45
    0.0000000E+00 0.0000000E+00 1.4012985E-45 0.0000000E+00 0.0000000E+00
    1.4012985E-45 0.0000000E+00 0.0000000E+00 1.4012985E-45 0.0000000E+00
    0 237

6. How to build

    #------settings-------------------

    ARCH      = lib
    ARCHFLAGS = /nologo
    RANLIB    = echo

    CC        = cl
    CFLAGS    = /nologo /O2

    FORTRAN   = ifort
    FFLAGS    =

    LOADER    = link
    LOADOPTS  =

    #---------------------------------

    #-------- change here-------------

    FINCS     = /include:"C:\Documents and Settings\K830429\My Documents\Visual Studio Projects\Netlib64\mpich2\include"
    FDEFS     =

    CINCS     = -I.
    CDEFS     =

    LIBPATH   = /LIBPATH:"C:\Documents and Settings\K830429\My Documents\Visual Studio Projects\Netlib64\mpich2\lib"

    LIBS      = fmpich2.lib mpi.lib

    INTEL     =

    OBJS      = TEST.obj

    f77exm    = TEST64.exe

    #-------- end of change here ----------

    all: $(f77exm)

    $(f77exm): $(OBJS)
        $(LOADER) $(OBJS) $(INTEL) $(LIBS) $(LIBPATH) $(LOADOPTS) /out:$@

    .c.obj:
        $(CC) $(CFLAGS) $(CINCS) $(CDEFS) -c $<

    .f.obj:
        $(FORTRAN) $(FFLAGS) $(FINCS) $(FDEFS) -c $<

    .for.obj:
        $(FORTRAN) $(FFLAGS) $(FINCS) $(FDEFS) -c $<

    .f90.obj:
        $(FORTRAN) $(FFLAGS) $(FINCS) $(FDEFS) -c $<

    clean:
        del *.obj

mpichbot commented 8 years ago

Originally by tmishima@jcity.maeda.co.jp on 2010-08-19 23:14:46 -0500


Attachment added: test.f (0.8 KiB) sample fortran

mpichbot commented 8 years ago

Originally by tmishima@jcity.maeda.co.jp on 2010-08-19 23:15:15 -0500


Attachment added: makefile (1.1 KiB)

mpichbot commented 8 years ago

Originally by tmishima@jcity.maeda.co.jp on 2010-08-19 23:17:06 -0500


Attachment added: testrun.cmd (0.0 KiB)

mpichbot commented 8 years ago

Originally by jayesh on 2010-08-20 10:40:02 -0500


Hi, Thanks for reporting the bug; we will take a look at it. Meanwhile, did you try the 1.3b1 release (http://www.mcs.anl.gov/research/projects/mpich2/downloads/index.php?s=downloads)? Several bug fixes went into the 1.3x series.

Regards, Jayesh

mpichbot commented 8 years ago

Originally by tmishima@jcity.maeda.co.jp on 2010-08-21 01:19:14 -0500


Hi, Thank you for your quick reply. Yes, I tried the 1.3b1 release with Intel compiler 10 & 11. Unfortunately, neither worked correctly.

Regards, tmishima

mpichbot commented 8 years ago

Originally by jayesh on 2010-08-24 15:44:53 -0500


Hi, I tried running your test case. What is the expected output?

-Jayesh

mpichbot commented 8 years ago

Originally by jayesh on 2010-08-24 16:08:53 -0500


Hi, Does the following program work for you? (I modified your code slightly: rank 0 and rank 1 alternately hold the transferred real version of 1.)

      PARAMETER( M = 10 )
      DIMENSION A_LOC(M),A(M)
C
      DIMENSION IN(2)
      DATA IN /3,7/
C
      INCLUDE 'mpif.h'
C
      CALL MPI_INIT(IER)
      CALL MPI_COMM_RANK( MPI_COMM_WORLD, MYID, IER )
      CALL MPI_COMM_SIZE( MPI_COMM_WORLD, NUMPROCS, IER )
C
C     ZERO CLEAR
      DO I=1,M
        A(I)=0.0
        A_LOC(I)=0.0
      END DO
C
C     JUST MAKEUP DATA
      DO I=1,M
C        IF(MOD(I,IN(MYID+1)).EQ.0) A_LOC(I)=TRANSFER(1,0.0)
        IF(MYID.EQ.0) THEN
            IF(MOD(I,2).EQ.0) THEN
                A_LOC(I) = TRANSFER(1,0.0)
            ELSE
                A_LOC(I) = 0.0
            END IF
        ELSE
            IF(MOD(I,2).EQ.0) THEN
                A_LOC(I) = 0.0
            ELSE
                A_LOC(I) = TRANSFER(1,0.0)
            END IF
        END IF
      END DO
      IF(MYID.EQ.1) WRITE(*,*) (A_LOC(K),K=1,M)
C      IF(MYID.EQ.0) WRITE(*,*) (A_LOC(K),K=1,10)      
C
C     THIS IS THE BODY OF TESTING.
      CALL MPI_ALLREDUCE(A_LOC,A,M,MPI_REAL,MPI_MAX,
     +                   MPI_COMM_WORLD,IER)
C
      CALL MPI_FINALIZE(IER)
C
C     CHECK TEST RESULT
      NUM=0
      DO I=1,M
        IF(TRANSFER(A(I),0).NE.1) NUM=NUM+1
      END DO
C
C     OUTPUT
      IF(MYID.EQ.1) WRITE(*,*) (A(K),K=1,M)
      IF(NUM.NE.0) THEN
        WRITE(*,*) "ERROR : ",MYID,NUM
      ELSE
        WRITE(*,*) "NO ERRORS ",MYID,NUM
      END IF
C
      END

-Jayesh

mpichbot commented 8 years ago

Originally by tmishima@jcity.maeda.co.jp on 2010-08-24 18:03:20 -0500


Hi,

Thank you for sending the program. It did not work correctly. The output was as follows:

    C:\Users\K830429\Documents\DuCOM>mpiexec -np 2 TEST64.exe
    1.4012985E-45 0.0000000E+00 1.4012985E-45 0.0000000E+00 1.4012985E-45
    0.0000000E+00 1.4012985E-45 0.0000000E+00 1.4012985E-45 0.0000000E+00
    ERROR : 0 5
    0.0000000E+00 1.4012985E-45 0.0000000E+00 1.4012985E-45 0.0000000E+00
    1.4012985E-45 0.0000000E+00 1.4012985E-45 0.0000000E+00 1.4012985E-45
    ERROR : 1 5

mpichbot commented 8 years ago

Originally by jayesh on 2010-08-25 15:59:21 -0500


Hi, You are getting these errors because you are using a "real" version of the integer 1. If you disable compiler optimizations, you will not get any errors. I am not a Fortran expert, but my guess is that your problem is related to how TRANSFER() casts integers to reals (and not to MPICH2). However, if your real issue is sending real numbers that don't fit within the range of a 4-byte REAL, you should use 8-byte REALs instead (see the program below).

      PARAMETER( M = 10 )
      REAL(8) A_LOC(M),A(M)
C
      DIMENSION IN(2)
      DATA IN /3,7/
C
      INCLUDE 'mpif.h'
C
      CALL MPI_INIT(IER)
      CALL MPI_COMM_RANK( MPI_COMM_WORLD, MYID, IER )
      CALL MPI_COMM_SIZE( MPI_COMM_WORLD, NUMPROCS, IER )
C
C     ZERO CLEAR
      DO I=1,M
        A(I)=0.0
        A_LOC(I)=0.0
      END DO
C
C     JUST MAKEUP DATA
      DO I=1,M
C        IF(MOD(I,IN(MYID+1)).EQ.0) A_LOC(I)=TRANSFER(1,0.0)
        IF(MYID.EQ.0) THEN
            IF(MOD(I,2).EQ.0) THEN
C                A_LOC(I) = TRANSFER(1,0.0)
                A_LOC(I) = 1.4012985E-45
            ELSE
                A_LOC(I) = 0.0
            END IF
        ELSE
            IF(MOD(I,2).EQ.0) THEN
                A_LOC(I) = 0.0
            ELSE
C                A_LOC(I) = TRANSFER(1,0.0)
                A_LOC(I) = 1.4012985E-45
            END IF
        END IF
      END DO
      IF(MYID.EQ.1) WRITE(*,*) (A_LOC(K),K=1,M)
C      IF(MYID.EQ.0) WRITE(*,*) (A_LOC(K),K=1,10)      
C
C     THIS IS THE BODY OF TESTING.
      CALL MPI_ALLREDUCE(A_LOC,A,M,MPI_REAL8,MPI_MAX,
     +                   MPI_COMM_WORLD,IER)
C
      CALL MPI_FINALIZE(IER)
C
C     CHECK TEST RESULT
      NUM=0
      DO I=1,M
C        IF(TRANSFER(A(I),0).NE.1) NUM=NUM+1
        IF(A(I).NE.1.4012985E-45) NUM=NUM+1
      END DO
C
C     OUTPUT
      IF(MYID.EQ.1) WRITE(*,*) (A(K),K=1,M)
      IF(NUM.NE.0) THEN
        WRITE(*,*) "ERROR : ",MYID,NUM
      ELSE
        WRITE(*,*) "NO ERRORS ",MYID,NUM
      END IF
C
      END

Let us know if you have any further questions.

Regards, Jayesh
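
A note on the value under discussion, as a minimal sketch that assumes default INTEGER and default REAL are both 4 bytes (the usual case for the compilers named in this ticket). TRANSFER(1,0.0) does not convert 1 to 1.0; it reuses the bit pattern of the integer, which in IEEE 754 single precision is the smallest positive subnormal (denormal) number, 2^-149, printed in the outputs above as 1.4012985E-45. Subnormals can be treated as zero when a flush-to-zero or denormals-are-zero mode is active, and optimized builds may turn such a mode on. That would be consistent with the dependence on optimization described above, but it is a guess and was not confirmed in this ticket. The program name SUBNRM is made up for illustration.

```fortran
      PROGRAM SUBNRM
C     Illustrative sketch, not part of the original test case.
C     TRANSFER copies the bit pattern of INTEGER 1 into a REAL.
      REAL X
      X = TRANSFER(1,0.0)
C     The value itself: the smallest positive subnormal REAL.
      WRITE(*,*) 'X          = ', X
C     Smallest positive normal REAL, for comparison.
      WRITE(*,*) 'TINY(X)    = ', TINY(X)
C     T when subnormals are honored; may be F when they are
C     treated as zero (flush-to-zero / denormals-are-zero).
      WRITE(*,*) 'X .GT. 0.0 = ', X.GT.0.0
C     Round trip back to the integer bit pattern.
      WRITE(*,*) 'BITS       = ', TRANSFER(X,0)
      END
```

Building this once with and once without optimization and comparing the third line is one way to check whether the subnormal is being treated as zero on a given setup. A compiler may evaluate the comparison at compile time, so an unchanged result does not rule the effect out.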

mpichbot commented 8 years ago

Originally by tmishima@jcity.maeda.co.jp on 2010-08-25 18:07:32 -0500


Hi, Thank you for your explanation. I understand why mpi_allreduce does not work in my application. I will change my code to avoid this issue.

Best regards, tmishima
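
For anyone who wants to try the no-optimization observation from the earlier reply without touching the code, here is a sketch of the corresponding change to the makefile posted at the top of this ticket. /Od is the Intel Fortran option on Windows that turns optimization off. The commented alternative, /Qftz- (keep optimization but do not flush subnormals to zero), is an assumption about what might help; it was not tested in this ticket.

```makefile
# Sketch only: replaces the empty FFLAGS line in the posted makefile.
FORTRAN   = ifort
FFLAGS    = /Od
# Untested alternative (assumption): keep optimization, disable flush-to-zero.
# FFLAGS  = /O2 /Qftz-
```

Only the Fortran flags change; the C flags, include paths, and link line in the posted makefile stay as they are.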