pmodels / mpich

Official MPICH Repository
http://www.mpich.org

MPICH fails when running compatibility tests. #6336

Closed HathewayWill closed 1 year ago

HathewayWill commented 1 year ago

I am getting an error when running mpirun:

    will@will-MS-7D91:~/WRFHYDRO/Tests/Compatibility$ mpirun ./a.out |& tee comp_test2.txt
       C function called by Fortran
       Values are xx =  2.00 and ii = 1 
    Abort(2139535) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
    MPIR_Init_thread(176)........: 
    MPID_Init(1538)..............: 
    MPIDI_OFI_mpi_init_hook(1511): 
    open_fabric(2565)............: 
    find_provider(2683)..........: OFI fi_getinfo() failed (ofi_init.c:2683:find_provider:No data available)

I have tested the same commands on another computer and they work fine. The commands are:

    echo "Test 2"
    mpifort -c 02_fortran+c+netcdf+mpi_f.f
    mpicc -c 02_fortran+c+netcdf+mpi_c.c
    mpifort 02_fortran+c+netcdf+mpi_f.o \
    02_fortran+c+netcdf+mpi_c.o \
         -L${NETCDF}/lib -lnetcdff -lnetcdf

    mpirun ./a.out |& tee comp_test2.txt

I recently built a new PC with an MSI Z790 Tomahawk motherboard, an Intel Core i9-13900K CPU, and 64 GB of DDR5-5600 RAM. Could the issue be that one of these components is not compatible with MPICH? The MPICH version is 4.0.3. This is the full build script I used:


    sudo apt -y update
    sudo apt -y upgrade
    sudo apt -y install gcc gfortran g++ libtool automake autoconf make m4 default-jre default-jdk csh ksh git python3 python3-dev python2 python2-dev mlocate curl cmake libcurl4-openssl-dev

    echo " "
    ##############################Directory Listing############################
    export HOME=`cd;pwd`

    mkdir $HOME/WRF
    export WRF_FOLDER=$HOME/WRF
    cd $WRF_FOLDER/
    mkdir Downloads
    mkdir WRFPLUS
    mkdir WRFDA
    mkdir Libs
    export DIR=$WRF_FOLDER/Libs
    mkdir Libs/grib2
    mkdir Libs/NETCDF
    mkdir Libs/MPICH
    mkdir -p Tests/Environment
    mkdir -p Tests/Compatibility

    echo " "
    #############################Core Management####################################

    export CPU_CORE=$(nproc)                                             # number of available threads on system
    export CPU_6CORE="6"
    export CPU_HALF=$(($CPU_CORE / 2))                                   #half of available threads on system
    export CPU_HALF_EVEN=$(( $CPU_HALF - ($CPU_HALF % 2) ))              #Round down to an even number, e.g. 7 becomes 6.

    if [ $CPU_CORE -le $CPU_6CORE ]                                  #Low-core systems: use only 2 threads if 6 or fewer are available.
    then
      export CPU_HALF_EVEN="2"
    else
      export CPU_HALF_EVEN=$(( $CPU_HALF - ($CPU_HALF % 2) ))
    fi

    echo "##########################################"
    echo "Number of Threads being used $CPU_HALF_EVEN"
    echo "##########################################"
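
    # Optional sketch, not part of the original script: the same core-count
    # logic as one hypothetical function, build_jobs. It takes a thread count
    # (e.g. from nproc) and prints half of it rounded down to an even number,
    # or 2 when 6 or fewer threads are available.
    # Example: build_jobs 32 prints 16, build_jobs 14 prints 6.
    build_jobs() {
      local threads=$1
      local half=$(( threads / 2 ))
      if [ "$threads" -le 6 ]
      then
        echo 2
      else
        echo $(( half - half % 2 ))
      fi
    }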

    echo " "
    ##############################Downloading Libraries############################
    #Force use of ipv4 with -4
    cd Downloads
    wget -c -4 https://github.com/madler/zlib/archive/refs/tags/v1.2.13.tar.gz
    wget -c -4 https://github.com/HDFGroup/hdf5/archive/refs/tags/hdf5-1_13_2.tar.gz
    wget -c -4 https://github.com/Unidata/netcdf-c/archive/refs/tags/v4.9.0.tar.gz
    wget -c -4 https://github.com/Unidata/netcdf-fortran/archive/refs/tags/v4.6.0.tar.gz
    wget -c -4 https://github.com/pmodels/mpich/releases/download/v4.0.3/mpich-4.0.3.tar.gz
    wget -c -4 https://download.sourceforge.net/libpng/libpng-1.6.39.tar.gz
    wget -c -4 https://www.ece.uvic.ca/~frodo/jasper/software/jasper-1.900.1.zip
    wget -c -4 https://sourceforge.net/projects/opengrads/files/grads2/2.2.1.oga.1/Linux%20%2864%20Bits%29/opengrads-2.2.1.oga.1-bundle-x86_64-pc-linux-gnu-glibc_2.17.tar.gz

    echo " "
    ####################################Compilers#####################################
    export CC=gcc
    export CXX=g++
    export FC=gfortran
    export F77=gfortran
    export CFLAGS="-fPIC -fPIE -O3"

    #GNU compilers 10 and newer need these flags to accept legacy Fortran argument mismatches and BOZ literals
    export GCC_VERSION=$(/usr/bin/gcc -dumpfullversion | awk '{print$1}')
    export GFORTRAN_VERSION=$(/usr/bin/gfortran -dumpfullversion | awk '{print$1}')
    export GPLUSPLUS_VERSION=$(/usr/bin/g++ -dumpfullversion | awk '{print$1}')

    export GCC_VERSION_MAJOR_VERSION=$(echo $GCC_VERSION | awk -F. '{print $1}')
    export GFORTRAN_VERSION_MAJOR_VERSION=$(echo $GFORTRAN_VERSION | awk -F. '{print $1}')
    export GPLUSPLUS_VERSION_MAJOR_VERSION=$(echo $GPLUSPLUS_VERSION | awk -F. '{print $1}')

    export version_10="10"

    if [ $GCC_VERSION_MAJOR_VERSION -ge $version_10 ] || [ $GFORTRAN_VERSION_MAJOR_VERSION -ge $version_10 ] || [ $GPLUSPLUS_VERSION_MAJOR_VERSION -ge $version_10 ]
    then
      export fallow_argument=-fallow-argument-mismatch
      export boz_argument=-fallow-invalid-boz
    else
      export fallow_argument=
      export boz_argument=
    fi

    export FFLAGS=$fallow_argument
    export FCFLAGS=$fallow_argument

    echo "##########################################"
    echo "FFLAGS = $FFLAGS"
    echo "FCFLAGS = $FCFLAGS"
    echo "##########################################"
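
    # Optional sketch, not part of the original script: gfortran 10 turned
    # argument-type mismatches into errors, and -fallow-argument-mismatch turns
    # them back into warnings. The hypothetical function below takes a version
    # string such as "12.2.0", keeps the major number with shell parameter
    # expansion instead of awk, and prints the flag only for version 10 or newer.
    # Example: FFLAGS=$(legacy_fortran_flags "$GFORTRAN_VERSION")
    legacy_fortran_flags() {
      local major=${1%%.*}
      if [ "${major:-0}" -ge 10 ]
      then
        echo "-fallow-argument-mismatch"
      fi
    }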

    echo " "
    #############################zlib############################
    #Compilers are not passed on the configure command line here due to a configure issue with zlib 1.2.13:
    #with CC and CXX defined, ./configure uses different compiler flags.

    cd $WRF_FOLDER/Downloads
    tar -xvzf v1.2.13.tar.gz
    cd zlib-1.2.13/
    ./configure --prefix=$DIR/grib2
    make -j $CPU_HALF_EVEN
    make -j $CPU_HALF_EVEN install |& tee make.install.log
    #make check

    echo " "
    ##############################MPICH############################
    #F90 is cleared (F90=) to work around compiler issues during the MPICH install
    cd $WRF_FOLDER/Downloads
    tar -xvzf mpich-4.0.3.tar.gz
    cd mpich-4.0.3/
    F90= ./configure --prefix=$DIR/MPICH --with-device=ch3 FFLAGS=$fallow_argument FCFLAGS=$fallow_argument

    make -j $CPU_HALF_EVEN
    make -j $CPU_HALF_EVEN install |& tee make.install.log
    # make check

    export PATH=$DIR/MPICH/bin:$PATH

    export MPIFC=$DIR/MPICH/bin/mpifort
    export MPIF77=$DIR/MPICH/bin/mpifort
    export MPIF90=$DIR/MPICH/bin/mpifort
    export MPICC=$DIR/MPICH/bin/mpicc
    export MPICXX=$DIR/MPICH/bin/mpicxx
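
    # Optional sketch, not part of the original script: the later compatibility
    # tests call plain mpifort, mpicc and mpirun, so they use whichever MPI comes
    # first on $PATH. This hypothetical guard, check_mpi_first, warns if that is
    # not the MPICH just installed under $DIR/MPICH (it does not check
    # $LD_LIBRARY_PATH).
    check_mpi_first() {
      local expected="$1/bin/mpirun"
      local found
      found=$(command -v mpirun)
      if [ "$found" = "$expected" ]
      then
        echo "OK: mpirun is $found"
      else
        echo "WARNING: mpirun resolves to '${found:-nothing}', expected $expected" >&2
        return 1
      fi
    }
    check_mpi_first "$DIR/MPICH" || echo "Check PATH for another MPI installation before running the tests."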

    echo " "
    #############################libpng############################
    cd $WRF_FOLDER/Downloads
    export LDFLAGS=-L$DIR/grib2/lib
    export CPPFLAGS=-I$DIR/grib2/include
    tar -xvzf libpng-1.6.39.tar.gz
    cd libpng-1.6.39/
    CC=$MPICC FC=$MPIFC F77=$MPIF77 F90=$MPIF90 CXX=$MPICXX ./configure --prefix=$DIR/grib2
    make -j $CPU_HALF_EVEN
    make -j $CPU_HALF_EVEN install |& tee make.install.log
    #make check
    echo " "
    #############################JasPer############################
    cd $WRF_FOLDER/Downloads
    unzip jasper-1.900.1.zip
    cd jasper-1.900.1/
    CC=$MPICC FC=$MPIFC F77=$MPIF77 F90=$MPIF90 CXX=$MPICXX ./configure --prefix=$DIR/grib2
    make -j $CPU_HALF_EVEN
    make -j $CPU_HALF_EVEN install |& tee make.install.log
    #make check

    export JASPERLIB=$DIR/grib2/lib
    export JASPERINC=$DIR/grib2/include

    echo " "
    #############################hdf5 library for netcdf4 functionality############################
    cd $WRF_FOLDER/Downloads
    tar -xvzf hdf5-1_13_2.tar.gz
    cd hdf5-hdf5-1_13_2
    CC=$MPICC FC=$MPIFC F77=$MPIF77 F90=$MPIF90 CXX=$MPICXX ./configure --prefix=$DIR/grib2 --with-zlib=$DIR/grib2 --enable-hl --enable-fortran
    make -j $CPU_HALF_EVEN
    make -j $CPU_HALF_EVEN install |& tee make.install.log
    #make check

    export HDF5=$DIR/grib2
    export LD_LIBRARY_PATH=$DIR/grib2/lib:$LD_LIBRARY_PATH

    echo " "
    ##############################Install NETCDF C Library############################
    cd $WRF_FOLDER/Downloads
    tar -xzvf v4.9.0.tar.gz
    cd netcdf-c-4.9.0/
    export CPPFLAGS=-I$DIR/grib2/include
    export LDFLAGS=-L$DIR/grib2/lib
    export LIBS="-lhdf5_hl -lhdf5 -lz -lcurl -lgfortran -lgcc -lm -ldl"
    CC=$MPICC FC=$MPIFC CXX=$MPICXX F90=$MPIF90 F77=$MPIF77 ./configure --prefix=$DIR/NETCDF --disable-dap --enable-netcdf-4 --enable-netcdf4 --enable-shared
    make -j $CPU_HALF_EVEN
    make -j $CPU_HALF_EVEN install |& tee make.install.log
    #make check

    export PATH=$DIR/NETCDF/bin:$PATH
    export NETCDF=$DIR/NETCDF
    echo " "
    ##############################NetCDF fortran library############################
    cd $WRF_FOLDER/Downloads
    tar -xvzf v4.6.0.tar.gz
    cd netcdf-fortran-4.6.0/
    export LD_LIBRARY_PATH=$DIR/NETCDF/lib:$LD_LIBRARY_PATH
    export CPPFLAGS="-I$DIR/NETCDF/include -I$DIR/grib2/include"
    export LDFLAGS="-L$DIR/NETCDF/lib -L$DIR/grib2/lib"
    export LIBS="-lnetcdf -lhdf5_hl -lhdf5 -lz -ldl"
    CC=$MPICC FC=$MPIFC CXX=$MPICXX F90=$MPIF90 F77=$MPIF77 ./configure --prefix=$DIR/NETCDF --enable-netcdf-4 --enable-netcdf4 --enable-shared
    make -j $CPU_HALF_EVEN
    make -j $CPU_HALF_EVEN install |& tee make.install.log
    #make check

    echo " "
    #################################### System Environment Tests ##############

    cd $WRF_FOLDER/Downloads
    wget -c -4 https://www2.mmm.ucar.edu/wrf/OnLineTutorial/compile_tutorial/tar_files/Fortran_C_NETCDF_MPI_tests.tar
    wget -c -4 https://www2.mmm.ucar.edu/wrf/OnLineTutorial/compile_tutorial/tar_files/Fortran_C_tests.tar

    tar -xvf Fortran_C_tests.tar -C $WRF_FOLDER/Tests/Environment
    tar -xvf Fortran_C_NETCDF_MPI_tests.tar -C $WRF_FOLDER/Tests/Compatibility

    export one="1"
    echo " "
    ############## Testing Environment #####

    cd $WRF_FOLDER/Tests/Environment

    echo " "
    echo " "
    echo "Environment Testing "
    echo "Test 1"
    gfortran TEST_1_fortran_only_fixed.f
    ./a.out |& tee env_test1.txt
    export TEST_PASS=$(grep -w -o -c "SUCCESS" env_test1.txt | awk  '{print$1}')
     if [ $TEST_PASS -ge 1 ]
        then
          echo "Environment Test 1 Passed"
        else
          echo "Environment Compiler Test 1 Failed"
          exit
      fi
    read -t 3 -p "I am going to wait for 3 seconds only ..."

    echo " "
    echo "Test 2"
    gfortran TEST_2_fortran_only_free.f90
    ./a.out |& tee env_test2.txt
    export TEST_PASS=$(grep -w -o -c "SUCCESS" env_test2.txt | awk  '{print$1}')
     if [ $TEST_PASS -ge 1 ]
        then
          echo "Environment Test 2 Passed"
        else
          echo "Environment Compiler Test 2 Failed"
          exit
      fi
    echo " "
    read -t 3 -p "I am going to wait for 3 seconds only ..."

    echo " "
    echo "Test 3"
    gcc TEST_3_c_only.c
    ./a.out |& tee env_test3.txt
    export TEST_PASS=$(grep -w -o -c "SUCCESS" env_test3.txt | awk  '{print$1}')
     if [ $TEST_PASS -ge 1 ]
        then
          echo "Environment Test 3 Passed"
        else
          echo "Environment Compiler Test 3 Failed"
          exit
      fi
    echo " "
    read -t 3 -p "I am going to wait for 3 seconds only ..."

    echo " "
    echo "Test 4"
    gcc -c -m64 TEST_4_fortran+c_c.c
    gfortran -c -m64 TEST_4_fortran+c_f.f90
    gfortran -m64 TEST_4_fortran+c_f.o TEST_4_fortran+c_c.o
    ./a.out |& tee env_test4.txt
    export TEST_PASS=$(grep -w -o -c "SUCCESS" env_test4.txt | awk  '{print$1}')
     if [ $TEST_PASS -ge 1 ]
        then
          echo "Environment Test 4 Passed"
        else
          echo "Environment Compiler Test 4 Failed"
          exit
      fi
    echo " "
    read -t 3 -p "I am going to wait for 3 seconds only ..."
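
    # Optional sketch, not part of the original script: each test above repeats
    # the same run / tee / grep-for-SUCCESS steps. A hypothetical helper,
    # run_test LABEL LOGFILE [COMMAND...], could factor that out; it runs
    # COMMAND (default ./a.out), logs its output, and treats the word SUCCESS
    # in the log as a pass, like the checks above.
    # Example: gfortran TEST_1_fortran_only_fixed.f && run_test "Environment Test 1" env_test1.txt || exit 1
    run_test() {
      local label=$1 log=$2
      shift 2
      if [ "$#" -eq 0 ]
      then
        set -- ./a.out
      fi
      "$@" 2>&1 | tee "$log"
      if grep -qw SUCCESS "$log"
      then
        echo "$label Passed"
      else
        echo "$label Failed"
        return 1
      fi
    }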

    echo " "
    ############## Testing Library Compatibility #####

    cd $WRF_FOLDER/Tests/Compatibility

    cp ${NETCDF}/include/netcdf.inc .

    echo " "
    echo " "
    echo "Library Compatibility Tests "
    echo "Test 1"
    gfortran -c 01_fortran+c+netcdf_f.f
    gcc -c 01_fortran+c+netcdf_c.c
    gfortran 01_fortran+c+netcdf_f.o 01_fortran+c+netcdf_c.o \
         -L${NETCDF}/lib -lnetcdff -lnetcdf

         ./a.out |& tee comp_test1.txt
         export TEST_PASS=$(grep -w -o -c "SUCCESS" comp_test1.txt | awk  '{print$1}')
          if [ $TEST_PASS -ge 1 ]
             then
               echo "Compatibility Test 1 Passed"
             else
               echo "Compatibility Compiler Test 1 Failed"
               exit
           fi
         echo " "
         read -t 3 -p "I am going to wait for 3 seconds only ..."

    echo " "

    echo "Test 2"
    mpifort -c 02_fortran+c+netcdf+mpi_f.f
    mpicc -c 02_fortran+c+netcdf+mpi_c.c
    mpifort 02_fortran+c+netcdf+mpi_f.o \
    02_fortran+c+netcdf+mpi_c.o \
         -L${NETCDF}/lib -lnetcdff -lnetcdf

    mpirun ./a.out |& tee comp_test2.txt
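
The error stack at the top of this issue mentions `MPIDI_OFI_mpi_init_hook` and `fi_getinfo()`, which appear to come from an OFI (libfabric) based ch4 device, whereas the script configures MPICH with `--with-device=ch3`. That suggests `a.out` loaded a different MPI library at run time than the one it was built with. MPICH-derived implementations such as Intel MPI typically share the `libmpi.so.12` / `libmpifort.so.12` sonames, so the dynamic loader can pick up whichever copy appears first on `$LD_LIBRARY_PATH`. A minimal sketch for checking this filters `ldd` output for MPI libraries; the function name `mpi_libs_from_ldd` is hypothetical:

```shell
# Hypothetical helper: read `ldd` output on stdin and print each MPI
# shared library (names starting with libmpi, which includes libmpifort)
# together with the path the dynamic loader resolved it to.
mpi_libs_from_ldd() {
  awk '$1 ~ /^libmpi/ && $2 == "=>" { print $1, "->", $3 }'
}

# Inspect the compatibility-test binary, if it exists in this directory.
if [ -x ./a.out ]; then
  ldd ./a.out | mpi_libs_from_ldd
fi
```

If a printed path is not under `$DIR/MPICH/lib`, the binary is most likely running against another MPI installation.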

HathewayWill commented 1 year ago

So the issue was that Intel MPI was on my $PATH and $LD_LIBRARY_PATH.

Once I removed Intel MPI from both of those variables in .bashrc, the issue was fixed.
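
Besides editing `.bashrc`, the offending entries can also be filtered out of the current shell. A minimal sketch, assuming the Intel MPI directories can be recognized by a substring such as `intel` in their paths (adjust this to the actual install location); `strip_entries` is a hypothetical name. It splits the colon-separated list with `tr`, drops matching entries with a fixed-string `grep -v`, and rejoins the rest with `paste`:

```shell
# Hypothetical helper: print a colon-separated list (such as $PATH or
# $LD_LIBRARY_PATH) with every entry containing PATTERN removed.
strip_entries() {
  printf '%s' "$1" | tr ':' '\n' | grep -vF -- "$2" | paste -s -d: -
}

# Usage, assuming the unwanted directories contain "intel" in their paths:
#   export PATH=$(strip_entries "$PATH" intel)
#   export LD_LIBRARY_PATH=$(strip_entries "$LD_LIBRARY_PATH" intel)
```

The exports are left as comments because the pattern has to match the local installation; a pattern that is too broad could remove directories that are still needed.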