open-mpi / ompi

Open MPI main development repository
https://www.open-mpi.org

OpenMPI 4.1.1 + LLVM 12.0 = Re-Link Error #9481

Open BryanFlynt-NOAA opened 3 years ago

BryanFlynt-NOAA commented 3 years ago

On an openSUSE Linux system I'm trying to install OpenMPI 4.1.1 using LLVM 12.0 (clang and flang). The "configure" and "make" steps complete with no errors, but "make install" attempts to re-link a number of libraries and fails when it can't find "libmpi_usempi.so.40.30.0". I've searched around and found similar "make install" errors that occur when the compiler is not found, but this is not the same case. Any ideas on how to fix this issue?

Commands:

> module list
  1) cmake/3.20.2   2) llvm/12.0.0   3) hwloc/2.4.1   4) ucx/1.10.1   5) libevent/2.1.12

> echo $CC
/home/bflynt/opt/modman/apps/llvm/12.0.0/bin/clang
> echo $CXX
/home/bflynt/opt/modman/apps/llvm/12.0.0/bin/clang++
> echo $FC
/home/bflynt/opt/modman/apps/llvm/12.0.0/bin/flang

> ./configure --prefix=${LIB_INSTALL_DIR}                 \
                --enable-mpi-cxx                          \
                --enable-cxx-exceptions                   \
                --enable-mpi-fortran=usempi               \
                --enable-mca-no-build=btl-uct             \
                --with-hwloc=${HWLOC_ROOT}                \
                --with-ucx=${UCX_ROOT}                    \
                --with-libevent=${LIBEVENT_ROOT}          \
                --without-verbs                           \
                -enable-mca-no-build=btl-uct

.... Lots of Output But No Errors ....

> make

.... Lots of Output But No Errors ....

> make install

.... Checks Everything is Built .....
.... Lots of Re-Linking Warnings ....
..... Then this crash .....

Making install in mpi/fortran/use-mpi-tkr
make[2]: Entering directory '/home/bflynt/opt/modman/build/openmpi/4.1.1/llvm/12.0.0/ompi/mpi/fortran/use-mpi-tkr'
make[3]: Entering directory '/home/bflynt/opt/modman/build/openmpi/4.1.1/llvm/12.0.0/ompi/mpi/fortran/use-mpi-tkr'
 /usr/bin/mkdir -p '/home/bflynt/opt/modman/apps/openmpi/4.1.1/llvm/12.0.0/lib64'
 /bin/sh ../../../../libtool   --mode=install /usr/bin/install -c   libmpi_usempi.la '/home/bflynt/opt/modman/apps/openmpi/4.1.1/llvm/12.0.0/lib64'
libtool: warning: relinking 'libmpi_usempi.la'
libtool: install: (cd /home/bflynt/opt/modman/build/openmpi/4.1.1/llvm/12.0.0/ompi/mpi/fortran/use-mpi-tkr; /bin/sh "/home/bflynt/opt/modman/build/openmpi/4.1.1/llvm/12.0.0/libtool"  --silent --tag FC --mode=relink /home/bflynt/opt/modman/apps/llvm/12.0.0/bin/flang -I../../../../ompi/include -I../../../../ompi/include -I. -I../../../.. -I../../../.. -I. -I../../../../ompi/mpi/fortran/use-mpi-tkr -fexceptions -version-info 70:0:30 -fexceptions -L/home/bflynt/opt/modman/apps/hwloc/2.4.1/llvm/12.0.0/lib -L/home/bflynt/opt/modman/apps/libevent/2.1.12/llvm/12.0.0/lib64 -o libmpi_usempi.la -rpath /home/bflynt/opt/modman/apps/openmpi/4.1.1/llvm/12.0.0/lib64 mpi.lo mpi_aint_add_f90.lo mpi_aint_diff_f90.lo mpi_comm_spawn_multiple_f90.lo mpi_testall_f90.lo mpi_testsome_f90.lo mpi_waitall_f90.lo mpi_waitsome_f90.lo mpi_wtick_f90.lo mpi_wtime_f90.lo mpi-tkr-sizeof.lo ../../../../ompi/mpi/fortran/mpif-h/libmpi_mpifh.la /home/bflynt/opt/modman/build/openmpi/4.1.1/llvm/12.0.0/opal/libopen-pal.la -lrt -lm -lutil -lz -lhwloc -levent_core -levent_pthreads )
mv: cannot stat 'libmpi_usempi.so.40.30.0': No such file or directory
libtool:   error: error: relink 'libmpi_usempi.la' with the above command before installing it
make[3]: *** [Makefile:1932: install-libLTLIBRARIES] Error 1
make[3]: Leaving directory '/home/bflynt/opt/modman/build/openmpi/4.1.1/llvm/12.0.0/ompi/mpi/fortran/use-mpi-tkr'
make[2]: *** [Makefile:2047: install-am] Error 2
make[2]: Leaving directory '/home/bflynt/opt/modman/build/openmpi/4.1.1/llvm/12.0.0/ompi/mpi/fortran/use-mpi-tkr'
make[1]: *** [Makefile:3555: install-recursive] Error 1
make[1]: Leaving directory '/home/bflynt/opt/modman/build/openmpi/4.1.1/llvm/12.0.0/ompi'
make: *** [Makefile:1901: install-recursive] Error 1

jsquyres commented 3 years ago

Can you submit all the information listed here: https://www.open-mpi.org/community/help/

BryanFlynt-NOAA commented 3 years ago

I captured the output as described within the link you provided and attached it to this email. If the file size is too large I'll try another method.

BryanFlynt-NOAA commented 3 years ago

Some people reported trouble with the bzip2-compressed zip, so here it is again using the native zip compression: ompi-output.zip

ggouaillardet commented 3 years ago

FWIW, the link error is a consequence of earlier f18 failures. Here is the first one:

  FC       mpi_comm_spawn_multiple_f90.lo
f18-9efe.f90:96:56:

   47 |  CALL mpi_comm_spawn_multiple(count, array_of_commands, array_of_argv, array_o&
      |                                                        2
......
   96 |  CALL mpi_comm_spawn_multiple(count, array_of_commands, array_of_argv, array_o&
      |                                                        1
Error: Type mismatch between actual argument at (1) and actual argument at (2) (REAL(8)/CHARACTER(*)).
execvp(gfortran) failed:

At first glance, that looks like an f18 issue (since f18 seems to rely on gfortran for the object generation), but I will have a look at it.
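Since the relink failure is only a downstream symptom, it can help to locate the first error in a saved build log. The following is a minimal sketch, assuming the build output was captured with something like `make 2>&1 | tee make.log`; the file name `make.log` and the function name `first_build_error` are hypothetical.

```shell
#!/bin/sh
# Print the first line of a build log that looks like a compiler error.
# Usage: first_build_error make.log
# Assumption: errors appear as "Error:" or "error:" at the start of a
# line or after whitespace, as in the f18/gfortran output quoted above.
first_build_error() {
    log=$1
    # grep -n prefixes each match with its line number;
    # head -n 1 keeps only the earliest match.
    grep -n -E '(^|[[:space:]])[Ee]rror:' "$log" | head -n 1
}
```

The line number it reports points at where to start reading; anything after that line, including the libtool relink message, may just be fallout.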

ggouaillardet commented 3 years ago

OK, maybe the error is legit after all ...

You need to add -fallow-argument-mismatch to your FCFLAGS, for example:

configure FCFLAGS=-fallow-argument-mismatch ...

FWIW, that won't be necessary from Open MPI 5 onward (see open-mpi/ompi@9865f3aec72d8809643f2fc618daea8ff595e7d4), but that change won't be backported to the v4 series.
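If FCFLAGS is already set in the environment, the extra flag probably needs to be appended rather than replacing it. Here is a minimal sketch of one way to do that; the helper name `merged_fcflags` is hypothetical, and the configure line in the comment reuses options from the original report.

```shell
#!/bin/sh
# Print the FCFLAGS value to hand to configure: any flags already set,
# followed by the workaround flag suggested in this thread.
merged_fcflags() {
    # ${FCFLAGS:+$FCFLAGS } expands to "<existing flags> " only when
    # FCFLAGS is set and non-empty; otherwise it expands to nothing.
    printf '%s\n' "${FCFLAGS:+$FCFLAGS }-fallow-argument-mismatch"
}

# Possible usage (not executed here), followed by a rebuild:
#   ./configure FCFLAGS="$(merged_fcflags)" \
#       --prefix="${LIB_INSTALL_DIR}" --enable-mpi-fortran=usempi
```

Passing FCFLAGS on the configure command line, as in the example above, records the value in config.log so it is easy to confirm later which flags were used.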

ggouaillardet commented 3 years ago

One more thing ... gfortran (from 4.9 IIRC, i.e. not the default gfortran on RHEL 7) has the required support to use the (better) use-mpi-ignore-tkr method (instead of the legacy/fallback use-mpi-tkr).

That requires some directives to be passed to the compiler. Open MPI tries several at configure time, and for gfortran the directive is !GCC$ ATTRIBUTES NO_ARG_CHECK. flang does not understand this directive and simply removes it, which causes gfortran to fail, and use-mpi-ignore-tkr is therefore discarded. flang does support the !DIR$ IGNORE_TKR directive (which is passed on to gfortran, and is tested by Open MPI), but since gfortran does not understand it (and hence ignores it), we are back to square one.

I reported this to flang at https://bugs.llvm.org/show_bug.cgi?id=52152
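To see which directive probes were recorded for a given build, one could search the configure log. This is only a sketch: it assumes the log wording contains the phrase "ignore TKR", which is an assumption about Open MPI's configure output and is not confirmed in this thread. The function name `tkr_lines` is hypothetical.

```shell
#!/bin/sh
# List lines in a configure log that mention the ignore-TKR probe.
# Usage: tkr_lines [path/to/config.log]   (defaults to ./config.log)
# Assumption: the relevant lines contain "ignore TKR" in some letter case.
tkr_lines() {
    # -i ignores case, -n prefixes each match with its line number.
    grep -i -n 'ignore tkr' "${1:-config.log}"
}
```

If nothing matches, the phrase assumed here is probably wrong for that Open MPI version, and the log would need to be searched by hand.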

jsquyres commented 3 years ago

@BryanFlynt-NOAA Was FCFLAGS=-fallow-argument-mismatch sufficient to work around the issue?