open-mpi / ompi

Open MPI main development repository
https://www.open-mpi.org

fortran .mod files installed in libdir instead of includedir #12600

Closed minrk closed 3 months ago

minrk commented 4 months ago

Background information

What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)

v5.0.3 and 4.1.6

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

conda install openmpi via conda-forge, which in turn was built from a distribution tarball with no patches.

Please describe the system on which you are running


Details of the problem

Fortran mpi.mod and friends are installed in $PREFIX/lib/mpi.mod, etc. But since these files are searched via -I, it seems like they should be installed in $PREFIX/include/mpi.mod instead. FWIW, mpich puts them in $PREFIX/include, and fortran docs suggest include as well.

(note: cross compilation discussion below is not relevant to the issue, just how we discovered it)

Came up here when cross-compiling from intel mac to arm, which means there are two installations of openmpi:

use mpi fails to find mpi.mod because libdir is not on the default include path (I'm not sure why it doesn't come up in more cases). CMake's Fortran discovery fails with:

        $BUILD_PREFIX/bin/mpifort  -I$PREFIX/include -march=armv8.3-a -ftree-vectorize -fPIC -fno-stack-protector -O2 -pipe -isystem $PREFIX/include -fdebug-prefix-map=$SRC_DIR=/usr/local/src/conda/scalapack-2.2.0 -fdebug-prefix-map=$PREFIX=/usr/local/src/conda-prefix -fallow-argument-mismatch  -isysroot /Applications/Xcode_14.2.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX11.0.sdk -mmacosx-version-min=11.0 -c $SRC_DIR/build/CMakeFiles/CMakeScratch/TryCompile-Ar0KT6/test_mpi.f90 -o CMakeFiles/cmTC_2821f.dir/test_mpi.f90.o
        $SRC_DIR/build/CMakeFiles/CMakeScratch/TryCompile-Ar0KT6/test_mpi.f90:2:11:

            2 |       use mpi_f08
              |           1
        Fatal Error: Cannot open module file ‘mpi_f08.mod’ for reading at (1): No such file or directory

Adding -I$PREFIX/lib to OMPI_FCFLAGS results in successful compilation.

ggouaillardet commented 4 months ago

I do not necessarily agree with your interpretation. If I understand correctly, you expect $BUILD_PREFIX/bin/mpifort to be a cross compiler out of the box, and that is simply not the case. You can double check that with $BUILD_PREFIX/bin/mpifort --showme ..., you will likely see -L$BUILD_PREFIX/lib, which is correct but not what you expect (you expect -L$PREFIX/lib). Bottom line: when cross compiling, use gfortran or your Fortran compiler directly and manually set include and lib paths.

minrk commented 4 months ago

Sorry, I should have clarified that LDFLAGS, FCFLAGS, etc. are all set to find the directories in the proper prefix, so -I$PREFIX/include and -L$PREFIX/lib are specified, not $BUILD_PREFIX. I'm not expecting a cross compiler out of the box, but I would expect include files to be in the include directory. -L$PREFIX/lib does not find .mod files, -I$PREFIX/lib does, which is the unusual bit and requires special handling that other mpi implementations and fortran modules don't. All the other fortran packages I can find install their .mod files in include, not lib, plus fortran language docs suggest this location as well.

minrk commented 4 months ago

A more succinct illustration, excluding the compiler wrappers and cross-compilation, just gfortran:

program test_use_mpi

use mpi
implicit none

integer :: ierr, numprocs, proc_num

call mpi_init(ierr)
call mpi_comm_size(MPI_COMM_WORLD, numprocs, ierr)
call mpi_comm_rank(MPI_COMM_WORLD, proc_num, ierr)

print *, 'Hello from Process number', proc_num, &
         ' of ', numprocs, ' processes'

call mpi_finalize(ierr)

end program test_use_mpi

Compilation fails with standard args, which I think should reasonably be expected to work:

> gfortran -I$PREFIX/include -L$PREFIX/lib -lmpi_mpifh test.f90
    3 | use mpi
      |     1
Fatal Error: Cannot open module file ‘mpi.mod’ for reading at (1): No such file or directory

but succeeds when mixing include arguments and lib directories:

gfortran -I$PREFIX/include -I$PREFIX/lib -L$PREFIX/lib -lmpi_mpifh test.f90
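
When calling gfortran directly, the extra -I flag can be derived instead of hard-coded. The following is a minimal sketch, assuming a mock prefix created in a temporary directory; the find_moddir helper and the mock layout are illustrative and not part of Open MPI or gfortran.

```shell
#!/bin/sh
# Hypothetical helper: print the first directory under a prefix that
# contains mpi.mod, checking include/ before lib/.
find_moddir() {
    for dir in "$1/include" "$1/lib"; do
        if [ -f "$dir/mpi.mod" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Mock prefix imitating the layout described above (mpi.mod in lib/).
MOCK_PREFIX=$(mktemp -d)
trap 'rm -rf "$MOCK_PREFIX"' EXIT
mkdir -p "$MOCK_PREFIX/include" "$MOCK_PREFIX/lib"
: > "$MOCK_PREFIX/lib/mpi.mod"

MODDIR=$(find_moddir "$MOCK_PREFIX")

# Print the command that would be run; gfortran itself is not invoked here.
echo "gfortran -I$MOCK_PREFIX/include -I$MODDIR -L$MOCK_PREFIX/lib -lmpi_mpifh test.f90"
```

With a real installation, pass the actual prefix to find_moddir instead of the mock one. If the modules were installed in include/, the helper returns that directory and the second -I simply duplicates the first.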

ggouaillardet commented 4 months ago

There are two main MPI implementations: MPICH and Open MPI, the other libraries are generally derivatives.

If you want to cross-compile Scalapack with the MPI wrappers, make sure they are designed for cross-compilation. Otherwise, fix the detection to correctly handle both MPICH and Open MPI.

Even if the .mod files are moved into $PREFIX/include, that won't happen before Open MPI 6, so better fix/enhance Scalapack or use the right wrappers.

minrk commented 4 months ago

Yes, it is easy enough to work around by adding -I$PREFIX/lib to FCFLAGS, which we've done. Cross compilation isn't really relevant to the issue as I illustrated above, though cross compilation is working fine, and the issue isn't related to scalapack, that's just where it was encountered recently (CMake's internal FindMPI is what fails). I just wanted to report the issue of surprising installation paths and compiler arguments, which appear to be unique to openmpi among fortran libraries. Certainly feel free to stick with it and close if the deviation from standards is deliberate or the cost of change is too high. Since the workaround is simple, there is little urgency.

In conda-forge, I think we'll fix installation of the modules to the include directory, as is done in debian's libopenmpi-dev, for example.

ggouaillardet commented 4 months ago

I am a bit surprised cmake failed here and it could be worth investigating. Is it because it is asked to use a busted cross compiler and it assumes it works without testing it thoroughly?

If you can point me to some logs, I will have a look at it.

minrk commented 4 months ago

I shared the output from CMakeConfigureOutput.yaml in the initial report, but here is the full output.

Is it because it is asked to use a busted cross compiler and it assumes it works without testing it thoroughly?

Not at all. It's detecting that the compiler cannot find mpi.mod, doesn't figure out where mpi.mod is because it's not in a standard include location, and stops.

FindMPI searches include paths for mpi.mod (like gfortran itself), so setting -DMPI_Fortran_ADDITIONAL_INCLUDE_DIRS=$PREFIX/lib would likely also work.

As you've mentioned, we don't expect cross-compilation to work without help, so we fully expect to have to specify include and lib directories, which we do. FCFLAGS="-I$PREFIX/lib ${FCFLAGS}" is what we need, where $FCFLAGS is what works for ~every other package. I only wanted to raise that openmpi installs include files to the library directory, unlike other fortran packages, and unlike the expectations of CMake, gfortran, etc.

dalcinl commented 4 months ago

@ggouaillardet Our root complaint is the following: Open MPI is installing Fortran .mod files in the libdir location rather than the recommended and almost universally agreed includedir.

Perhaps Open MPI could learn a new configure option to install Fortran modules in includedir instead?

ggouaillardet commented 4 months ago

Thanks for the pointer. I do not fully understand everything (especially how cmake is instructed to use which compiler) but from what I can see, it uses the MPI wrappers $BUILD_PREFIX/bin/mpicc and $BUILD_PREFIX/bin/mpifort. If I understand correctly, cmake will then run mpifort --showme:compile to get the required flags, and I suspect your wrapper returns -I$BUILD_PREFIX/lib instead of -I$PREFIX/lib (since it is not supposed to be used out of the box). My interpretation is that cmake does not know how to handle busted Open MPI wrappers, and then fails.
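
One way to check which include directories a wrapper reports is to pull the -I entries out of its flag string. This is a minimal sketch: the include_dirs helper is hypothetical, and the sample flag string is illustrative rather than captured from a real wrapper.

```shell
#!/bin/sh
# Hypothetical helper: print each -I directory found in a flag string,
# one per line. Handles the "-Idir" form only (not "-I dir"), and does
# not support paths containing spaces.
include_dirs() {
    for flag in $1; do
        case "$flag" in
            -I?*) printf '%s\n' "${flag#-I}" ;;
        esac
    done
}

# Illustrative sample; on a real system use instead:
#   flags=$(mpifort --showme:compile)
flags='-I/opt/build_prefix/include -pthread -I/opt/build_prefix/lib'

dirs=$(include_dirs "$flags")
echo "$dirs"
```

If the directories printed point at $BUILD_PREFIX rather than $PREFIX, the wrapper has not been adjusted for cross-compilation.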

I do appreciate having the Open MPI Fortran modules in the include directory would kind of fix this issue (if mpifort --showme:compile is indeed executed, some directories from $BUILD_PREFIX will get pulled and as long as $PREFIX takes precedence, or the files in $BUILD_PREFIX are "enough compatible", it could be fine), but at this stage, I do believe the right fix is to fix the MPI wrappers when cross-compiling.

minrk commented 4 months ago

My interpretation is that cmake does not know how to handle busted Open MPI wrappers, and then fails.

I'd say that cmake correctly identifies the compiler as not working and halts without bending over backwards to fix it, but sure. This seems like entirely sensible behavior and I have no complaints.

at this stage, I do believe the right fix is to fix the MPI wrappers when cross-compiling.

It's unclear if you mean for you or for us, but if you mean for us, I agree and this was resolved some time ago (adding -I$PREFIX/lib). I don't think there's any action to take related to cross-compiling, though we have also fixed openmpi's module install path in our packaging.

When cross compiling, I fully expect to need to find compiler and link arguments myself, and am fully okay with that requirement as a packager. It was in this process of finding what I needed -I and -L args to be that I encountered the surprising -I$PREFIX/lib. I really am satisfied with the current cross-compilation requirements and regret bringing it up, since it's not actually related to the issue of putting standard files in a surprising location, it just happens to be how I found it, presumably a small mistake that's bothered nobody for years because the compiler wrappers embed -I$PREFIX/lib.

ggouaillardet commented 4 months ago

Since you are switching wrappers from compiler to cross-compiler, it is up to you to fix the wrappers.

You can do that by setting the right paths in $BUILD_PREFIX/share/openmpi/*-wrapper-data.txt, in the case of Fortran, that would be $BUILD_PREFIX/share/openmpi/mpifort-wrapper-data.txt
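
A minimal sketch of such a rewrite, run against a mock file in a temporary directory. The two keys shown are a simplified, assumed subset: a real mpifort-wrapper-data.txt has more fields and may use variable references rather than literal paths, so inspect the installed file before adapting this. Both prefix values are placeholders.

```shell
#!/bin/sh
# Placeholder prefixes for illustration only.
BUILD_PREFIX=/opt/build_prefix
PREFIX=/opt/host_prefix

# Mock wrapper data file (assumed, simplified contents).
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
data="$workdir/mpifort-wrapper-data.txt"
cat > "$data" <<EOF
includedir=$BUILD_PREFIX/include
libdir=$BUILD_PREFIX/lib
EOF

# Point every path at the host prefix. "|" is the sed delimiter so the
# slashes in the paths need no escaping; prefixes containing "|" or
# regex metacharacters would need extra care.
sed "s|$BUILD_PREFIX|$PREFIX|g" "$data" > "$data.new"
mv "$data.new" "$data"

result=$(cat "$data")
echo "$result"
```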

minrk commented 4 months ago

Yup, and we've done that. All cross compilation's working fine. The only issue here is the peculiar installation path of include files in the lib directory.

ggouaillardet commented 4 months ago

I am not sure I understand correctly. Are you saying the builds still fail (e.g. cannot find mpi.mod) after you updated mpifort-wrapper-data.txt?

minrk commented 4 months ago

No builds fail. As mentioned in the original report, adding -I$PREFIX/lib to OMPI_FCFLAGS (or FFLAGS if not using wrappers) is sufficient to get working cross-compilation. This issue is not about cross compilation or anything being broken.

ggouaillardet commented 4 months ago

Your initial report was

I don't understand why only the cross-compilation case is affected. adding -I$PREFIX/lib to OMPI_FCFLAGS results in successful compilation.

My point is -I$PREFIX/lib should be pulled by cmake from $BUILD_PREFIX/bin/mpif90 --showme:compile, so you should not have to set OMPI_FCFLAGS at all for it to work. I suspect -I$BUILD_PREFIX/lib gets pulled instead, and this is likely because $BUILD_PREFIX/share/openmpi/mpifort-wrapper-data.txt has not been updated for cross-compilation.

Anyway, it seems we cannot understand each other, so I will let other folks comment on this. If you feel any need to tag me, I will be happy to give this another try.

minrk commented 4 months ago

Sorry for the miscommunication. Yes, I should not have mentioned my lack of understanding of which cases were affected because it was not relevant to the issue at hand, and is why I updated the description to remove it since it could be read as a question, which it wasn't meant to be, and tried to clarify the relationship between the issue (paths) and how we happened to encounter it (cross compilation). I think I understand why now, but it remains not relevant to the actual issue, which affects all openmpi installs, not just cross compilation.

ggouaillardet commented 4 months ago

FWIW, here is a simple CMakeLists.txt that checks use mpi works

In my environment, gfortran is used as the Fortran compiler, Open MPI mpif90 is in the $PATH so -I/.../lib gets pulled and I do not need to set OMPI_FCFLAGS nor FFLAGS.

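# FindMPI's per-language components and MPI_Fortran_HAVE_F90_MODULE
# need CMake 3.10 or newer.
cmake_minimum_required(VERSION 3.10)
project(foo LANGUAGES Fortran)

find_package(MPI COMPONENTS Fortran REQUIRED)

# Test the variable by name so an unset value is treated as false.
if (NOT MPI_Fortran_HAVE_F90_MODULE)
    message(FATAL_ERROR "use mpi could not be found")
else()
    message(STATUS "use mpi was found")
endif()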

minrk commented 4 months ago

Thanks. I'm trying really hard to be clear that I'm not asking for help building against mpi, nor am I trying to claim that anything doesn't work as intended. I really just wanted to report that the include files are in the wrong folder, not that mpifort can't find them. openmpi correctly tracks where it puts module files and records that in the compiler wrapper data. It just happens to be a weird folder, and it looks like a typo.

dalcinl commented 4 months ago

Well, I don't believe it is a typo, but a deliberate choice of Open MPI. While I agree libdir is not the usual place to store .mod files, I also have to acknowledge that .mod files may not be truly platform-independent as C .h header files can be. Therefore, I can understand the argument about libdir being an appropriate location for module files.

jsquyres commented 4 months ago

I'll throw in a little history here...

It looks like the code that installs the *.mod files dates all the way back to 2012. I'm quite sure that Past Jeff had a reason for putting the *.mod files in $libdir instead of $includedir, but I unfortunately don't remember for 100% sure what the reason was. I suspect it is one or both of the following:

  1. Fortran compilers -- at the time -- looked in $libdir instead of $includedir for modulefiles.
  2. The Fortran community asked me to put the modulefiles in $libdir instead of $includedir.

I say this because I'm very sure that Past Jeff wouldn't have installed (essentially) header files into $libdir without a reason.

That being said, if things have changed since then, and if the norm is that we should install modulefiles into $includedir (yay!), we can/should probably do that. Let's break it into multiple things:

  1. @ggouaillardet mentioned that we probably shouldn't do that until Open MPI v6.0 (big SWAG: end of this year). Seems reasonable. We could do this on main soon-ish, and it would be teed up for v6.0.x.
  2. What about the v5.0.x series? I'm a bit hesitant to change something like this in the middle of a release series. Yes, if you use the wrapper compilers, no one should notice anything (because our wrapper compilers will just update and everything will "just work"). But I worry about those who are painting outside the dotted lines and are doing things outside our supported path. I'd feel better about breaking them at a major release, not a minor release.
    If your workaround of setting OMPI_FCFLAGS is "good enough", and your goal in reporting this was to get us to fix this for future series, skipping v5.0.x might be ok...?
  3. What about the v4.1.x series? Unless there's a strong need, I'd rather not touch the v4.1.x series. We're trying to put the v4.1.x series to rest, after all.

Does that sound reasonable?

@minrk @dalcinl Many thanks for reporting this issue.

minrk commented 4 months ago

Thanks for the context!

It's definitely not a strong need, nothing is really hindered by the unusual install location, so no rush at all. Honestly, this is mostly an aesthetic issue, so if doing it in 6.0 makes you most comfortable I'm 100% behind that.

jsquyres commented 4 months ago

@minrk Ok, great. I've re-labeled this PR as v6.0.x. Let's see if we can get this done soon-ish on main.

dalcinl commented 4 months ago

I say this because I'm very sure that Past Jeff wouldn't have installed (essentially) header files into $libdir without a reason.

In defense of Past Jeff, I want to point out that in my opinion .mod files are not really header files, they are compiler-dependent and to the best of my knowledge they are not guaranteed to be even platform independent. Therefore, putting .mod files in $libdir is a sensible thing to do, albeit not the standard one.

I've re-labeled this PR as v6.0.x.

A backward-compatible way to do things would be to add a configure option to v5.x to let distributors decide the destination of *.mod files.

ggouaillardet commented 4 months ago

One more datapoint: there is that feature previously suggested by @jeffhammond to have a single Open MPI "instance" with Fortran bindings (including modules) built with several compilers (e.g. one C compiler, many Fortran compilers). That would imply having Fortran modules in different directories.

For the time being, I'd rather add yet another configure option to change the location of the Fortran modules.

jeffhammond commented 4 months ago

*.mod files are not really header files, they are compiler-dependent and to the best of my knowledge they are not guaranteed to be even platform independent.

Even though they are usually binary and compiler-dependent, they are used by Fortran compilers like header files. One needs to look at where compilers look for modules: do they look in the -L$libdir path or the -I$incdir path?

ggouaillardet commented 4 months ago

My point is this is irrelevant: either use the wrappers (they already know where to look), or if one insists on having CMake use the compilers, let it get the correct include path(s) from the wrappers.

jeffhammond commented 4 months ago

Modules are use-included at compile-time, not link-time, and they belong in the include directory.

Not everyone uses wrappers and not everyone uses cmake.

minrk commented 4 months ago

I think the notion of a 'module' dir is not unusual, though includedir is still probably the most sensible default value for it. I think debian's openmpi-dev follows this pattern (it appears to place a copy of includedir headers in the fortran module directory alongside .mod files, in addition to the C includedir, and patches mpifort to set includedir to this module directory).

ggouaillardet commented 4 months ago

from https://packages.debian.org/buster/amd64/libopenmpi-dev/filelist

/usr/lib/x86_64-linux-gnu/fortran/gfortran-mod-15/openmpi/mpi.mod
[...]
/usr/lib/x86_64-linux-gnu/openmpi/lib/mpi.mod

minrk commented 4 months ago

Thanks, that's what I was referring to. /usr/lib/x86_64-linux-gnu/fortran/gfortran-mod-15/openmpi is the moduledir that includes a copy of mpi.h and mpi.mod and is patched to be includedir for mpifort.

ggouaillardet commented 4 months ago

That directory (which you call moduledir, which is not an autotools thing) only contains mod files (at least according to the filelist I shared earlier). Anyway, instead of arguing whether $PREFIX/include or $PREFIX/lib is a better place for module files, let's simply add a configure option for that (and also make @amckinstry's work easier).

minrk commented 4 months ago

Sorry for the confusion, I didn't realize you were looking at buster. I was looking at stable.

Sorry for the wasted energy, I regret participating.

jsquyres commented 3 months ago

Sorry for the wasted energy, I regret participating.

?? I think that this is a perfectly fine conversation to have. I don't regret anything here.

One point that I think is worth re-emphasizing: the wrapper compilers and the supported mechanisms to extract compiler / linker flags from Open MPI have always been consistent. Hence, anyone using any of those methods will be able to compile and link successfully. I don't think that's under debate here.

What's under debate is 1) whether we should move the .mod files out of $libdir by default, and 2) whether we should provide a configure CLI option to specify where to put them.

If the answer to 1) is yes, then the answer to 2) also has to be yes.

My personal $0.02 is that the answers to both should be "yes".

Per my above comment, I'm totally fine moving the .mod files out of $libdir. I've heard 2 opinions on where we should move them:

  1. $includedir, which seems reasonable.
  2. a module-specific directory, perhaps something like $libdir/x86_64-linux-gnu/fortran/gfortran-mod-15/openmpi, or, more generally, $libdir/ARCH/fortran/FC-mod-XX/openmpi/.
    • I think there's a standard string that can be substituted into ARCH; I just don't remember the AC variable name offhand.
    • FC can easily be the basename of the compiler.
    • I don't know where 15 came from in @ggouaillardet's example, but there's probably a way to get that value dynamically.
    • I would point out, however, that this (default) module-specific path is under $libdir.

Opinions?

minrk commented 3 months ago

I agree that the answer to both is yes, and I think includedir is the most logical default, since that's where ~all other packages seem to put these files. I don't think there is any other standard location, so it seems best to me to leave selecting that to the installer if it should deviate from the default of includedir for include files.

I would point out, however, that this (default) module-specific path is under $libdir.

While it is in /usr/lib, it is not under $libdir, which is /usr/lib/ARCH/openmpi/lib — a sibling, not a parent.

I don't know where 15 came from

It's an environment variable GFORTRAN_VERSION=gfortran-mod-15 set in the debian build environment with the note:

  * Hard-code GFORTRAN_VERSION as gfortran-mod-15 for the moment to avoid
    pulling arch-dep stuff into openmpi-common (mpi wrappers)

It appears to be quite debian-specific and not a standard thing, so I wouldn't suggest trying to replicate the pattern.

jsquyres commented 3 months ago

After much discussion, the choice was made to keep installing Open MPI's Fortran modules into $libdir by default, but we provided a new --with-mpi-moduledir option to configure to let users override this if they wish.

This is included in v5.0.4rc1.
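
For packagers who want the modules in the include directory, the configure invocation might look like the sketch below. The option name is taken from the comment above; the =DIR argument form and the default prefix are assumptions, so check ./configure --help in the release being built. The snippet only prints the command and does not run configure.

```shell
#!/bin/sh
# Assumed install prefix; adjust for the target system.
PREFIX="${PREFIX:-/opt/openmpi}"

# Option name as given in the thread; the "=DIR" form is an assumption.
CONFIGURE_ARGS="--prefix=$PREFIX --with-mpi-moduledir=$PREFIX/include"

# Print the command instead of executing it.
echo "./configure $CONFIGURE_ARGS"
```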