open-mpi / ompi

Open MPI main development repository
https://www.open-mpi.org

v5.0.0: Default build on MacOS with OneAPI compilers fails #12052

Open jsquyres opened 11 months ago

jsquyres commented 11 months ago

With the v5.0.0 tarball, building with the Intel OneAPI compilers on MacOS Sonoma (14.x) with XCode 15.x, a default build fails because it can't find the <sys/statfs.h> file.

Note that per #12051, it is necessary to configure the v5.0.0 tarball with the following (explicitly requesting the internal packages):

./configure CC=icc FC=ifort CXX=icpc --with-libevent=internal --with-hwloc=internal --with-pmix=internal --with-prrte=internal

After configure succeeds and make is invoked to build, it will ultimately fail with:

Making all in util
  CC       pmix_path.lo
icc: remark #10441: The Intel(R) C++ Compiler Classic (ICC) is deprecated and will be removed from product release in the second half of 2023. Use '-diag-disable=10441' to disable this message.
../../../../../3rd-party/openpmix/src/util/pmix_path.c(55): catastrophic error: cannot open source file "../../../../../3rd-party/openpmix/src/util/pmix_path.c"
  #    include <sys/statfs.h>
                             ^

compilation aborted for ../../../../../3rd-party/openpmix/src/util/pmix_path.c (code 4)
make[4]: *** [pmix_path.lo] Error 1

I'm not sure if this is Open MPI's fault or PMIx's fault, so I filed the issue here to investigate the Open MPI side first.

Initially reported by Christophe Peyret: https://www.mail-archive.com/users@lists.open-mpi.org/msg35246.html

jsquyres commented 11 months ago

This appears to be fixed at the tip of OpenPMIx's main and v4.2 branches, and therefore also fixed at the tip of Open MPI's main and v5.0.x branches (and will therefore be in the forthcoming Open MPI v5.0.1). The nightly Open MPI v5.0.x snapshot tarball includes the fix -- I am unable to replicate the problem with the latest snapshot from https://www.open-mpi.org/nightly/v5.0.x/

Christophe Peyret: can you confirm that the latest v5.0.x snapshot works for you?

tof92130 commented 11 months ago

Hello,

I used openmpi-v5.0.x-202311110241-44a7845 to build Open MPI 5.0 this morning on MacOS 14.1 with the OneAPI compilers, and I still get the error:

icc: remark #10441: The Intel(R) C++ Compiler Classic (ICC) is deprecated and will be removed from product release in the second half of 2023. Use '-diag-disable=10441' to disable this message.
  CC       pmix_path.lo
icc: remark #10441: The Intel(R) C++ Compiler Classic (ICC) is deprecated and will be removed from product release in the second half of 2023. Use '-diag-disable=10441' to disable this message.
/Users/christophe/Developer/openmpi-v5.0.x-202311110241-44a7845/3rd-party/openpmix/src/util/pmix_path.c(55): catastrophic error: cannot open source file "/Users/christophe/Developer/openmpi-v5.0.x-202311110241-44a7845/3rd-party/openpmix/src/util/pmix_path.c"
  #    include <sys/statfs.h>
                             ^

Sincerely,
Christophe

On 10 Nov 2023, at 23:37, Jeff Squyres wrote:

This seems to be the crux of the problem:

I have libevent (and hwloc) installed by Homebrew in /usr/local. Open MPI's configure therefore (correctly) decides not to build its internal libevent (and hwloc).

  - gcc and clang and icc all find <event.h> (in /usr/local/include/event.h) with no additional command line flags
  - gcc and clang find libevent_core.* (in /usr/local/lib/libevent.*) with no additional command line flags
  - But icc seems to require -L/usr/local/lib to find libevent_core.*

Here's a trivial program that shows the issue -- my trivial program does include <event.h>, doesn't even call any Libevent APIs, but does -levent_core just to make the linker search for it:

$ cat foo.c
#include <stdio.h>
#include <event.h>

int main() {
    printf("Hello world\n");
    return 0;
}

# Success
$ gcc -o foo.exe foo.c -levent_core

# Success
$ clang -o foo.exe foo.c -levent_core

# Fail without -L/usr/local/lib
$ icc -o foo.exe foo.c -levent_core
icc: remark #10441: The Intel(R) C++ Compiler Classic (ICC) is deprecated and will be removed from product release in the second half of 2023. Use '-diag-disable=10441' to disable this message.
-macosx_version_min has been renamed to -macos_version_min
ld: warning: -keep_dwarf_unwind is obsolete
ld: warning: ignoring duplicate libraries: '/opt/intel/oneapi/compiler/2023.2.0/mac/bin/intel64/../../compiler/lib/libsvml.a'
ld: library 'event_core' not found

# Success with -L/usr/local/lib
$ icc -o foo.exe foo.c -L/usr/local/lib -levent_core
icc: remark #10441: The Intel(R) C++ Compiler Classic (ICC) is deprecated and will be removed from product release in the second half of 2023. Use '-diag-disable=10441' to disable this message.
-macosx_version_min has been renamed to -macos_version_min
ld: warning: -keep_dwarf_unwind is obsolete
ld: warning: ignoring duplicate libraries: '/opt/intel/oneapi/compiler/2023.2.0/mac/bin/intel64/../../compiler/lib/libsvml.a'
ld: warning: no platform load command found in '/private/var/folders/8c/wytfk7955cq9kdh1ppsrrq4w0000gp/T/iccwq1DsY.o', assuming: macOS
ld: warning: no platform load command found in '/opt/intel/oneapi/compiler/2023.2.0/mac/bin/intel64/../../compiler/lib/libirc.a4', assuming: macOS
ld: warning: no platform load command found in '/opt/intel/oneapi/compiler/2023.2.0/mac/bin/intel64/../../compiler/lib/libirc.a80', assuming: macOS
ld: warning: no platform load command found in '/opt/intel/oneapi/compiler/2023.2.0/mac/bin/intel64/../../compiler/lib/libirc.a100', assuming: macOS
ld: warning: no platform load command found in '/opt/intel/oneapi/compiler/2023.2.0/mac/bin/intel64/../../compiler/lib/libirc.a103', assuming: macOS
ld: warning: no platform load command found in '/opt/intel/oneapi/compiler/2023.2.0/mac/bin/intel64/../../compiler/lib/libirc.a117', assuming: macOS
ld: warning: no platform load command found in '/opt/intel/oneapi/compiler/2023.2.0/mac/bin/intel64/../../compiler/lib/libirc.a118', assuming: macOS
ld: warning: no platform load command found in '/opt/intel/oneapi/compiler/2023.2.0/mac/bin/intel64/../../compiler/lib/libirc.a127', assuming: macOS

This feels like a bug in the Intel OneAPI compilers to me -- it's definitely weird that it doesn't need -I/usr/local/include to find <event.h>, but does need -L/usr/local/lib to find libevent_core.*.

Someone should report this issue upstream to the Intel OneAPI maintainers.

Regardless, there's a workaround for Open MPI. I don't know for sure, but I'm guessing that the original reporter was in a similar situation as me, in that they have libevent (and potentially hwloc) installed via Homebrew or MacPorts, and Open MPI's configure finds it, and therefore decides not to build its internal copies. In this situation, you can:

./configure CC=icc CXX=icpc FC=ifort LDFLAGS=-L/usr/local/lib ...

Which, given that Open MPI's configure script finds the external Libevent (and potentially hwloc), will tell the Intel compiler/linker where to find libevent_core.* (and libhwloc.*).
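
The LDFLAGS workaround only helps if the directory passed to -L really contains the library. As a minimal sketch (the helper name find_lib_dir is hypothetical and not part of Open MPI; the directories to search are whatever you pass in), one could locate the directory first and build the flag from the result:

```shell
# Hypothetical helper: print the first directory (from the arguments after
# the library name) that contains a file named lib<name>.<anything>.
# Returns 1 if no listed directory contains such a file.
find_lib_dir() {
    name=$1
    shift
    for dir in "$@"; do
        for f in "$dir"/lib"$name".*; do
            # An unmatched glob stays literal, so test that the file exists.
            if [ -e "$f" ]; then
                printf '%s\n' "$dir"
                return 0
            fi
        done
    done
    return 1
}

# Example: /usr/local/lib is the Homebrew location mentioned in this thread.
if libdir=$(find_lib_dir event_core /usr/local/lib); then
    echo "suggested: ./configure CC=icc CXX=icpc FC=ifort LDFLAGS=-L$libdir ..."
else
    echo "libevent_core not found in the searched directories"
fi
```

This only checks that a matching file exists; it does not verify that the linker can actually use it.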

If your situation is different, I'd like to hear the details.

One user noted on the mailing list that they typically build like this with the Intel ONE compilers on macOS:

./configure CC=icc CXX=icpc FC=ifort --with-hwloc=internal --with-libevent=internal --with-pmix=internal --with-prrte=internal ...

Using the internal Libevent and hwloc is another way to address this issue, presuming:

  - You want to enable the possibility of building against an external PMIx and PRRTE
  - You're not installing into the same directory prefix where other copies of Libevent / Hwloc exist (e.g., if you have them installed by Homebrew, bad things can happen if you specify --prefix=/usr/local)

Hence, you could probably shorten the above workaround to:

./configure CC=icc CXX=icpc FC=ifort --with-hwloc=internal --with-libevent=internal ...

jsquyres commented 11 months ago

This is odd, because I see the right macro protection around #include <sys/statfs.h>.

Can you upload the complete output from running configure, and the config.log as well as the 3rd-party/openpmix/config.log files?

tof92130 commented 11 months ago

here are the log files

config.log config.log

volkerblum commented 11 months ago

I believe that the

#    include <sys/statfs.h>

error is a problem that exists between OneAPI and XCode 15.0.1 on MacOS. In another code, this issue vanished for me when upgrading to XCode 15.1 beta 2 - however, I still see headers that cannot be found.

I will try the whole process again for myself this morning, but unfortunately even updating MacOS itself to 14.1.1 took about an hour of my time. We'll see.

volkerblum commented 11 months ago

I can confirm that the following combination builds. I have not done any tests, though.*

MacOS Sonoma 14.1.1 (23B81), on MacBook Pro 16 inch (2019), trying a build with OneAPI 2023.2.0 and Apple XCode 15.1 beta 2.

nightly tarball 
openmpi-v5.0.x-202311110241-44a7845.tar.gz

./configure CC=icc FC=ifort CXX=icpc --with-libevent=internal --with-hwloc=internal --with-pmix=internal --with-prrte=internal

*) No tests yet ... took quite a while and I now need to reconfigure / rebuild to include a --prefix in the configure, which I forgot. However, it is encouraging that the build, at least, went through without compiler errors.

Thank you!

volkerblum commented 11 months ago

Addendum. (Partly off topic, I apologize.) While I do not see the include problem for OpenMPI 5.0.x anymore (see above), I continue to see it for another code:

/.../src/common.hh(13): catastrophic error: cannot open source file "/.../src/common.hh"
  #include <cstdio>
                   ^

                  ^

compilation aborted for /.../src/common.cc (code 4)

This is code that builds fine on other platforms and I am not the author of the C++ code in question. However, there seems to be something peculiar about this combination of software not finding its include paths. The "# include <sys/statfs.h>" issue above appears to be similar. If a fix emerges, I would be interested. Thank you!

jsquyres commented 11 months ago

> here are the log files

@tof92130 I think what is happening here is that you have an externally-installed PMIx and PRTE, and Open MPI's configure script is "preferring" those vs. its internal/bundled PMIx and PRTE. Hence, when you're building Open MPI against the external PMIx / PRTE, you are running into the problem.

Can you try configuring and building with ./configure --with-hwloc=internal --with-libevent=internal --with-pmix=internal --with-prrte=internal ...?

@volkerblum I can't speak for <cstdio>, but the <sys/statfs.h> thing just requires a proper configure test and macro protection in the .c file that includes <sys/statfs.h>. I haven't looked into the details of how OneAPI uses the system header files vs. its own header files, but I'm guessing that icc simply just doesn't have <sys/statfs.h>.
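
To illustrate what a header test of this kind boils down to, here is a minimal sketch that tries to compile a tiny file including the header. The helper name has_header is hypothetical, and this is not the actual Autoconf or PMIx logic; it assumes only that the compiler accepts the common -c and -o options.

```shell
# Hypothetical helper: succeed only if compiler "$1" can compile a file
# that includes header "$2". Returns 2 if no scratch directory is available.
has_header() {
    tmp=$(mktemp -d) || return 2
    printf '#include <%s>\nint main(void) { return 0; }\n' "$2" > "$tmp/conftest.c"
    if "$1" -c "$tmp/conftest.c" -o "$tmp/conftest.o" >/dev/null 2>&1; then
        rc=0
    else
        rc=1
    fi
    rm -rf "$tmp"
    return $rc
}

# Example: probe the header from this thread with the compiler named in CC
# (falls back to "cc" here; set CC=icc to probe the Intel compiler).
if has_header "${CC:-cc}" sys/statfs.h; then
    echo "sys/statfs.h: found"
else
    echo "sys/statfs.h: not found"
fi
```

A "not found" result is the situation where the .c file should skip the include, which is what the macro protection is for.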

tof92130 commented 11 months ago

Hello,

I tried to compile with Xcode15.0 and Xcode15.1beta3 using:

$DEV_DIR/openmpi-v5.0.x-202311110241-44a7845/configure --prefix=/Users/christophe/Applications/Intel/openmpi-5.0.0 F77=ifort FC=ifort CC=icc CXX=icpc --with-libevent=internal --with-hwloc=internal --with-prrte=internal --with-pmix=internal

or

$DEV_DIR/openmpi-v5.0.x-202311110241-44a7845/configure --prefix=/Users/christophe/Applications/Intel/openmpi-5.0.0 F77=ifort FC=ifort CC=icc CXX=icpc --with-libevent=internal --with-hwloc=internal

and the error message remains:

icc: remark #10441: The Intel(R) C++ Compiler Classic (ICC) is deprecated and will be removed from product release in the second half of 2023. Use '-diag-disable=10441' to disable this message.
/Users/christophe/Developer/openmpi-v5.0.x-202311110241-44a7845/3rd-party/openpmix/src/util/pmix_path.c(55): catastrophic error: cannot open source file "/Users/christophe/Developer/openmpi-v5.0.x-202311110241-44a7845/3rd-party/openpmix/src/util/pmix_path.c"
  #    include <sys/statfs.h>
                             ^

compilation aborted for /Users/christophe/Developer/openmpi-v5.0.x-202311110241-44a7845/3rd-party/openpmix/src/util/pmix_path.c (code 4)

jsquyres commented 11 months ago

@tof92130 For the all-4-listed-as-internal case, can you send:

  1. Stdout/stderr from running configure
  2. config.log from the top-level directory
  3. config.log from 3rd-party/openpmix
  4. Stdout/stderr from running "make V=1"

tof92130 commented 11 months ago

Here are the log files with Xcode 15.1 beta 3 and the following configure options:

--with-libevent=internal --with-hwloc=internal --with-prrte=internal --with-pmix=internal

make V=1 doesn't create Stdout/stderr

config.log config.log

jsquyres commented 11 months ago

> make V=1 doesn't create Stdout/stderr

I'm not sure I understand. I'm just asking for something like:

make V=1 |& tee make.out
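
For reference, "|&" is shorthand (in bash 4 and later, and in zsh) for "2>&1 |", which is the portable spelling. A minimal sketch of a wrapper that captures both streams into a file follows; the name run_logged is hypothetical.

```shell
# Hypothetical helper: run a command, merge stderr into stdout, and copy the
# combined stream both to the terminal and to the log file named in "$1".
# Note: the pipeline's exit status is that of tee, not of the command.
run_logged() {
    log=$1
    shift
    "$@" 2>&1 | tee "$log"
}

# Example usage (commented out so that nothing is built by accident):
#   run_logged make.out make V=1
```

The resulting make.out then holds everything that make printed, including compiler errors written to stderr.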

Also, can you please send the stdout/stderr from running the configure with all 4 "internal" options? E.g., something like:

/Users/christophe/Developer/openmpi-v5.0.x-202311110241-44a7845/configure \
    --prefix=/Users/christophe/Applications/Intel/openmpi-5.0.0 \
    F77=ifort FC=ifort CC=icc CXX=icpc \
    --with-libevent=internal --with-hwloc=internal --with-prrte=internal --with-pmix=internal \
    |& tee configure.out

tof92130 commented 11 months ago

--with-libevent=internal --with-hwloc=internal

config.log config.log

make.txt

jsquyres commented 11 months ago

Forgive me if I'm not being clear. For the all-4-internal case, I'd like to see:

  1. Stdout/stderr from running configure:
    
    /Users/christophe/Developer/openmpi-v5.0.x-202311110241-44a7845/configure \
       --prefix=/Users/christophe/Applications/Intel/openmpi-5.0.0 \
       F77=ifort FC=ifort CC=icc CXX=icpc \
       --with-libevent=internal --with-hwloc=internal --with-prrte=internal --with-pmix=internal \
       |& tee configure.out
  2. config.log from the top-level directory (you sent this already)
  3. config.log from 3rd-party/openpmix (you sent this already)
  4. Stdout/stderr from running make V=1
    make V=1 |& tee make.out

tof92130 commented 11 months ago

openmpi-5.0.0 % /Users/christophe/Developer/openmpi-v5.0.x-202311110241-44a7845/configure \
    --prefix=/Users/christophe/Applications/Intel/openmpi-5.0.0 \
    F77=ifort FC=ifort CC=icc CXX=icpc \
    --with-libevent=internal --with-hwloc=internal --with-prrte=internal --with-pmix=internal \
    |& tee configure.txt

Here are the files:

configure.txt config.log config.log

tof92130 commented 11 months ago

 openmpi-5.0.0 % /Users/christophe/Developer/openmpi-v5.0.x-202311110241-44a7845/configure \
    --prefix=/Users/christophe/Applications/Intel/openmpi-5.0.0 \
    F77=ifort FC=ifort CC=icc CXX=icpc \
    --with-libevent=internal --with-hwloc=internal \
    |& tee configure.txt

Here are the files:

configure.txt config.log config.log

jsquyres commented 10 months ago

The configure output from both builds indicates that you're building Open MPI 5.0.0rc16, not the latest nightly snapshot (which contains the fix).

EDIT: Nope, I'm wrong. The latest snapshot is identifying itself as v5.0.0rc16-35-g44a784555d -- my bad.

jsquyres commented 10 months ago

See my edited comment, above: I was wrong, you do have the latest snapshot; I was just confused about how it reported its version.

So in this snapshot, I just downloaded it myself and confirm that what we think is the fix is included in it. Specifically, I see:

#ifdef HAVE_SYS_STATFS_H
#    include <sys/statfs.h>
#endif

in 3rd-party/openpmix/src/util/pmix_path.c.
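
One way to see what configure decided for this macro is to look it up in the generated configuration header. Autoconf-generated headers normally contain either "#define NAME 1" or a commented-out "/* #undef NAME */". The sketch below assumes that layout; the helper name macro_state is hypothetical, and the header path in the usage comment is an assumption about the build tree.

```shell
# Hypothetical helper: report how a configuration header treats a macro.
# Prints "defined", "undefined", or "absent".
macro_state() {
    header=$1
    macro=$2
    if [ ! -r "$header" ]; then
        echo "cannot read $header" >&2
        return 2
    fi
    if grep -Eq "^[[:space:]]*#[[:space:]]*define[[:space:]]+$macro([[:space:]]|\$)" "$header"; then
        echo defined
    elif grep -Eq "#[[:space:]]*undef[[:space:]]+$macro([[:space:]]|\$)" "$header"; then
        echo undefined
    else
        echo absent
    fi
}

# Example usage, run from the top of the build tree (path is an assumption):
#   macro_state 3rd-party/openpmix/src/include/pmix_config.h HAVE_SYS_STATFS_H
```

If this reports "undefined" while the compiler still fails on the include line, that would suggest a different or stale copy of the source file is being compiled.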

You didn't send the output of make V=1 -- can you please send that? I'd just like to absolutely confirm that this is the file that is failing to compile for you.

tof92130 commented 10 months ago

Hello,

After configuring with the options --with-libevent=internal --with-hwloc=internal

I entered:

make V=1 |& tee make.txt

and I am attaching the make.txt file.

make.txt

PS: if necessary, I will send you tomorrow the file with options --with-libevent=internal --with-hwloc=internal --with-prrte=internal --with-pmix=internal

jsquyres commented 10 months ago

To simplify, I'm only interested in the all-4-internal case (because this case should be working) with the latest v5.0.x nightly snapshot tarball.

I see that you are doing a VPATH build:

checking directory of build tree... /Users/christophe/Builds/intel/openmpi-5.0.0                                              
checking directory of source tree... /Users/christophe/Developer/openmpi-v5.0.x-202311110241-44a7845                          
checking directory of prefix... /Users/christophe/Applications/Intel/openmpi-5.0.0                                            
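
The same check can be done mechanically on saved configure output. This is a minimal sketch that assumes the "checking directory of ... tree..." lines appear exactly as shown above; the helper names tree_dir and is_vpath_build are hypothetical.

```shell
# Hypothetical helper: print the path that follows
# "checking directory of <kind> tree... " in saved configure output ("$1"),
# with trailing whitespace removed. "$2" is "build" or "source".
tree_dir() {
    sed -n "s/^checking directory of $2 tree\.\.\. *//p" "$1" \
        | sed 's/[[:space:]]*$//' \
        | head -n 1
}

# Succeeds when the build tree and the source tree differ (a VPATH build).
# If neither line is present, both values are empty and this reports "same".
is_vpath_build() {
    [ "$(tree_dir "$1" build)" != "$(tree_dir "$1" source)" ]
}

# Example usage on the file produced by "... |& tee configure.out":
#   is_vpath_build configure.out && echo "VPATH build"
```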

When you built the all-4-internal case, did you completely remove the entire prefix tree, build tree, and source tree before running configure and building? If not, can you try that? I.e., please rm -rf all 3 trees, re-expand the latest v5.0.x snapshot tarball, re-run configure, and re-run make V=1. I ask because I'm wondering if some stale files are left over in one of these trees that is messing something up.

If it doesn't work, please send all eight items listed below for only the all-4-internal case:

  1. Stdout/stderr from running configure
  2. config.log from the top-level directory
  3. config.status from the top-level directory
  4. config.log from 3rd-party/openpmix
  5. config.status from 3rd-party/openpmix
  6. opal/config/opal_config.h
  7. 3rd-party/openpmix/src/include/pmix_config.h
  8. Stdout/stderr from running make V=1

jsquyres commented 10 months ago

@tof92130 Note that many in the US are out for the rest of this week (for the US Thanksgiving holiday); let us know when you get the above information, but we might not respond until next week.

tof92130 commented 10 months ago

Hello, sorry for the late answer.

I tried with openmpi-v5.0.x-202312130351-10c7bd6 and it is still not working.

Since it is not possible to attach .status files (the upload form only accepts a fixed list of extensions such as .log, .txt, and .zip), I renamed them to .txt.

I am attaching the files:

config.log config.txt

config.log config.txt

opal/config/opal_config.h doesn't exist

3rd-party/openpmix/src/include/pmix_config.h is renamed to 3rd-party/openpmix/src/include/pmix_config.txt so that I could attach it: pmix_config.txt

The result of make V=1 is in make.log:

make.log