Open opoplawski opened 3 years ago
@opoplawski Thanks for the report.
One option is to add a RUN_IF_ELSE()
test (just like we do for -latomic
).
That would do the trick but only if you build Open MPI on your Celeron processor.
Shall I guess you would not be happy with this, since you might want to install Open MPI from a binary package provided by Fedora, and such binaries are unlikely built on Celeron?
An other option would be to add yet an other configure
option to explicitly not test the -mcx16
flag (and hence fallback on -latomic
). That would help portability at the expense of performance.
@hjelmn can you please comment on the performance impact of using -latomic
even if -mcx16
would be an option?
The problem for the Fedora packaging is that it is built on hardware with cx16 support, but must be able to be run on hardware without it. Ideally it would detect this at run time, but I'm not sure this is possible. It seems like reverting to what I'm assuming is the old behavior of always using -latomic via a configure flag would be acceptable.
@opoplawski Fair enough, I issued #9023 (on master
) to address this issue.
From a Fedora packaging point of view, you would now configure --disable-cx16-atomics ...
to tell Open MPI not to test the -mcx16
flag.
I revamped the logic, and the new expected behavior on your Celeron is
-mcx16
: success-mcx16
generates correct code : fail (because of a SIGILL
)-latomic
: success-latomic
generates correct code : successI could not test this because I do not have access to a Celeron, and I will appreciate you double check the actual behavior on your tablet.
A similar fix will have to be applied to the internal PMIx
and the external one (I guess Fedora is using)
FWIW, is here a diff for the v4.1.x
branch (Open MPI only, PMIx is yet to be done)
If you choose to apply it to an official tarball, you will need recent autotools, and run
autogen.pl --force
in the source directory before invoking configure
diff --git a/config/opal_config_asm.m4 b/config/opal_config_asm.m4
index 5183c7e..1bee71a 100644
--- a/config/opal_config_asm.m4
+++ b/config/opal_config_asm.m4
@@ -11,8 +11,8 @@ dnl Copyright (c) 2004-2005 The Regents of the University of California.
dnl All rights reserved.
dnl Copyright (c) 2008-2018 Cisco Systems, Inc. All rights reserved.
dnl Copyright (c) 2010 Oracle and/or its affiliates. All rights reserved.
-dnl Copyright (c) 2015-2017 Research Organization for Information Science
-dnl and Technology (RIST). All rights reserved.
+dnl Copyright (c) 2015-2021 Research Organization for Information Science
+dnl and Technology (RIST). All rights reserved.
dnl Copyright (c) 2014-2017 Los Alamos National Security, LLC. All rights
dnl reserved.
dnl Copyright (c) 2017 Amazon.com, Inc. or its affiliates. All Rights
@@ -126,15 +126,38 @@ int main(int argc, char** argv)
dnl ------------------------------------------------------------------
dnl
-dnl Check to see if a specific function is linkable.
+dnl Helper to Check if a specific function is usable.
dnl
-dnl Check with:
+dnl
+dnl $1: function name to print
+dnl $2: program to test
+dnl $3: action if success
+dnl #4: action if fail
+dnl
+AC_DEFUN([_OPAL_ASM_CHECK_ATOMIC_FUNC],[
+ AC_LINK_IFELSE([$2],
+ [AC_MSG_RESULT([yes])
+ dnl make sure it works
+ AC_MSG_CHECKING([if $1() gives correct results])
+ AC_RUN_IFELSE([$2],
+ [$3
+ AC_MSG_RESULT([yes])],
+ [$4
+ AC_MSG_RESULT([no])],
+ [AC_MSG_RESULT([cannot test -- assume yes (cross compiling)])])],
+ [$4
+ AC_MSG_RESULT([no])])
+])
+
+dnl
+dnl Check to see if a specific function is usable.
+dnl
+dnl Check compilation and actually try ro run the test code
+dnl (if we're not cross-compiling) in order to verify that
+dnl it actually gives us the correct result:
dnl 1. No compiler/linker flags.
-dnl 2. CFLAGS += -mcx16
+dnl 2. CFLAGS += -mcx16 (unless --disable-cx16-atomics is used)
dnl 3. LIBS += -latomic
-dnl 4. Finally, if it links ok with any of #1, #2, or #3, actually try
-dnl to run the test code (if we're not cross-compiling) and verify
-dnl that it actually gives us the correct result.
dnl
dnl Note that we unfortunately can't use AC SEARCH_LIBS because its
dnl check incorrectly fails (because these functions are special compiler
@@ -161,47 +184,27 @@ AC_DEFUN([OPAL_ASM_CHECK_ATOMIC_FUNC],[
dnl Check with no compiler/linker flags
AC_MSG_CHECKING([for $1])
- AC_LINK_IFELSE([$2],
- [opal_asm_check_func_happy=1
- AC_MSG_RESULT([yes])],
- [opal_asm_check_func_happy=0
- AC_MSG_RESULT([no])])
-
+ _OPAL_ASM_CHECK_ATOMIC_FUNC([$1], [$2], [opal_asm_check_func_happy=1], [opal_asm_check_func_happy=0])
dnl If that didn't work, try again with CFLAGS+=mcx16
AS_IF([test $opal_asm_check_func_happy -eq 0],
- [AC_MSG_CHECKING([for $1 with -mcx16])
- CFLAGS="$CFLAGS -mcx16"
- AC_LINK_IFELSE([$2],
- [opal_asm_check_func_happy=1
- AC_MSG_RESULT([yes])],
- [opal_asm_check_func_happy=0
- CFLAGS=$opal_asm_check_func_CFLAGS_save
- AC_MSG_RESULT([no])])
- ])
+ [AC_MSG_CHECKING([for $1 with -mcx16])
+ AS_IF([test "$enable_cx16_atomics" = "no"],
+ [AC_MSG_RESULT([skipped])],
+ [CFLAGS="$CFLAGS -mcx16"
+ _OPAL_ASM_CHECK_ATOMIC_FUNC([$1], [$2],
+ [opal_asm_check_func_happy=1],
+ [opal_asm_check_func_happy=0
+ CFLAGS=$opal_asm_check_func_CFLAGS_save])])
+ ])
dnl If that didn't work, try again with LIBS+=-latomic
AS_IF([test $opal_asm_check_func_happy -eq 0],
- [AC_MSG_CHECKING([for $1 with -latomic])
- LIBS="$LIBS -latomic"
- AC_LINK_IFELSE([$2],
- [opal_asm_check_func_happy=1
- AC_MSG_RESULT([yes])],
- [opal_asm_check_func_happy=0
- LIBS=$opal_asm_check_func_LIBS_save
- AC_MSG_RESULT([no])])
- ])
-
- dnl If we have it, try it and make sure it gives a correct result.
- dnl As of Aug 2018, we know that it links but does *not* work on clang
- dnl 6 on ARM64.
- AS_IF([test $opal_asm_check_func_happy -eq 1],
- [AC_MSG_CHECKING([if $1() gives correct results])
- AC_RUN_IFELSE([$2],
- [AC_MSG_RESULT([yes])],
- [opal_asm_check_func_happy=0
- AC_MSG_RESULT([no])],
- [AC_MSG_RESULT([cannot test -- assume yes (cross compiling)])])
- ])
+ [AC_MSG_CHECKING([for $1 with -latomic])
+ LIBS="$LIBS -latomic"
+ _OPAL_ASM_CHECK_ATOMIC_FUNC([$1], [$2],
+ [opal_asm_check_func_happy=1],
+ [opal_asm_check_func_happy=0
+ LIBS=$opal_asm_check_func_LIBS_save])])
dnl If we were unsuccessful, restore CFLAGS/LIBS
AS_IF([test $opal_asm_check_func_happy -eq 0],
@@ -1025,6 +1028,10 @@ AC_DEFUN([OPAL_CONFIG_ASM],[
[AC_HELP_STRING([--enable-builtin-atomics],
[Enable use of __atomic builtin atomics (default: enabled)])])
+ AC_ARG_ENABLE([cx16-atomics],
+ [AS_HELP_STRING([--enable-cx16-atomics],
+ [Try using -mcx16 flag if needed (default: autodetect)])])
+
opal_cv_asm_builtin="BUILTIN_NO"
AS_IF([test "$opal_cv_asm_builtin" = "BUILTIN_NO" && test "$enable_builtin_atomics" != "no"],
[OPAL_CHECK_GCC_ATOMIC_BUILTINS([opal_cv_asm_builtin="BUILTIN_GCC"], [])])
Adding an option to entirely skip the mcx16 test is fine, but if you add actual runs of the generated code you will breaks cross-compilation. Another solution, which is currently used by a lot of distro builders, is to force the target CPU (march and friends) via the initial flags (compile and link), in order to prevent the compiler from generating invalid instructions for the specified architecture.
Thats rather disappointing that the distros will have to turn off this feature due to some sub-par processors. Is there no way to limit the effect to only these low-end processors?
@bosilca there is currently a test, and it is assumed to success in the case of cross compilation (well, as long as autotools see it as a cross-compilation).
I updated the PR to prepend -mcx16
to CFLAGS
(same as op/avx
) to make sure -march=...
takes precedence. Do you know which -march=...
should be used for Celeron? I tried -mcx16 -march=goldmont
on GCC 11, but there is no conflict and the code compiles.
My bad, I missed the cross-compilation argument for AC_RUN_IFELSE.
@opoplawski pinged me by email; I reported the original issue.
-march
cannot be set since that comes from the Fedora default optimization flags, %optflags a.k.a. $RPM_OPT_FLAGS. The Fedora default flags on x86_64 (where the problem occurred) is just
-O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection
That is, there is only the -mtune=generic
flag so that the compiler makes no assumptions about the processor.
However, if you turn on flags like -mcx16
that turn on use of specific instructions, then you break down the cross-platform compatibility operating system distributions require.
tldr: the only thing you need to do is not turn on any special hardware-specific features.
Background information
What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)
4.1.1
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
This is regarding build the Fedora openmpi package
Details of the problem
From https://bugzilla.redhat.com/show_bug.cgi?id=1965692:
I recently acquired a cheap wintel toy tablet/laptop (a Kano PC), which has a Celeron N4000 processor. I tried running HPL benchmarks on it to figure out how fast the thing is. However, xhpl_openmpi crashed with an illegal instruction. I then installed the MPICH version of HPL, and xhpl_mpich runs fine. The issue is thus in OpenMPI.
Looking at the build logs of openmpi-4.1.0-5.fc34.x86_64, I see that it has been compiled with the -mcx16 flag detected by configure. According to the GCC man page
However, because the instruction is not available on all x86_64 processors, the use of the flag should be disabled in OpenMPI.
However, I see no easy way to do this in configure, especially since I think we may need to fallback to using -latomic.