`--disable-builtin-atomics` as suggested by @ggouaillardet does not avoid the issue.
Hmmm. Let me take a look. master builds fine on the M1 but I rarely ever build releases.
@fxcoudert thanks for the report.
The logs you posted are related to PMIx using the GCC builtin atomics. Did you use `--disable-builtin-atomics` to generate them? If so, the error might be that Open MPI does not pass `--disable-builtin-atomics` to the PMIx `configure` (you can check that in `opal/mca/pmix/pmix3x/pmix/config.status`).
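For reference, a quick way to check that (a minimal sketch; the path is the one mentioned above, and the layout of your build tree may differ):

```sh
# config.status records the configure options it was generated with, so
# grepping it shows whether --disable-builtin-atomics reached the embedded PMIx.
grep 'disable-builtin-atomics' opal/mca/pmix/pmix3x/pmix/config.status \
  && echo "flag was passed down to the PMIx configure" \
  || echo "flag does not appear in the PMIx config.status"
```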
@ggouaillardet We shouldn't be failing even without that option. The gcc builtins are inferior on Apple Silicon, so they should really be disabled on AArch64 in v4.1.0. For master, C11 should be used.
I really need to refactor the atomic support. Even when using C11 I still want the LL/SC atomics to be available. The LL/SC lifo/fifo implementations are ~ 2x the speed of the CAS128 implementations (measured on Power 8). C11 and builtins do not provide direct access to them. CAS is an Intel thing.
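As a rough illustration of the CAS128 point, you can peek at what clang emits for a 128-bit compare-and-swap on Apple Silicon (a sketch only; the file name and flags are arbitrary, not something from the Open MPI build):

```sh
# Compile a 16-byte CAS and look for an LL/SC pair (ldxp/stxp and their
# acquire/release variants) versus the single CASP instruction.
cat > cas128.c <<'EOF'
#include <stdbool.h>
bool cas128(__int128 *p, __int128 *expected, __int128 desired)
{
    return __atomic_compare_exchange_n(p, expected, desired, false,
                                       __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE);
}
EOF
clang -O2 -S -o - cas128.c | grep -E 'ldaxp|ldxp|stlxp|stxp|casp'
```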
Hmm, the v4.1.x branch builds just fine for me.
$ ../configure --prefix=/tmp/ompi --disable-mpi-fortran --disable-oshmem &> config.out
$ make -j 32 &> make.out
$ echo $?
0
$ git branch
master
* v4.1.x
$ uname -a
Darwin Mac-mini.local 20.3.0 Darwin Kernel Version 20.3.0: Thu Jan 14 14:38:22 PST 2021; root:xnu-7195.81.2~2/RELEASE_ARM64_T8101 arm64
We ran `configure` with `./configure --prefix=/opt/homebrew/Cellar/open-mpi/4.1.0 --disable-dependency-tracking --disable-silent-rules --enable-ipv6 --enable-mca-no-build=op-avx,reachable-netlink --with-libevent=/opt/homebrew/opt/libevent --with-sge --disable-builtin-atomics`, with clang as the C compiler and gfortran as the Fortran compiler.
@fxcoudert Odd. I will try to build with all those options but fortran. It is a cancer on MPI :) and shouldn't have an impact on building PMIx.
What I may do is update just v4.0.x and v4.1.x to never select the builtins for AArch64. master will get an update to not use CAS128.
LL/SC:
Mac-mini:class hjelmn$ ./opal_lifo -t 1
Single thread test. Time: 0 s 13621 us 13 nsec/poppush
Atomics thread finished. Time: 0 s 14375 us 14 nsec/poppush
Atomics thread finished. Time: 0 s 154525 us 154 nsec/poppush
Atomics thread finished. Time: 0 s 154661 us 154 nsec/poppush
Atomics thread finished. Time: 0 s 156505 us 156 nsec/poppush
Atomics thread finished. Time: 0 s 157013 us 157 nsec/poppush
Atomics thread finished. Time: 0 s 157493 us 157 nsec/poppush
Atomics thread finished. Time: 0 s 158275 us 158 nsec/poppush
Atomics thread finished. Time: 0 s 158647 us 158 nsec/poppush
Atomics thread finished. Time: 0 s 158973 us 158 nsec/poppush
All threads finished. Thread count: 8 Time: 0 s 159023 us 19 nsec/poppush
SUPPORT: OMPI Test Passed: opal_lifo_t: (7 tests)
CAS128:
Mac-mini:class hjelmn$ ./opal_lifo -t 1
Single thread test. Time: 0 s 25688 us 25 nsec/poppush
Atomics thread finished. Time: 0 s 29322 us 29 nsec/poppush
Atomics thread finished. Time: 4 s 57595 us 4057 nsec/poppush
Atomics thread finished. Time: 4 s 151568 us 4151 nsec/poppush
Atomics thread finished. Time: 4 s 162332 us 4162 nsec/poppush
Atomics thread finished. Time: 4 s 173651 us 4173 nsec/poppush
Atomics thread finished. Time: 4 s 176088 us 4176 nsec/poppush
Atomics thread finished. Time: 4 s 178025 us 4178 nsec/poppush
Atomics thread finished. Time: 4 s 178713 us 4178 nsec/poppush
Atomics thread finished. Time: 4 s 178760 us 4178 nsec/poppush
All threads finished. Thread count: 8 Time: 4 s 178830 us 522 nsec/poppush
SUPPORT: OMPI Test Passed: opal_lifo_t: (7 tests)
Not even a contest.
Similarly bad with `opal_fifo`:
LL/SC:
Mac-mini:class hjelmn$ ./opal_fifo
Single thread test. Time: 0 s 7620 us 7 nsec/poppush
Atomics thread finished. Time: 0 s 7918 us 7 nsec/poppush
Atomics thread finished. Time: 0 s 76081 us 76 nsec/poppush
Atomics thread finished. Time: 0 s 79458 us 79 nsec/poppush
Atomics thread finished. Time: 0 s 84994 us 84 nsec/poppush
Atomics thread finished. Time: 0 s 90103 us 90 nsec/poppush
Atomics thread finished. Time: 0 s 90403 us 90 nsec/poppush
Atomics thread finished. Time: 0 s 91280 us 91 nsec/poppush
Atomics thread finished. Time: 0 s 92466 us 92 nsec/poppush
Atomics thread finished. Time: 0 s 93835 us 93 nsec/poppush
All threads finished. Thread count: 8 Time: 0 s 93916 us 11 nsec/poppush
Exhaustive atomics thread finished. Popped 821530 items. Time: 0 s 107912 us 131 nsec/poppush
Exhaustive atomics thread finished. Popped 810445 items. Time: 0 s 114695 us 141 nsec/poppush
Exhaustive atomics thread finished. Popped 806449 items. Time: 0 s 116241 us 144 nsec/poppush
Exhaustive atomics thread finished. Popped 813960 items. Time: 0 s 117182 us 143 nsec/poppush
Exhaustive atomics thread finished. Popped 825230 items. Time: 0 s 118810 us 143 nsec/poppush
Exhaustive atomics thread finished. Popped 826685 items. Time: 0 s 119486 us 144 nsec/poppush
Exhaustive atomics thread finished. Popped 828373 items. Time: 0 s 120327 us 145 nsec/poppush
Exhaustive atomics thread finished. Popped 830266 items. Time: 0 s 121114 us 145 nsec/poppush
All threads finished. Thread count: 8 Time: 0 s 121186 us 15 nsec/poppush
SUPPORT: OMPI Test Passed: opal_fifo_t: (8 tests)
CAS128:
Mac-mini:class hjelmn$ ./opal_fifo
Single thread test. Time: 0 s 7611 us 7 nsec/poppush
Atomics thread finished. Time: 0 s 19256 us 19 nsec/poppush
Atomics thread finished. Time: 2 s 555095 us 2555 nsec/poppush
Atomics thread finished. Time: 2 s 562521 us 2562 nsec/poppush
Atomics thread finished. Time: 2 s 570284 us 2570 nsec/poppush
Atomics thread finished. Time: 2 s 570760 us 2570 nsec/poppush
Atomics thread finished. Time: 2 s 571438 us 2571 nsec/poppush
Atomics thread finished. Time: 2 s 573642 us 2573 nsec/poppush
Atomics thread finished. Time: 2 s 575019 us 2575 nsec/poppush
Atomics thread finished. Time: 2 s 575161 us 2575 nsec/poppush
All threads finished. Thread count: 8 Time: 2 s 575231 us 321 nsec/poppush
Exhaustive atomics thread finished. Popped 639525 items. Time: 1 s 828167 us 2858 nsec/poppush
Exhaustive atomics thread finished. Popped 642578 items. Time: 1 s 840312 us 2863 nsec/poppush
Exhaustive atomics thread finished. Popped 641617 items. Time: 1 s 846852 us 2878 nsec/poppush
Exhaustive atomics thread finished. Popped 639283 items. Time: 1 s 849705 us 2893 nsec/poppush
Exhaustive atomics thread finished. Popped 646423 items. Time: 1 s 851183 us 2863 nsec/poppush
Exhaustive atomics thread finished. Popped 645146 items. Time: 1 s 851750 us 2870 nsec/poppush
Exhaustive atomics thread finished. Popped 645428 items. Time: 1 s 852076 us 2869 nsec/poppush
Exhaustive atomics thread finished. Popped 648267 items. Time: 1 s 852240 us 2857 nsec/poppush
All threads finished. Thread count: 8 Time: 1 s 852359 us 231 nsec/poppush
SUPPORT: OMPI Test Passed: opal_fifo_t: (8 tests)
I've uploaded our full build log at https://gist.github.com/fxcoudert/0710566fc631546b7a5ad496dabcb747 so you can check what is happening.
One weird thing is `checking for builtin atomics... BUILTIN_GCC`, because we're using clang as the C compiler.
@fxcoudert That is because clang implements the gcc builtin atomics (`__atomic_*`). They are now used in Open MPI over the older Intel `__sync_*` atomics. I think we defaulted to the builtins for v4.x. This appears to have been a mistake for AArch64, as the performance is definitely worse.
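A quick sanity check (minimal sketch, assuming the Xcode clang) that both builtin families are accepted, which is why the probe reports `BUILTIN_GCC`:

```sh
# Both the older __sync_* and the newer __atomic_* builtins compile and run
# with clang, so the BUILTIN_GCC configure result is expected.
cat > builtins.c <<'EOF'
int main(void)
{
    long v = 0;
    __sync_fetch_and_add(&v, 1);                  /* legacy __sync_* builtin  */
    __atomic_fetch_add(&v, 1, __ATOMIC_SEQ_CST);  /* newer __atomic_* builtin */
    return (v == 2) ? 0 : 1;
}
EOF
clang builtins.c -o builtins && ./builtins && echo "both builtin families work"
```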
@hjelmn from the logs posted by @fxcoudert I noted `checking for assembly architecture... UNSUPPORTED`. I quickly checked `config/opal_config_asm.m4`, and indeed, we do not support M1: the log shows `checking host system type... arm-apple-darwin20.2.0`, and right after, `checking for builtin atomics... BUILTIN_SYNC`.
So we could have two issues in Open MPI:
- `host=arm-apple-darwin20.2.0` is not supported
- `BUILTIN_SYNC` atomics are selected even though `--disable-builtin-atomics` was passed on the `configure` command line
(note this is an observation from the logs, and I did not dig into the sources to confirm that).

The triplet for that arch should not be `arm-apple-darwin20.2.0` but `aarch64-apple-darwin20.2.0` (https://github.com/gcc-mirror/gcc/blob/5a36cae275ad84cc7e623f2f5829bdad767e3f6a/config.guess#L1345). Therefore `config.{guess,sub}` need to be updated: https://www.gnu.org/software/gettext/manual/html_node/config_002eguess.html
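To double-check locally, a sketch (the Savannah URLs are the ones the gettext manual page above points to; the in-tree path below is an assumption):

```sh
# Fetch the current upstream scripts and compare their guess against the copy
# shipped in the source tree (in-tree location may differ).
curl -LO 'https://git.savannah.gnu.org/cgit/config.git/plain/config.guess'
curl -LO 'https://git.savannah.gnu.org/cgit/config.git/plain/config.sub'
chmod +x config.guess
./config.guess            # upstream: should print aarch64-apple-darwin20.2.0 on an M1
sh config/config.guess    # copy shipped with the 4.1.0 tarball: arm-apple-darwin20.2.0
```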
@fxcoudert thanks for the pointer!
@jsquyres any advice on how we should handle that? My best bet is we should patch `config.{guess,sub}` the same way we patch `configure` to correctly handle third-party dependencies.
Just to be clear -- are we saying that the upstream `config.sub` / `config.guess` files include the now-correct notation `aarch64-apple-darwin20.2.0`?
If so, we should probably stash copies of them in our git repo and just `cp` them to the appropriate places during `autogen.pl`. We used to do something like this (we would `wget` the most recent `config.*` files during `autogen`, but that's not really good for repeatability -- stashing known-good versions in git is probably a better scheme).
That being said, we should probably only conditionally `cp` / replace the `config.*` files that `autoconf` and friends install: i.e., do a version check of what we have in git vs. what is installed by `autoconf` and friends, and use whichever one is newer.
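A minimal sketch of that version check, assuming it keys off the `timestamp='YYYY-MM-DD'` line that `config.guess`/`config.sub` carry (the paths are hypothetical; the real logic would live in `autogen.pl`):

```sh
# Replace the installed script only if the stashed copy is newer; ISO dates
# compare correctly as plain strings, so sort picks the newest.
maybe_replace() {
    stashed=$1; installed=$2
    ts_stashed=$(sed -n "s/^timestamp='\(.*\)'$/\1/p" "$stashed")
    ts_installed=$(sed -n "s/^timestamp='\(.*\)'$/\1/p" "$installed")
    newest=$(printf '%s\n%s\n' "$ts_stashed" "$ts_installed" | sort | tail -n 1)
    if [ "$newest" = "$ts_stashed" ] && [ "$ts_stashed" != "$ts_installed" ]; then
        cp "$stashed" "$installed"
        echo "updated $installed ($ts_installed -> $ts_stashed)"
    fi
}
# Hypothetical locations: stashed copies kept in git vs. what autoconf installed.
maybe_replace config/known-good/config.guess config/config.guess
maybe_replace config/known-good/config.sub   config/config.sub
```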
I don't think we should include `config.guess` and `config.sub` in git. One day, autoconf's versions will be newer and then we'll have a real problem. There is a timestamp in the files, so we may be able to cover that, but it still seems a little awkward.
We used to grab the latest config.guess/sub as part of building a tarball, although it looks like we no longer do. That seems much better than trying to cover this for all use cases.
I did verify that the `config.guess` we ship with OMPI tarballs (which is the one included with Autoconf 2.69) returns `arm-apple-darwin20.2.0`. The latest `config.guess` in Savannah returns `aarch64-apple-darwin20.2.0`. So it looks like we do need to pull `config.guess`/`config.sub`, at least when building tarballs.
As noted on Jan 26: we will also need to apply (at least the `config.*` files) in PMIx and PRRTE.
@bwbarrett and I talked offline. I'll go make a PR to do what was described above: stash known-good copies of `config.*` in Open MPI's git repo, and during `autogen`, do the version compare, and if the stashed versions are newer, copy those in over what `autoconf` installed.
See #8417 for `autogen.pl` updates to use known-good `config.guess` and `config.sub`.
GNU's own documentation recommends using `config.{guess,sub}` from their own repo, rather than relying on the autoconf versions: https://www.gnu.org/software/gettext/manual/html_node/config_002eguess.html
@fxcoudert PR #8417 includes cached copies of `config.guess` and `config.sub` from Savannah as of today.
We don't want to just arbitrarily grab those files from Savannah when building a tarball for a few reasons:
Hence, it seems safer to just cache known-good versions of these files in the Open MPI repo, and document them as such. If we ever need to update these files, no problem -- we can re-pull from Savannah.
Does that address your concern?
@fxcoudert This issue auto-closed, sorry about that. The v4.1.x version of the fix is in #8421.
I don't know if you want to just pull the patch and apply that; we can (and probably will) roll an RC soon, but I think you said that you don't generally test upstream betas. FWIW: we have just one more AVX blocker issue before v4.1.1 (it compiles and runs properly now, but at least in some cases there's a performance degradation that we're working to understand).
Background information
What version of Open MPI are you using?
4.1.0 from official sources
Please describe the system on which you are running
Details of the problem
https://github.com/Homebrew/homebrew-core/pull/67367#issuecomment-753315171 Compiling open-mpi 4.1.0 on Apple Silicon (aarch64-apple-darwin20) fails with build errors: