Closed chenrui333 closed 3 years ago
The detection of the supported architecture features is extremely fragile, and we are at the mercy of the compiler because there is no portable way to specify exactly what architectural features we want/need. In this particular case I suspect the clang compiler used in your bottle has an incomplete set of supported features for the `skylake-avx512` architecture.
We need support for 3 capabilities in order to get the AVX512 code to compile: avx512, avx512f and avx512bw. `skylake-avx512` is supposed to have support for all 3 of these (according to the gcc and icc manuals).
I just tested on OS X Catalina with clang 12.0, and `-march=skylake-avx512` seems to provide everything needed to compile the code. Is the brew formula playing around with the flags? What version of the clang compiler was used?
This is a brew specific issue, see a previous message on our slack.
Will update shortly.
On Dec 21, 2020, at 7:43, bosilca notifications@github.com wrote:
The detection of the supported architecture features is extremely fragile, and we are at the mercy of the compiler because there is no portable way to specify exactly what architectural features we want/need. In this particular case I suspect the clang compiler used in your bottle has an incomplete set of supported features for the skylake-avx512 architecture.
We need support for 3 capabilities in order to get the AVX512 code to compile: avx512, avx512f and avx512bw. skylake-avx512 is supposed to have support for all these 3 (according to the gcc and icc manuals).
I just tested on an OSX Catalina with clang 12.0 and the -march=skylake-avx512 seems to provide everything needed to compile the code. Is the brew formula playing around with the flags ? What version of the clang compiler was used ?
Please refer to Homebrew/homebrew-core#65296 for some more context, and a description of the root cause in homebrew.
TL;DR: `-march=skylake-avx512` is passed to `gcc` at `configure` time, but brew `superenv` removes it at `make` time. You generally do not see this failure when building from the command line (i.e., not under brew `superenv`).
A simple workaround for homebrew is to pass `--enable-mca-no-build=op-avx` to the `configure` command line.
A less suboptimal fix is in https://github.com/ggouaillardet/ompi/commit/7136abb22db0defd968d8ca57b3dbf0a3ee9f737, and I will open a PR shortly so it can be reviewed within the Open MPI community.
The second approach is highly specialized for brew, a little too much to my taste, I don't think we should add this into our own source code. OMPI does all the right things, checks for flags and compiler capabilities assuming that what we get during configure remains valid afterwards. superenv is clearly hindering our efforts, selectively changing flags and making the configure time detection invalid. Not finger pointing, but I think the best approach is for them to deliver a suboptimal prebuilt version, one that disables AVX support entirely.
That's fair (and I already had a lot of fun hacking this :-) ).
Let's wait for what the Homebrew team has to say about the `superenv` behavior.
The Homebrew team suggested the use of `ENV.refurbish_args` in the Open MPI formulae.
I am now trying this.
`ENV.refurbish_args` will make brew silently ignore your arch flags during configure, so depending on how the test is written, that will probably not help (i.e., configure will still think the flag is accepted).
When `-march=skylake-avx512` is used, some macros are being defined: could the code be conditionally defined behind those?
#define __AVX2__ 1
#define __AVX512BW__ 1
#define __AVX512CD__ 1
#define __AVX512DQ__ 1
#define __AVX512F__ 1
#define __AVX512VL__ 1
#define __AVX__ 1
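One way to read that suggestion, as a minimal sketch: guard the AVX512 variant with the predefined macros listed above, so that it is only compiled when the flags that actually reach the compiler enable those features. The function names (`max_int8`, `max_int8_avx512`, `max_int8_scalar`) are hypothetical and are not taken from the Open MPI sources.

```c
#include <stddef.h>
#include <stdint.h>

/* Portable fallback: element-wise max, one int8_t at a time. */
static void max_int8_scalar(const int8_t *in, int8_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (in[i] > out[i])
            out[i] = in[i];
}

/* Only compile the AVX512 variant when the compiler was actually told to
 * enable the features it needs (these macros are predefined by the compiler). */
#if defined(__AVX512F__) && defined(__AVX512BW__)
#include <immintrin.h>

/* Hypothetical AVX512 variant: 64 int8_t elements per iteration. */
static void max_int8_avx512(const int8_t *in, int8_t *out, size_t n)
{
    size_t i = 0;
    for (; i + 64 <= n; i += 64) {
        __m512i a = _mm512_loadu_si512((const void *)(in + i));
        __m512i b = _mm512_loadu_si512((const void *)(out + i));
        _mm512_storeu_si512((void *)(out + i), _mm512_max_epi8(a, b));
    }
    /* Handle the remaining elements with the scalar loop. */
    max_int8_scalar(in + i, out + i, n - i);
}
#define MAX_INT8_IMPL max_int8_avx512
#else
#define MAX_INT8_IMPL max_int8_scalar
#endif

/* Hypothetical entry point: out[i] = max(in[i], out[i]). */
void max_int8(const int8_t *in, int8_t *out, size_t n)
{
    MAX_INT8_IMPL(in, out, n);
}
```

Note that this selects the code path at compile time only. A build made with AVX512 flags would still need a runtime CPU check before calling the AVX512 variant on an arbitrary machine.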
@fxcoudert thanks for the hint about the `AVX512` macros.
Please help me understand better how `superenv` works.
Is there an option to have `superenv` behave the same (e.g. replace `-march=...` with `-march=nehalem`) at both `configure` and `make` time?
What is the exact role of `ENV.refurbish_args`? Should it prevent the `-march=...` replacement at both `configure` and `make` time? I am really having a hard time connecting the dots ...
`ENV.refurbish_args` will force all commands within `install` (including `configure`) to replace `-march`.
So that will make the `configure` and `make` runs behave the same.
Hello @bosilca and other OMPI developers, we at Conda-Forge also hit the same AVX512 build errors on both Linux and OS X, see the CI logs in the bot PR https://github.com/conda-forge/openmpi-feedstock/pull/71. It seems an OS X-specific workaround has been suggested and I will try it out asap, but how about Linux? In general, turning AVX512 on will make the package broken in certain users' environments, so I'd prefer not to enable it. Thanks.
@leofang for the time being, your best bet is not to build the `op/avx` component.
That can be achieved with `configure --enable-mca-no-build=op-avx ...`
FWIW:
- The `op/avx` component is built with `AVX512` instructions (assuming the compiler supports that) and the subroutines are only invoked at runtime if the CPU has support for that. That means that if you build with `AVX512` support and you run on a processor that does not support `AVX512` instructions, you will be just fine (read: it won't "make the package broken in certain users' environments").
- On brew, the root cause is a mismatch between `configure` and `make` time. `configure` detects that `AVX512` is supported with the `-march=skylake-avx512` option, but at `make` time this option is replaced with `-march=nehalem`, and `AVX512` instructions cannot be generated.
- All the checks of conda-forge/openmpi-feedstock#71 have passed, so I cannot tell much about this issue.

Hi @ggouaillardet, thanks for the quick reply!
> All the checks of conda-forge/openmpi-feedstock#71 have passed, so I cannot tell much about this issue
This is because around the time I posted, a core dev @isuruf was helping us resolve this issue by ejecting `-march` and `-mtune` in Conda-Forge's CFLAGS setting, but it seems `--enable-mca-no-build=op-avx` could be more robust and I'll give it a try. The log for the failing tests can still be accessed here: https://dev.azure.com/conda-forge/feedstock-builds/_build/results?buildId=253901&view=logs&jobId=d0d954b5-f111-5dc4-4d76-03b6c9d0cf7e&j=d0d954b5-f111-5dc4-4d76-03b6c9d0cf7e&t=841356e0-85bb-57d8-dbbc-852e683d1642
Just to be clear, `--enable-mca-no-build=op-avx` does not build the `op/avx` component.
In a sense, this is more robust, but keep in mind the `op/avx` component contains interesting runtime optimizations you ideally do not want to miss.
As for brew, the logs show that `-march=skylake-avx512` worked at `configure` time, but a build error occurred at `make` time. And as for brew, I highly recommend the same gcc behavior at both `configure` and `make` time.
@ggouaillardet we know that Homebrew's build system is a bit tricky for software that tries to autodetect a lot of features... but it also makes it possible to compile a lot of software that is not designed for / tested on macOS.
I'm looking at the configure now for that AVX detection, and in most cases for autoconf-based software it is possible to override checks with `ac_cv_`-type variables (using the caching mechanism). In the present case, I'd like to override `op_axv2_support` and `op_avx512_support`, but it does not look like it's possible (`configure op_axv2_support=0 op_avx512_support=0`). Am I missing something?
> Just to be clear, `--enable-mca-no-build=op-avx` does not build the `op/avx` component. In a sense, this is more robust, but keep in mind the `op/avx` component contains interesting runtime optimizations you ideally do not want to miss.
Right, I just realized this. In @isuruf's approach it seems we are able to build the `op/avx` component without hitting the avx512 issue, so it is in fact better than not building it. It's just unclear to me whether the avx512 stuff is skipped or not in the `op/avx` component. Is there a way for me to check?
> As for brew, the logs show that `-march=skylake-avx512` worked at `configure` time, but a build error occurred at `make` time. And as for brew, I highly recommend the same gcc behavior at both `configure` and `make` time.
I think it is also the case in our failing tests. I see that the configure log says `avx512` is not supported without setting `-march=skylake-avx512`:
--- MCA component op:avx (m4 configuration macro)
checking for MCA component op:avx compile mode... dso
checking for AVX512 support (no additional flags)... no
checking for AVX512 support (with -march=skylake-avx512)... yes
checking if _mm512_loadu_si512 generates code that can be compiled... yes
checking if _mm512_mullo_epi64 generates code that can be compiled... yes
checking for AVX2 support (no additional flags)... no
checking for AVX2 support (with -mavx2)... yes
checking if _mm256_loadu_si256 generates code that can be compiled... yes
checking for AVX support (no additional flags)... yes
checking for SSE4.1 support... no
checking for SSE3 support... yes
checking for AVX support (with -mavx)... yes
checking for SSE4.1 support... yes
checking for SSE3 support... yes
checking if MCA component op:avx can compile... yes
Your approach restricts users of your binary distribution with recent processors by limiting the compile architecture to a [very] ancient setup (Nehalem was released in 2008). On the contrary, our approach tries to extract the highest set of capabilities from the current compile setup, and delays the decision of which code path to use until runtime (such that we can compile on a Nehalem and still run on an Icelake if the compiler toolchain supports it).
I don't want to restate the obvious, but allowing configure to run unrestricted and then setting strict restrictions during make and expecting things to "just work" is unreasonable and inconsistent. I changed the OMPI AVX* code generation to cope with some cases in #8322, but it does not cover the case where configure decides to build the AVX component, but the flags provided to the compiler during make provide no AVX support. For this case (i.e., when Nehalem is the target build architecture), the solution provided by @ggouaillardet is the correct approach (i.e., completely disable the AVX component using --enable-mca-no-build=op-avx).
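The idea of deferring the choice of code path to runtime can be sketched as follows. This is only an illustration under stated assumptions, not the detection code used by the `op/avx` component: it relies on the GCC/clang builtins `__builtin_cpu_init` and `__builtin_cpu_supports`, and the `select_path` function, the `op_path_t` enum and the `built_*` parameters (standing in for what configure decided to compile) are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical list of code paths, from narrowest to widest. */
typedef enum { PATH_SCALAR, PATH_AVX, PATH_AVX2, PATH_AVX512 } op_path_t;

/* Pick the widest path that was both compiled in (built_* flags, assumed to
 * come from configure-time detection) and is supported by the running CPU. */
op_path_t select_path(int built_avx512, int built_avx2, int built_avx)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (built_avx512 &&
        __builtin_cpu_supports("avx512f") &&
        __builtin_cpu_supports("avx512bw"))
        return PATH_AVX512;
    if (built_avx2 && __builtin_cpu_supports("avx2"))
        return PATH_AVX2;
    if (built_avx && __builtin_cpu_supports("avx"))
        return PATH_AVX;
#else
    /* Not an x86 GCC/clang build: no vector path can be selected here. */
    (void)built_avx512;
    (void)built_avx2;
    (void)built_avx;
#endif
    return PATH_SCALAR;
}
```

With this shape, a binary built with AVX512 support still returns a narrower path on a Nehalem-class CPU, and a binary built without any AVX support always falls back to the scalar path.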
> Your approach restricts users of your binary distribution with recent processors by limiting the compile architecture to a [very] ancient setup (Nehalem was released in 2008).

I hope this refers to @ggouaillardet's solution, not ours.
Just a little update from our side (Conda-Forge): as @isuruf found out, our default setting of `-march` and `-mtune` would interfere with the configure system, and our solution (removing all `-march` and `-mtune` from `CFLAGS` before configuring) works fine on both Linux and macOS. I think this is better than not building `op:avx` at all (by setting `--enable-mca-no-build=op-avx`), so we'll take this approach and release the package.
As long as the compile flags are consistent between the configure and the make stages, OMPI should build the right bundle of possible architectural features.
@bosilca, it doesn't. At the configure step, `-march=skylake-avx512` is appended to the user-defined `CFLAGS`, but during compilation `-march=skylake-avx512` is prepended. That's why we had this issue in conda-forge. We've fixed that by removing `-march=` from the user-defined `CFLAGS`.
I see, if the order of the flags is the only issue then this is easy as the order of these flags can be controlled and made consistent between the configure and make steps. At least for as long as the underlying compiler uses the first encountered march, and not the most restrictive/permissive one.
I have a patch to fix the order of the flags. I'm testing it right now, it should be added to #8322 in about 10 minutes. If you can try #8322 it would be awesome.
> At least for as long as the underlying compiler uses the first encountered march, and not the most restrictive/permissive one.

The compiler uses the last encountered `march`.

> If you can try #8322 it would be awesome.

Will do.
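To make the ordering problem concrete, here is a small model of the "last `-march` wins" rule stated above. It is a sketch that only scans a flag string; it does not invoke a compiler, and the `last_march` helper is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Return a pointer to the value of the last "-march=" token in a
 * space-separated flag string and store its length in *len.
 * Returns NULL (and leaves *len untouched) if there is no such token. */
const char *last_march(const char *flags, size_t *len)
{
    static const char key[] = "-march=";
    const size_t key_len = sizeof key - 1;
    const char *found = NULL;
    const char *p = flags;

    while ((p = strstr(p, key)) != NULL) {
        /* Only accept matches that start a token. */
        if (p == flags || p[-1] == ' ')
            found = p + key_len;
        p += key_len;
    }
    if (found != NULL)
        *len = strcspn(found, " ");
    return found;
}
```

Under this model, a configure-time test that appends the flag (`$CFLAGS -march=skylake-avx512`) ends up with `skylake-avx512` in effect, while a make-time command line that prepends it (`-march=skylake-avx512 $CFLAGS`) ends up with whatever `-march` the user put in `CFLAGS`, which is the inconsistency described above.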
Thanks @isuruf. With the current #8322 I can now run `CFLAGS=-march=nehalem && configure ...` successfully: configure will not enable AVX512 support, but we will generate and use the AVX2 and AVX execution paths. Let me know if this works for you.
@fxcoudert In the current implementation, we do not cache `op_avx512_support` and friends, which is why they cannot be overridden with `ac_cv_op_avx512_support` and friends. That's a fair point and I will enhance Open MPI to support it (and then you can implement a better, e.g. finer-grained, workaround in `brew`).
A similar issue happens with EasyBuild. In my example it sets CFLAGS to `-O2 -ftree-vectorize -march=core-avx2 -fno-math-errno` at `configure` and `make` time, though by default it will use `-O2 -ftree-vectorize -march=native -fno-math-errno` for GCC.
It then fails as follows with `V=1` because the second `-march` overrides the first one:
libtool: compile: gcc -DHAVE_CONFIG_H -I. -I../../../../opal/include -I../../../../ompi/include -I../../../../oshmem/include -I../../../../opal/mca/hwloc/hwloc201/hwloc/include/private/autogen -I../../../../opal/mca/hwloc/hwloc201/hwloc/include/hwloc/autogen -I../../../../ompi/mpiext/cuda/c -DGENERATE_SSE3_CODE -DGENERATE_SSE41_CODE -DGENERATE_AVX_CODE -DGENERATE_AVX2_CODE -DGENERATE_AVX512_CODE -I../../../.. -I../../../../orte/include -I/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Core/libfabric/1.10.1/include -I/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Core/pmix/3.1.5/include -I/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Core/ucx/1.9.0/include -I/usr/local/include -I/usr/local/include -march=skylake-avx512 -DNDEBUG -O2 -ftree-vectorize -march=core-avx2 -fno-math-errno -finline-functions -fno-strict-aliasing -MT liblocal_ops_avx512_la-op_avx_functions.lo -MD -MP -MF .deps/liblocal_ops_avx512_la-op_avx_functions.Tpo -c op_avx_functions.c -fPIC -DPIC -o .libs/liblocal_ops_avx512_la-op_avx_functions.o
op_avx_functions.c: In function ompi_op_avx_2buff_bxor_uint64_t_avx512:
op_avx_functions.c:208:21: warning: AVX512F vector return without AVX512F enabled changes the ABI [-Wpsabi]
208 | __m512i vecA = _mm512_loadu_si512((__m512i*)in); \
| ^~~~
op_avx_functions.c:263:5: note: in expansion of macro OP_AVX_AVX512_BIT_FUNC
263 | OP_AVX_AVX512_BIT_FUNC(name, type_size, type, op); \
| ^~~~~~~~~~~~~~~~~~~~~~
op_avx_functions.c:573:5: note: in expansion of macro OP_AVX_BIT_FUNC
573 | OP_AVX_BIT_FUNC(bxor, 64, uint64_t, xor)
| ^~~~~~~~~~~~~~~
In file included from /cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/gcccore/10.2.0/lib/gcc/x86_64-pc-linux-gnu/10.2.0/include/immintrin.h:55,
from op_avx_functions.c:26:
op_avx_functions.c: In function ompi_op_avx_2buff_max_int8_t_avx512:
/cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/gcccore/10.2.0/lib/gcc/x86_64-pc-linux-gnu/10.2.0/include/avx512fintrin.h:6429:1: error: inlining failed in call to always_inline _mm512_storeu_si512: target specific option mismatch
6429 | _mm512_storeu_si512 (void *__P, __m512i __A)
| ^~~~~~~~~~~~~~~~~~~
op_avx_functions.c:73:13: note: called from here
73 | _mm512_storeu_si512((__m512*)out, res); \
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
op_avx_functions.c:124:5: note: in expansion of macro OP_AVX_AVX512_FUNC
124 | OP_AVX_AVX512_FUNC(name, type_sign, type_size, type, op); \
| ^~~~~~~~~~~~~~~~~~
op_avx_functions.c:454:5: note: in expansion of macro OP_AVX_FUNC
454 | OP_AVX_FUNC(max, i, 8, int8_t, max)
| ^~~~~~~~~~~
In file included from /cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/gcccore/10.2.0/lib/gcc/x86_64-pc-linux-gnu/10.2.0/include/immintrin.h:65,
from op_avx_functions.c:26:
/cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/gcccore/10.2.0/lib/gcc/x86_64-pc-linux-gnu/10.2.0/include/avx512bwintrin.h:1984:1: error: inlining failed in call to always_inline _mm512_max_epi8: target specific option mismatch
1984 | _mm512_max_epi8 (__m512i __A, __m512i __B)
| ^~~~~~~~~~~~~~~
op_avx_functions.c:72:27: note: called from here
72 | __m512i res = _mm512_##op##_ep##type_sign##type_size(vecA, vecB); \
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
op_avx_functions.c:124:5: note: in expansion of macro OP_AVX_AVX512_FUNC
124 | OP_AVX_AVX512_FUNC(name, type_sign, type_size, type, op); \
| ^~~~~~~~~~~~~~~~~~
op_avx_functions.c:454:5: note: in expansion of macro OP_AVX_FUNC
454 | OP_AVX_FUNC(max, i, 8, int8_t, max)
| ^~~~~~~~~~~
In file included from /cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/gcccore/10.2.0/lib/gcc/x86_64-pc-linux-gnu/10.2.0/include/immintrin.h:55,
from op_avx_functions.c:26:
/cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/gcccore/10.2.0/lib/gcc/x86_64-pc-linux-gnu/10.2.0/include/avx512fintrin.h:6396:1: error: inlining failed in call to always_inline _mm512_loadu_si512: target specific option mismatch
6396 | _mm512_loadu_si512 (void const *__P)
| ^~~~~~~~~~~~~~~~~~
op_avx_functions.c:71:29: note: called from here
71 | __m512i vecB = _mm512_loadu_si512((__m512*)out); \
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
op_avx_functions.c:124:5: note: in expansion of macro OP_AVX_AVX512_FUNC
124 | OP_AVX_AVX512_FUNC(name, type_sign, type_size, type, op); \
| ^~~~~~~~~~~~~~~~~~
op_avx_functions.c:454:5: note: in expansion of macro OP_AVX_FUNC
454 | OP_AVX_FUNC(max, i, 8, int8_t, max)
| ^~~~~~~~~~~
In file included from /cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/gcccore/10.2.0/lib/gcc/x86_64-pc-linux-gnu/10.2.0/include/immintrin.h:55,
from op_avx_functions.c:26:
/cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/gcccore/10.2.0/lib/gcc/x86_64-pc-linux-gnu/10.2.0/include/avx512fintrin.h:6396:1: error: inlining failed in call to always_inline _mm512_loadu_si512: target specific option mismatch
6396 | _mm512_loadu_si512 (void const *__P)
| ^~~~~~~~~~~~~~~~~~
op_avx_functions.c:69:29: note: called from here
69 | __m512i vecA = _mm512_loadu_si512((__m512*)in); \
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
op_avx_functions.c:124:5: note: in expansion of macro OP_AVX_AVX512_FUNC
124 | OP_AVX_AVX512_FUNC(name, type_sign, type_size, type, op); \
| ^~~~~~~~~~~~~~~~~~
op_avx_functions.c:454:5: note: in expansion of macro OP_AVX_FUNC
454 | OP_AVX_FUNC(max, i, 8, int8_t, max)
| ^~~~~~~~~~~
In file included from /cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/gcccore/10.2.0/lib/gcc/x86_64-pc-linux-gnu/10.2.0/include/immintrin.h:55,
from op_avx_functions.c:26:
/cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/gcccore/10.2.0/lib/gcc/x86_64-pc-linux-gnu/10.2.0/include/avx512fintrin.h:6396:1: error: inlining failed in call to always_inline _mm512_loadu_si512: target specific option mismatch
6396 | _mm512_loadu_si512 (void const *__P)
| ^~~~~~~~~~~~~~~~~~
op_avx_functions.c:69:29: note: called from here
69 | __m512i vecA = _mm512_loadu_si512((__m512*)in); \
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
op_avx_functions.c:124:5: note: in expansion of macro OP_AVX_AVX512_FUNC
124 | OP_AVX_AVX512_FUNC(name, type_sign, type_size, type, op); \
| ^~~~~~~~~~~~~~~~~~
op_avx_functions.c:454:5: note: in expansion of macro OP_AVX_FUNC
454 | OP_AVX_FUNC(max, i, 8, int8_t, max)
| ^~~~~~~~~~~
I'm not sure if `-march=skylake-avx512` is the right approach with runtime detection, as the compiler may generate AVX* instructions in the detection code as well, giving the possibility of random SIGILLs (if I understand this correctly, I could be wrong). Perhaps the file `op_avx_functions.c` needs to be split into `op_avx512_functions.c`, `op_avx2_functions.c` and so on, though I'd love it if there were a more elegant solution. I haven't tried #8322 yet but will try.
FWIW, it would be awesome if the Brew / Conda / EasyBuild / etc. Open MPI packagers joined the https://lists.open-mpi.org/mailman/listinfo/ompi-packagers mailing list so that we could identify these kinds of issues before release. Thanks!
@jsquyres Homebrew has about a dozen active maintainers, all volunteers, for 5423 formulas. We simply can't follow the development of every piece of software, or even test pre-releases in a systematic way.
I can confirm #8322 does the trick with EasyBuild. I now get:
checking for AVX512 support (no additional flags)... no
checking for AVX512 support (with -march=skylake-avx512)... no
instead of `... yes` for the last one, because it now uses `CFLAGS="-march=skylake-avx512 $CFLAGS"` instead of `CFLAGS="$CFLAGS -march=skylake-avx512"`.
Note: I needed to use `prebuildopts = "./autogen.pl --force && "` to make sure the configure scripts were regenerated.
Yes, #8322 works, but I think we want to find a way to enable avx512 for binary distributions like brew, conda, etc. Maybe replace `-march=skylake-avx512` with `-mavx512bw -mavx512f -mavx512vl`.
I misunderstood: there is a single source file, `op_avx_functions.c`, that is compiled into multiple object files, so `-march=skylake-avx512` is ok, also for binary distributions. There is one advantage to `-mavx512bw -mavx512f -mavx512vl`, however, in that it avoids a GCC bug where `-march=native` overrides all other `-march` switches (see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69471), which was fixed in Feb 2019 -- GCC 8 and older are affected by it.
@bartoldeman Cool! Good to know. cc: @dalcinl who had concerns on this. (Though I'd hope this is something the developers could tell us directly without us digging into the source code.)
@jsquyres Thanks. While it's nice that OMPI has a dedicated mailing list for package maintainers, perhaps it's better to have conversations kept on GitHub. It's much easier to search, reference, and get interested parties involved timely. This issue is the best proof.
Doing a simple search-replace of `-march=skylake-avx512` by `-mavx512f -mavx512bw -mavx512vl -mavx512dq` in `ompi/mca/op/avx/configure.m4` does the trick (don't forget the "dq", it's needed too).
This removes any trouble with multiple `-march` flags and compiles in run-time detected avx512 support even if you compile with a lower architecture, e.g. `-march=haswell`. It would still be good to have the order consistent as in 1b8cea27dd, but use of `-march` in build systems is much more common than, say, `-mno-avx` in my experience.
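For comparison only, the same goal (AVX512 code compiled in regardless of the global `-march`) can also be expressed inside the source with the GCC/clang per-function `target` attribute. This is a different technique from the `configure.m4` change described above and is not what the thread adopted; it assumes a reasonably recent GCC or clang on x86-64, and all function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#if (defined(__GNUC__) || defined(__clang__)) && defined(__x86_64__)
#include <immintrin.h>
#define HAVE_AVX512_TARGET_ATTR 1

/* Compiled with AVX512F/BW enabled for this function only, whatever the
 * command-line -march is. Processes full 64-byte blocks and returns the
 * number of elements handled. */
__attribute__((target("avx512f,avx512bw")))
static size_t max_int8_avx512_blocks(const int8_t *in, int8_t *out, size_t n)
{
    size_t i = 0;
    for (; i + 64 <= n; i += 64) {
        __m512i a = _mm512_loadu_si512((const void *)(in + i));
        __m512i b = _mm512_loadu_si512((const void *)(out + i));
        _mm512_storeu_si512((void *)(out + i), _mm512_max_epi8(a, b));
    }
    return i;
}
#else
#define HAVE_AVX512_TARGET_ATTR 0
#endif

/* Hypothetical entry point: out[i] = max(in[i], out[i]). */
void max_int8_dispatch(const int8_t *in, int8_t *out, size_t n)
{
    size_t done = 0;
#if HAVE_AVX512_TARGET_ATTR
    /* Only take the AVX512 path if the running CPU supports it. */
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx512f") && __builtin_cpu_supports("avx512bw"))
        done = max_int8_avx512_blocks(in, out, n);
#endif
    /* Scalar loop for the tail, or for everything on other CPUs. */
    for (size_t i = done; i < n; i++)
        if (in[i] > out[i])
            out[i] = in[i];
}
```

The trade-off is portability: per-feature `-m` flags in the build system work with any compiler that accepts them, whereas the attribute ties the source to GCC-compatible compilers.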
@bartoldeman thank you very much!
I agree `-mavx512f -mavx512bw -mavx512vl -mavx512dq` is superior to `-march=skylake-avx512` in the context of the `op/avx` component.
@leofang I hear what you're saying, but there's no way for us on Github to asynchronously notify our downstream packagers without them all paying close attention to our repo. That's what the ompi-packagers list is for: it's a (very) low-volume list that allows us to give a heads up to our downstream packagers when a) a new major series is coming, and/or b) a change is coming that affects packaging.
> @leofang I hear what you're saying, but there's no way for us on Github to asynchronously notify our downstream packagers without them all paying close attention to our repo.

Hi @jsquyres, perhaps there is a way! I noticed Open MPI has a good tradition of having a checklist issue opened long before any release. I don't know about other packagers, but we at Conda-Forge can be notified on GitHub if you ping @conda-forge/openmpi. So perhaps we can try both (pinging people in your checklists when close to the release, to reduce spam, vs. we packagers subscribing to the mailing list) and see which one works out better?
EDIT: the pinging doesn't seem to work?! Let me check with CF people.
Sorry, don't mind me. It seems to be a GitHub limitation that one can't ping a team under Org A from a different Org B.
> Sorry, don't mind me. It seems to be a GitHub limitation that one can't ping a team under Org A from a different Org B.

No worries.
If there are better ways than us having an `ompi-packagers` list, we're open to suggestions. We do make decisions that impact downstream packagers sometimes, and we want to be able to communicate these kinds of things to you folks. We have also used that list to solicit the opinions of our downstream packagers (i.e., to influence an upcoming decision that will affect packaging).
It's somewhat of a difficult problem:
This is why we made the `ompi-packagers` list. While we know it's yet-another-list, we've tried hard to make it very low volume and only send stuff that our downstream packagers really need to know. We had hoped that it would help avoid problems with the first release of new major release series.
We're open to suggestions!
> I agree `-mavx512f -mavx512bw -mavx512vl -mavx512dq` is superior to `-march=skylake-avx512` in the context of the `op/avx` component.
@bosilca @ggouaillardet Should we wait for this to be incorporated in #8322, and once it's accepted we (Conda-Forge) then apply it as a hotfix to our 4.1.0 package?
@ggouaillardet already pushed 2 or 3 patches on #8322, and I think the PR now reflects most of the discussions ongoing here. In addition, I have one pending fix plus a bunch of comments to add, and then the PR should be ready to go.
Merged fix in the v4.1.x branch.
Hi there, Homebrew maintainer here. I happened to chance upon this thread and recalled we had been building Open MPI with `--enable-mca-no-build=op-avx` since 4.1.0. I've removed the flag now.
Thanks for your work on this and apologies for the headaches our somewhat peculiar build system caused you.
> Hi there, Homebrew maintainer here. I happened to chance upon this thread and recalled we had been building Open MPI with `--enable-mca-no-build=op-avx` since 4.1.0. I've removed the flag now.
Woo hoo!
Thanks for your work on this and apologies for the headaches our somewhat peculiar build system caused you.
No worries -- it was a legit bug. Sorry for the hassle!
Thank you for taking the time to submit an issue!
Background information
What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)
open-mpi 4.1.0
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
Install process detailed in this formula
If you are building/installing from a git clone, please copy-n-paste the output from `git submodule status`.
N/A (I am using release tarball)
Please describe the system on which you are running
Details of the problem
build failure
```
op_avx_functions.c:454:5: error: always_inline function '_mm512_loadu_si512' requires target feature 'avx512f', but would be inlined into function 'ompi_op_avx_2buff_max_int8_t_avx512' that is compiled without support for 'avx512f'
    OP_AVX_FUNC(max, i, 8, int8_t, max)
    ^
op_avx_functions.c:124:5: note: expanded from macro 'OP_AVX_FUNC'
    OP_AVX_AVX512_FUNC(name, type_sign, type_size, type, op); \
    ^
op_avx_functions.c:69:29: note: expanded from macro 'OP_AVX_AVX512_FUNC'
    __m512i vecA = _mm512_loadu_si512((__m512*)in); \
    ^
op_avx_functions.c:454:5: error: always_inline function '_mm512_loadu_si512' requires target feature 'avx512f', but would be inlined into function 'ompi_op_avx_2buff_max_int8_t_avx512' that is compiled without support for 'avx512f'
op_avx_functions.c:124:5: note: expanded from macro 'OP_AVX_FUNC'
    OP_AVX_AVX512_FUNC(name, type_sign, type_size, type, op); \
    ^
op_avx_functions.c:71:29: note: expanded from macro 'OP_AVX_AVX512_FUNC'
    __m512i vecB = _mm512_loadu_si512((__m512*)out); \
    ^
op_avx_functions.c:454:5: error: always_inline function '_mm512_max_epi8' requires target feature 'avx512bw', but would be inlined into function 'ompi_op_avx_2buff_max_int8_t_avx512' that is compiled without support for 'avx512bw'
op_avx_functions.c:124:5: note: expanded from macro 'OP_AVX_FUNC'
    OP_AVX_AVX512_FUNC(name, type_sign, type_size, type, op); \
    ^
op_avx_functions.c:72:27: note: expanded from macro 'OP_AVX_AVX512_FUNC'
    __m512i res = _mm512_##op##_ep##type_sign##type_size(vecA, vecB); \
    ^
```
Full build log is in here, https://github.com/Homebrew/homebrew-core/runs/1579487261
relates to https://github.com/Homebrew/homebrew-core/pull/67221