open-mpi / ompi

Open MPI main development repository
https://www.open-mpi.org

open-mpi 4.1.0 build failure (AVX) #8306

Closed chenrui333 closed 3 years ago

chenrui333 commented 3 years ago

Thank you for taking the time to submit an issue!

Background information

What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)

open-mpi 4.1.0

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

Install process detailed in this formula

If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.

N/A (I am using release tarball)

Please describe the system on which you are running


Details of the problem

build failure

```
op_avx_functions.c:454:5: error: always_inline function '_mm512_loadu_si512' requires target feature 'avx512f', but would be inlined into function 'ompi_op_avx_2buff_max_int8_t_avx512' that is compiled without support for 'avx512f'
    OP_AVX_FUNC(max, i, 8, int8_t, max)
    ^
op_avx_functions.c:124:5: note: expanded from macro 'OP_AVX_FUNC'
    OP_AVX_AVX512_FUNC(name, type_sign, type_size, type, op); \
    ^
op_avx_functions.c:69:29: note: expanded from macro 'OP_AVX_AVX512_FUNC'
    __m512i vecA = _mm512_loadu_si512((__m512*)in); \
    ^
op_avx_functions.c:454:5: error: always_inline function '_mm512_loadu_si512' requires target feature 'avx512f', but would be inlined into function 'ompi_op_avx_2buff_max_int8_t_avx512' that is compiled without support for 'avx512f'
op_avx_functions.c:124:5: note: expanded from macro 'OP_AVX_FUNC'
    OP_AVX_AVX512_FUNC(name, type_sign, type_size, type, op); \
    ^
op_avx_functions.c:71:29: note: expanded from macro 'OP_AVX_AVX512_FUNC'
    __m512i vecB = _mm512_loadu_si512((__m512*)out); \
    ^
op_avx_functions.c:454:5: error: always_inline function '_mm512_max_epi8' requires target feature 'avx512bw', but would be inlined into function 'ompi_op_avx_2buff_max_int8_t_avx512' that is compiled without support for 'avx512bw'
op_avx_functions.c:124:5: note: expanded from macro 'OP_AVX_FUNC'
    OP_AVX_AVX512_FUNC(name, type_sign, type_size, type, op); \
    ^
op_avx_functions.c:72:27: note: expanded from macro 'OP_AVX_AVX512_FUNC'
    __m512i res = _mm512_##op##_ep##type_sign##type_size(vecA, vecB); \
    ^
:194:1: note: expanded from here
_mm512_max_epi8
^
    _mm256_storeu_si256((__m256i*)out, res); \
    ^
op_avx_functions.c:456:5: error: always_inline function '_mm512_loadu_si512' requires target feature 'avx512f', but would be inlined into function 'ompi_op_avx_2buff_max_int16_t_avx512' that is compiled without support for 'avx512f'
    OP_AVX_FUNC(max, i, 16, int16_t, max)
    ^
op_avx_functions.c:124:5: note: expanded from macro 'OP_AVX_FUNC'
    OP_AVX_AVX512_FUNC(name, type_sign, type_size, type, op); \
    ^
op_avx_functions.c:69:29: note: expanded from macro 'OP_AVX_AVX512_FUNC'
    __m512i vecA = _mm512_loadu_si512((__m512*)in); \
    ^
op_avx_functions.c:456:5: error: always_inline function '_mm512_loadu_si512' requires target feature 'avx512f', but would be inlined into function 'ompi_op_avx_2buff_max_int16_t_avx512' that is compiled without support for 'avx512f'
op_avx_functions.c:124:5: note: expanded from macro 'OP_AVX_FUNC'
    OP_AVX_AVX512_FUNC(name, type_sign, type_size, type, op); \
    ^
op_avx_functions.c:71:29: note: expanded from macro 'OP_AVX_AVX512_FUNC'
    __m512i vecB = _mm512_loadu_si512((__m512*)out); \
    ^
op_avx_functions.c:456:5: error: always_inline function '_mm512_max_epi16' requires target feature 'avx512bw', but would be inlined into function 'ompi_op_avx_2buff_max_int16_t_avx512' that is compiled without support for 'avx512bw'
op_avx_functions.c:124:5: note: expanded from macro 'OP_AVX_FUNC'
    OP_AVX_AVX512_FUNC(name, type_sign, type_size, type, op); \
    ^
op_avx_functions.c:72:27: note: expanded from macro 'OP_AVX_AVX512_FUNC'
    __m512i res = _mm512_##op##_ep##type_sign##type_size(vecA, vecB); \
    ^
:226:1: note: expanded from here
_mm512_max_epi16
^
fatal error: too many errors emitted, stopping now [-ferror-limit=]
20 errors generated.
make[2]: *** [liblocal_ops_avx512_la-op_avx_functions.lo] Error 1
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [all-recursive] Error 1
make: *** [all-recursive] Error 1
```

The full build log is here: https://github.com/Homebrew/homebrew-core/runs/1579487261 (related to https://github.com/Homebrew/homebrew-core/pull/67221).

bosilca commented 3 years ago

The detection of the supported architecture features is extremely fragile, and we are at the mercy of the compiler because there is no portable way to specify exactly what architectural features we want/need. In this particular case I suspect the clang compiler used in your bottle has an incomplete set of supported features for the skylake-avx512 architecture.

We need support for 3 capabilities in order to get the AVX512 code to compile: avx512, avx512f and avx512bw. skylake-avx512 is supposed to have support for all these 3 (according to the gcc and icc manuals).

I just tested on macOS Catalina with clang 12.0, and -march=skylake-avx512 seems to provide everything needed to compile the code. Is the brew formula playing around with the flags? What version of the clang compiler was used?

ggouaillardet commented 3 years ago

This is a brew specific issue, see a previous message on our slack.

Will update shortly.


ggouaillardet commented 3 years ago

Please refer to Homebrew/homebrew-core#65296 for some more context, and a description of the root cause in homebrew.

TL;DR: -march=skylake-avx512 is passed to gcc at configure time, but brew superenv removes it at make time. You generally do not see this failure when building from the command line (i.e., not under brew superenv).

A simple workaround for homebrew is to pass --enable-mca-no-build=op-avx to the configure command line.

A less suboptimal fix is in https://github.com/ggouaillardet/ompi/commit/7136abb22db0defd968d8ca57b3dbf0a3ee9f737, and I will open a PR shortly so it can be reviewed within the Open MPI community.

bosilca commented 3 years ago

The second approach is highly specialized for brew, a little too much for my taste, and I don't think we should add it to our own source code. OMPI does all the right things: it checks for flags and compiler capabilities, assuming that what we get during configure remains valid afterwards. superenv is clearly hindering our efforts by selectively changing flags and invalidating the configure-time detection. Not finger-pointing, but I think the best approach is for them to deliver a suboptimal prebuilt version, one that disables AVX support entirely.

ggouaillardet commented 3 years ago

That's fair (and I already had a lot of fun hacking this :-) ).

Let's wait for what the Homebrew team has to say about the superenv behavior.

ggouaillardet commented 3 years ago

The Homebrew team suggested the use of ENV.refurbish_args in the Open MPI formulae.

I am now trying this

fxcoudert commented 3 years ago

ENV.refurbish_args will make brew silently ignore your arch flags during configure, so depending on how the test is written, that will probably not help (i.e., configure will still think the flag is accepted).

When -march=skylake-avx512 is used, some macros are being defined: could the code be conditionally defined behind those?

#define __AVX2__ 1
#define __AVX512BW__ 1
#define __AVX512CD__ 1
#define __AVX512DQ__ 1
#define __AVX512F__ 1
#define __AVX512VL__ 1
#define __AVX__ 1

ggouaillardet commented 3 years ago

@fxcoudert thanks for the hint about the AVX512 macros.
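
For illustration only, the macro-guard idea could look roughly like the sketch below. It is not the actual op_avx_functions.c code: the function name sketch_max_int8 and the macro SKETCH_HAVE_AVX512 are made up for this example. The AVX-512 loop is compiled only when the compiler itself defines __AVX512F__ and __AVX512BW__ (the two features named in the errors above), and a scalar loop handles everything else.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Only pull in the intrinsics when the flags on THIS compile line enable them. */
#if defined(__AVX512F__) && defined(__AVX512BW__)
#include <immintrin.h>
#define SKETCH_HAVE_AVX512 1
#else
#define SKETCH_HAVE_AVX512 0
#endif

/* Hypothetical helper: out[i] = max(in[i], out[i]) for n int8_t elements. */
void sketch_max_int8(const int8_t *in, int8_t *out, size_t n)
{
    size_t i = 0;
    assert(in != NULL && out != NULL);
#if SKETCH_HAVE_AVX512
    /* 64 int8_t values per 512-bit vector. */
    for (; i + 64 <= n; i += 64) {
        __m512i a = _mm512_loadu_si512((const void *)(in + i));
        __m512i b = _mm512_loadu_si512((const void *)(out + i));
        _mm512_storeu_si512((void *)(out + i), _mm512_max_epi8(a, b));
    }
#endif
    /* Scalar tail, and the only path when AVX-512 is not enabled. */
    for (; i < n; i++) {
        if (in[i] > out[i]) {
            out[i] = in[i];
        }
    }
}
```

With a guard like this, a make-time flag change that drops AVX-512 would silently remove the vector loop instead of breaking the build. The trade-off is that the decision is made at compile time rather than at runtime.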

Please help me better understand how superenv works. Is there an option to have superenv behave the same at both configure and make time (e.g., replace -march=... with -march=nehalem in both)? What is the exact role of ENV.refurbish_args? Should it prevent the -march=... replacement at both configure and make time? I am really having a hard time connecting the dots ...

fxcoudert commented 3 years ago

ENV.refurbish_args will force all commands within install (including configure) to replace -march, so configure and make will behave the same.

leofang commented 3 years ago

Hello @bosilca and other OMPI developers, we at Conda-Forge also hit the same AVX512 build errors on both Linux and OS X; see the CI logs in the bot PR https://github.com/conda-forge/openmpi-feedstock/pull/71. It seems an OS X-specific workaround has been suggested and I will try it out asap, but how about Linux? In general, turning AVX512 on will break the package in certain users' environments, so I'd prefer not to enable it. Thanks.

ggouaillardet commented 3 years ago

@leofang for the time being, your best bet is not to build the op/avx component. That can be achieved with configure --enable-mca-no-build=op-avx ...

FWIW

leofang commented 3 years ago

Hi @ggouaillardet, thanks for the quick reply!

  • All the checks of conda-forge/openmpi-feedstock#71 have passed, so I cannot tell much about this issue

This is because around the time I posted, a core dev @isuruf was helping us resolve this issue by ejecting -march and -mtune in Conda-Forge's CFLAGS setting 🙂 but it seems --enable-mca-no-build=op-avx could be more robust and I'll give it a try. The log for the failing tests can still be accessed here: https://dev.azure.com/conda-forge/feedstock-builds/_build/results?buildId=253901&view=logs&jobId=d0d954b5-f111-5dc4-4d76-03b6c9d0cf7e&j=d0d954b5-f111-5dc4-4d76-03b6c9d0cf7e&t=841356e0-85bb-57d8-dbbc-852e683d1642

ggouaillardet commented 3 years ago

Just to be clear, --enable-mca-no-build=op-avx does not build the op/avx component. In a sense, this is more robust, but keep in mind the op/avx component contains interesting runtime optimizations you ideally do not want to miss.

As with brew, the logs show that -march=skylake-avx512 worked at configure time, but a build error occurred at make time. And, as with brew, I highly recommend the same gcc behavior at both configure and make time.

fxcoudert commented 3 years ago

@ggouaillardet we know that Homebrew's build system is a bit tricky for software that tries to autodetect a lot of features… but it also makes it possible to compile a lot of software that is not designed for / tested on macOS.

I'm looking at the configure script now for that AVX detection. In most cases, for autoconf-based software, it is possible to override checks with ac_cv_-type variables (using the caching mechanism). In the present case, I'd like to override op_avx2_support and op_avx512_support, but it does not look like that is possible (configure op_avx2_support=0 op_avx512_support=0). Am I missing something?

leofang commented 3 years ago

Just to be clear, --enable-mca-no-build=op-avx does not build the op/avx component. In a sense, this is more robust, but keep in mind the op/avx component contains interesting runtime optimizations you ideally do not want to miss.

Right, I just realized this. In @isuruf's approach it seems we are able to build the op/avx component without hitting the avx512 issue, so it is in fact better than to not build. It's just unclear to me whether the avx512 stuff is skipped or not in the op/avx component. Is there a way for me to check?

As with brew, the logs show that -march=skylake-avx512 worked at configure time, but a build error occurred at make time. And, as with brew, I highly recommend the same gcc behavior at both configure and make time.

I think this is also the case in our failing tests. I see that the configure log says avx512 is not supported without setting -march=skylake-avx512:

--- MCA component op:avx (m4 configuration macro)
checking for MCA component op:avx compile mode... dso
checking for AVX512 support (no additional flags)... no
checking for AVX512 support (with -march=skylake-avx512)... yes
checking if _mm512_loadu_si512 generates code that can be compiled... yes
checking if _mm512_mullo_epi64 generates code that can be compiled... yes
checking for AVX2 support (no additional flags)... no
checking for AVX2 support (with -mavx2)... yes
checking if _mm256_loadu_si256 generates code that can be compiled... yes
checking for AVX support (no additional flags)... yes
checking for SSE4.1 support... no
checking for SSE3 support... yes
checking for AVX support (with -mavx)... yes
checking for SSE4.1 support... yes
checking for SSE3 support... yes
checking if MCA component op:avx can compile... yes

bosilca commented 3 years ago

Your approach restricts users of your binary distribution with recent processors by limiting the compile architecture to a [very] ancient setup (Nehalem was released in 2008). On the contrary, our approach tries to extract the highest set of capabilities from the current compile setup, and delays the decision of which code path to use until runtime (such that we can compile on a Nehalem and still run on an Icelake if the compiler tool chain supports it).

I don't want to restate the obvious, but allowing configure to run unrestricted and then setting strict restrictions during make and expecting things to "just work" is unreasonable and inconsistent. I changed the OMPI AVX* code generation to cope with some cases in #8322, but it does not cover the case where configure decides to build the AVX component but the flags provided to the compiler during make provide no AVX support. For this case (i.e., when Nehalem is the target build architecture), the solution provided by @ggouaillardet is the correct approach (i.e., completely disable the AVX component using --enable-mca-no-build=op-avx).
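
A minimal sketch of that "decide at runtime" idea, assuming GCC or clang on x86 (where __builtin_cpu_init and __builtin_cpu_supports are available). This is not Open MPI's actual detection code, and every sketch_/SKETCH_ name is hypothetical. The point is only that the code path chosen is the intersection of what was compiled in and what the running CPU reports.

```c
#include <assert.h>

/* Hypothetical capability bits, used both for "built" and for "CPU has". */
#define SKETCH_CPU_AVX      0x1u
#define SKETCH_CPU_AVX2     0x2u
#define SKETCH_CPU_AVX512F  0x4u
#define SKETCH_CPU_AVX512BW 0x8u
#define SKETCH_CPU_ALL      0xFu

enum sketch_path { SKETCH_SCALAR = 0, SKETCH_AVX, SKETCH_AVX2, SKETCH_AVX512 };

/* Ask the running CPU what it supports; returns 0 on non-x86 or other compilers. */
unsigned sketch_probe_cpu(void)
{
    unsigned flags = 0;
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx"))      flags |= SKETCH_CPU_AVX;
    if (__builtin_cpu_supports("avx2"))     flags |= SKETCH_CPU_AVX2;
    if (__builtin_cpu_supports("avx512f"))  flags |= SKETCH_CPU_AVX512F;
    if (__builtin_cpu_supports("avx512bw")) flags |= SKETCH_CPU_AVX512BW;
#endif
    return flags;
}

/* Pick the widest path that was both compiled in and is supported by the CPU. */
enum sketch_path sketch_select_path(unsigned built, unsigned cpu)
{
    unsigned usable = built & cpu;
    assert((built & ~SKETCH_CPU_ALL) == 0 && (cpu & ~SKETCH_CPU_ALL) == 0);
    if ((usable & SKETCH_CPU_AVX512F) && (usable & SKETCH_CPU_AVX512BW)) {
        return SKETCH_AVX512;
    }
    if (usable & SKETCH_CPU_AVX2) {
        return SKETCH_AVX2;
    }
    if (usable & SKETCH_CPU_AVX) {
        return SKETCH_AVX;
    }
    return SKETCH_SCALAR;
}
```

A caller would typically do something like sketch_select_path(built_mask, sketch_probe_cpu()) once at startup. A binary built with every path can then still run on an older CPU, which is why the build needs the AVX-512 object compiled even when the build host lacks AVX-512.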

leofang commented 3 years ago

Your approach restricts users of your binary distribution with recent processors by limiting the compile architecture to a [very] ancient setup (Nehalem was released in 2008).

I hope this refers to @ggouaillardet's solution, not ours 🙂

Just to be clear, --enable-mca-no-build=op-avx does not build the op/avx component. In a sense, this is more robust, but keep in mind the op/avx component contains interesting runtime optimizations you ideally do not want to miss.

Right, I just realized this. In @isuruf's approach it seems we are able to build the op/avx component without hitting the avx512 issue, so it is in fact better than to not build. It's just unclear to me whether the avx512 stuff is skipped or not in the op/avx component. Is there a way for me to check?

Just a little update from our side (Conda-Forge): as @isuruf found out our default setting of -march and -mtune would interfere with the configure system, and our solution (removing all -march and -mtune from CFLAGS before configuring) works fine on both Linux and macOS. I think this is better than not building op:avx at all (by setting --enable-mca-no-build=op-avx), so we'll take this approach and release the package.

bosilca commented 3 years ago

As long as the compile flags are consistent between the configure and the make stages, OMPI should build the right bundle of possible architectural features.

isuruf commented 3 years ago

@bosilca, it doesn't. At configure step, -march=skylake-avx512 is appended to user defined CFLAGS, but during compilation -march=skylake-avx512 is prepended. That's why we had this issue in conda-forge. We've fixed that by removing -march= from user defined CFLAGS

bosilca commented 3 years ago

I see, if the order of the flags is the only issue then this is easy as the order of these flags can be controlled and made consistent between the configure and make steps. At least for as long as the underlying compiler uses the first encountered march, and not the most restrictive/permissive one.

I have a patch to fix the order of the flags. I'm testing it right now, it should be added to #8322 in about 10 minutes. If you can try #8322 it would be awesome.

isuruf commented 3 years ago

At least for as long as the underlying compiler uses the first encountered march, and not the most restrictive/permissive one.

Compiler uses the last encountered march.

If you can try #8322 it would be awesome.

Will do
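
To make the ordering problem concrete, here is a toy model of the "last -march wins" rule described above. It only scans a list of argument strings; it does not invoke or emulate a real compiler, and sketch_effective_march is a name invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the architecture named by the LAST -march= option, or NULL if none. */
const char *sketch_effective_march(const char *const *args, size_t nargs)
{
    const char *last = NULL;
    assert(args != NULL || nargs == 0);
    for (size_t i = 0; i < nargs; i++) {
        if (strncmp(args[i], "-march=", 7) == 0) {
            last = args[i] + 7;   /* skip the "-march=" prefix */
        }
    }
    return last;
}
```

Under this model, a configure-time test that appends the flag (user CFLAGS first, then -march=skylake-avx512) sees skylake-avx512 as the effective architecture, while a make-time command line that prepends it (then user CFLAGS with -march=nehalem) ends up with nehalem. That is the mismatch that makes the configure check pass and the build fail.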

bosilca commented 3 years ago

Thanks @isuruf. With the current #8322 I can now run CFLAGS=-march=nehalem && configure ... successfully: configure will not enable AVX512 support but we will generate and use the AVX2 and AVX execution paths. Let me know if this works for you.

ggouaillardet commented 3 years ago

@fxcoudert In the current implementation, we do not cache op_avx512_support and friends, which is why they cannot be overridden with ac_cv_op_avx512_support and friends. That's a fair point, and I will enhance Open MPI to support it (then you can implement a better, e.g. finer-grained, workaround in brew).

bartoldeman commented 3 years ago

A similar issue happens with EasyBuild. In my example it sets CFLAGS -O2 -ftree-vectorize -march=core-avx2 -fno-math-errno at configure and make time, though by default it will use -O2 -ftree-vectorize -march=native -fno-math-errno for GCC.

It then fails as follows with V=1 because the second -march overrides the first one:

libtool: compile:  gcc -DHAVE_CONFIG_H -I. -I../../../../opal/include -I../../../../ompi/include -I../../../../oshmem/include -I../../../../opal/mca/hwloc/hwloc201/hwloc/include/private/autogen -I../../../../opal/mca/hwloc/hwloc201/hwloc/include/hwloc/autogen -I../../../../ompi/mpiext/cuda/c -DGENERATE_SSE3_CODE -DGENERATE_SSE41_CODE -DGENERATE_AVX_CODE -DGENERATE_AVX2_CODE -DGENERATE_AVX512_CODE -I../../../.. -I../../../../orte/include -I/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Core/libfabric/1.10.1/include -I/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Core/pmix/3.1.5/include -I/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Core/ucx/1.9.0/include -I/usr/local/include -I/usr/local/include -march=skylake-avx512 -DNDEBUG -O2 -ftree-vectorize -march=core-avx2 -fno-math-errno -finline-functions -fno-strict-aliasing -MT liblocal_ops_avx512_la-op_avx_functions.lo -MD -MP -MF .deps/liblocal_ops_avx512_la-op_avx_functions.Tpo -c op_avx_functions.c  -fPIC -DPIC -o .libs/liblocal_ops_avx512_la-op_avx_functions.o
op_avx_functions.c: In function ompi_op_avx_2buff_bxor_uint64_t_avx512:
op_avx_functions.c:208:21: warning: AVX512F vector return without AVX512F enabled changes the ABI [-Wpsabi]
  208 |             __m512i vecA =  _mm512_loadu_si512((__m512i*)in);           \
      |                     ^~~~
op_avx_functions.c:263:5: note: in expansion of macro OP_AVX_AVX512_BIT_FUNC
  263 |     OP_AVX_AVX512_BIT_FUNC(name, type_size, type, op);                  \
      |     ^~~~~~~~~~~~~~~~~~~~~~
op_avx_functions.c:573:5: note: in expansion of macro OP_AVX_BIT_FUNC
  573 |     OP_AVX_BIT_FUNC(bxor, 64, uint64_t, xor)
      |     ^~~~~~~~~~~~~~~
In file included from /cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/gcccore/10.2.0/lib/gcc/x86_64-pc-linux-gnu/10.2.0/include/immintrin.h:55,
                 from op_avx_functions.c:26:
op_avx_functions.c: In function ompi_op_avx_2buff_max_int8_t_avx512:
/cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/gcccore/10.2.0/lib/gcc/x86_64-pc-linux-gnu/10.2.0/include/avx512fintrin.h:6429:1: error: inlining failed in call to always_inline _mm512_storeu_si512: target specific option mismatch
 6429 | _mm512_storeu_si512 (void *__P, __m512i __A)
      | ^~~~~~~~~~~~~~~~~~~
op_avx_functions.c:73:13: note: called from here
   73 |             _mm512_storeu_si512((__m512*)out, res);                            \
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
op_avx_functions.c:124:5: note: in expansion of macro OP_AVX_AVX512_FUNC
  124 |     OP_AVX_AVX512_FUNC(name, type_sign, type_size, type, op);                  \
      |     ^~~~~~~~~~~~~~~~~~
op_avx_functions.c:454:5: note: in expansion of macro OP_AVX_FUNC
  454 |     OP_AVX_FUNC(max, i, 8,    int8_t, max)
      |     ^~~~~~~~~~~
In file included from /cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/gcccore/10.2.0/lib/gcc/x86_64-pc-linux-gnu/10.2.0/include/immintrin.h:65,
                 from op_avx_functions.c:26:
/cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/gcccore/10.2.0/lib/gcc/x86_64-pc-linux-gnu/10.2.0/include/avx512bwintrin.h:1984:1: error: inlining failed in call to always_inline _mm512_max_epi8: target specific option mismatch
 1984 | _mm512_max_epi8 (__m512i __A, __m512i __B)
      | ^~~~~~~~~~~~~~~
op_avx_functions.c:72:27: note: called from here
   72 |             __m512i res = _mm512_##op##_ep##type_sign##type_size(vecA, vecB);  \
      |                           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
op_avx_functions.c:124:5: note: in expansion of macro OP_AVX_AVX512_FUNC
  124 |     OP_AVX_AVX512_FUNC(name, type_sign, type_size, type, op);                  \
      |     ^~~~~~~~~~~~~~~~~~
op_avx_functions.c:454:5: note: in expansion of macro OP_AVX_FUNC
  454 |     OP_AVX_FUNC(max, i, 8,    int8_t, max)
      |     ^~~~~~~~~~~
In file included from /cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/gcccore/10.2.0/lib/gcc/x86_64-pc-linux-gnu/10.2.0/include/immintrin.h:55,
                 from op_avx_functions.c:26:
/cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/gcccore/10.2.0/lib/gcc/x86_64-pc-linux-gnu/10.2.0/include/avx512fintrin.h:6396:1: error: inlining failed in call to always_inline _mm512_loadu_si512: target specific option mismatch
 6396 | _mm512_loadu_si512 (void const *__P)
      | ^~~~~~~~~~~~~~~~~~
op_avx_functions.c:71:29: note: called from here
   71 |             __m512i vecB =  _mm512_loadu_si512((__m512*)out);                  \
      |                             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
op_avx_functions.c:124:5: note: in expansion of macro OP_AVX_AVX512_FUNC
  124 |     OP_AVX_AVX512_FUNC(name, type_sign, type_size, type, op);                  \
      |     ^~~~~~~~~~~~~~~~~~
op_avx_functions.c:454:5: note: in expansion of macro OP_AVX_FUNC
  454 |     OP_AVX_FUNC(max, i, 8,    int8_t, max)
      |     ^~~~~~~~~~~
In file included from /cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/gcccore/10.2.0/lib/gcc/x86_64-pc-linux-gnu/10.2.0/include/immintrin.h:55,
                 from op_avx_functions.c:26:
/cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/gcccore/10.2.0/lib/gcc/x86_64-pc-linux-gnu/10.2.0/include/avx512fintrin.h:6396:1: error: inlining failed in call to always_inline _mm512_loadu_si512: target specific option mismatch
 6396 | _mm512_loadu_si512 (void const *__P)
      | ^~~~~~~~~~~~~~~~~~
op_avx_functions.c:69:29: note: called from here
   69 |             __m512i vecA =  _mm512_loadu_si512((__m512*)in);                   \
      |                             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
op_avx_functions.c:124:5: note: in expansion of macro OP_AVX_AVX512_FUNC
  124 |     OP_AVX_AVX512_FUNC(name, type_sign, type_size, type, op);                  \
      |     ^~~~~~~~~~~~~~~~~~
op_avx_functions.c:454:5: note: in expansion of macro OP_AVX_FUNC
  454 |     OP_AVX_FUNC(max, i, 8,    int8_t, max)
      |     ^~~~~~~~~~~
In file included from /cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/gcccore/10.2.0/lib/gcc/x86_64-pc-linux-gnu/10.2.0/include/immintrin.h:55,
                 from op_avx_functions.c:26:
/cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/gcccore/10.2.0/lib/gcc/x86_64-pc-linux-gnu/10.2.0/include/avx512fintrin.h:6396:1: error: inlining failed in call to always_inline _mm512_loadu_si512: target specific option mismatch
 6396 | _mm512_loadu_si512 (void const *__P)
      | ^~~~~~~~~~~~~~~~~~
op_avx_functions.c:69:29: note: called from here
   69 |             __m512i vecA =  _mm512_loadu_si512((__m512*)in);                   \
      |                             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
op_avx_functions.c:124:5: note: in expansion of macro OP_AVX_AVX512_FUNC
  124 |     OP_AVX_AVX512_FUNC(name, type_sign, type_size, type, op);                  \
      |     ^~~~~~~~~~~~~~~~~~
op_avx_functions.c:454:5: note: in expansion of macro OP_AVX_FUNC
  454 |     OP_AVX_FUNC(max, i, 8,    int8_t, max)
      |     ^~~~~~~~~~~

I'm not sure if -march=skylake-avx512 is the right approach with runtime detection, as the compiler may generate AVX* instructions in the detection code as well, giving the possibility of random SIGILLs (if I understand this correctly; I could be wrong). Perhaps the file op_avx_functions.c needs to be split into op_avx512_functions.c, op_avx2_functions.c and so on, though I'd love it if there were a more elegant solution. I haven't tried #8322 yet but will.

jsquyres commented 3 years ago

FWIW, it would be awesome if the Brew / Conda / EasyBuild / etc. Open MPI packagers joined the https://lists.open-mpi.org/mailman/listinfo/ompi-packagers mailing list so that we could identify these kinds of issues before release. Thanks!

fxcoudert commented 3 years ago

@jsquyres Homebrew has about a dozen active maintainers, all volunteers, for 5423 formulas. We simply can't follow the development of every piece of software, or even test pre-releases in a systematic way.

bartoldeman commented 3 years ago

I can confirm #8322 does the trick with EasyBuild. I now get:

checking for AVX512 support (no additional flags)... no
checking for AVX512 support (with -march=skylake-avx512)... no

instead of ... yes for the last one, because it now uses CFLAGS="-march=skylake-avx512 $CFLAGS" instead of CFLAGS="$CFLAGS -march=skylake-avx512".

Note: I needed to use prebuildopts = "./autogen.pl --force && " to make sure the configure scripts were regenerated.

isuruf commented 3 years ago

Yes, #8322 works, but I think we want to find a way to enable avx512 for binary distributions like brew, conda, etc. Maybe replace -march=skylake-avx512 with -mavx512bw -mavx512f -mavx512vl.

bartoldeman commented 3 years ago

I misunderstood, there is a single source file op_avx_functions.c that is compiled into multiple object files, so the -march=skylake-avx512 is ok, also for binary distributions. There is one advantage to -mavx512bw -mavx512f -mavx512vl however in that it avoids a GCC bug where -march=native overrides all other -march switches (see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69471) which was fixed in Feb 2019 -- GCC 8 and older are affected by it.

leofang commented 3 years ago

@bartoldeman Cool! Good to know. cc: @dalcinl who had concerns on this. (Though I'd hope this is something the developers could tell us directly without us digging into the source code 😅)

@jsquyres Thanks. While it's nice that OMPI has a dedicated mailing list for package maintainers, perhaps it's better to have conversations kept on GitHub. It's much easier to search, reference, and get interested parties involved timely. This issue is the best proof.

bartoldeman commented 3 years ago

Doing a simple search-replace of -march=skylake-avx512 by -mavx512f -mavx512bw -mavx512vl -mavx512dq in ompi/mca/op/avx/configure.m4 does the trick (don't forget the "dq", it's needed too).

This removes any trouble with multiple -march flags and compiles in run-time detected avx512 support even if you compile with a lower architecture e.g. -march=haswell. It would still be good to have the order consistent as in 1b8cea27dd but use of -march in build systems is much more common than say, -mno-avx in my experience.
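
One hedged way to check which of these per-feature flags actually reached a given compile line is to look at the compiler's predefined macros (__AVX512F__, __AVX512BW__, __AVX512VL__, __AVX512DQ__). The sketch below is only an illustration: the SKETCH_ bit names and both functions are invented here and are not part of Open MPI.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_F  0x1u   /* __AVX512F__  */
#define SKETCH_BW 0x2u   /* __AVX512BW__ */
#define SKETCH_VL 0x4u   /* __AVX512VL__ */
#define SKETCH_DQ 0x8u   /* __AVX512DQ__ */

/* Bitmask of the AVX-512 features enabled for THIS translation unit. */
unsigned sketch_built_features(void)
{
    unsigned mask = 0;
#ifdef __AVX512F__
    mask |= SKETCH_F;
#endif
#ifdef __AVX512BW__
    mask |= SKETCH_BW;
#endif
#ifdef __AVX512VL__
    mask |= SKETCH_VL;
#endif
#ifdef __AVX512DQ__
    mask |= SKETCH_DQ;
#endif
    return mask;
}

/* Write a space-separated feature list into buf; return how many bits were set. */
int sketch_describe(unsigned mask, char *buf, size_t len)
{
    static const char *const names[] = { "avx512f", "avx512bw", "avx512vl", "avx512dq" };
    int count = 0;
    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    for (unsigned bit = 0; bit < 4; bit++) {
        if (mask & (1u << bit)) {
            size_t used = strlen(buf);
            snprintf(buf + used, len - used, "%s%s", count ? " " : "", names[bit]);
            count++;
        }
    }
    return count;
}
```

Compiling a file like this once with the component's flags and once with only the user's CFLAGS (say, -march=haswell) should show all four features in the first case and none in the second, assuming the compiler accepts the -mavx512* options.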

ggouaillardet commented 3 years ago

@bartoldeman thank you very much!

I agree -mavx512f -mavx512bw -mavx512vl -mavx512dq is superior to -march=skylake-avx512 in the context of the op/avx component.

jsquyres commented 3 years ago

@leofang I hear what you're saying, but there's no way for us on Github to asynchronously notify our downstream packagers without them all paying close attention to our repo. That's what the ompi-packagers list is for: it's a (very) low-volume list that allows us to give a heads up to our downstream packagers when a) a new major series is coming, and/or b) a change is coming that affects packaging.

leofang commented 3 years ago

@leofang I hear what you're saying, but there's no way for us on Github to asynchronously notify our downstream packagers without them all paying close attention to our repo.

Hi @jsquyres Perhaps there is a way! 🙂 I noticed Open MPI has a good tradition of having a checklist issue opened long before any release. I don't know about other packagers, but we at Conda-Forge can be notified on GitHub if you ping @conda-forge/openmpi. So perhaps we can try both (you pinging people in your checklists, close to the release to reduce spam, and we packagers subscribing to the mailing list) and see which one works out better?

EDIT: the pinging doesn't seem to work?! Let me check with CF people.

leofang commented 3 years ago

Sorry, don't mind me. It seems to be a GitHub limitation that one can't ping a team under Org A from a different Org B.

jsquyres commented 3 years ago

Sorry, don't mind me. It seems to be a GitHub limitation that one can't ping a team under Org A from a different Org B.

No worries. 😄

If there are better ways than us having an ompi-packagers list, we're open to suggestions. We do make decisions that impact downstream packagers sometimes, and we want to be able to communicate these kinds of things to you folks. We have also used that list to solicit the opinions of our downstream packagers (i.e., to influence an upcoming decision that will affect packaging).

It's somewhat of a difficult problem:

This is why we made the ompi-packagers list. While we know it's yet-another-list, we've tried hard to make it very low volume and only send stuff that our downstream packagers really need to know. We had hoped that it would help avoid problems with the first release of new major release series. ☹️

We're open to suggestions!

leofang commented 3 years ago

I agree -mavx512f -mavx512bw -mavx512vl -mavx512dq is superior to -march=skylake-avx512 in the context of the op/avx component.

@bosilca @ggouaillardet Should we wait for this to be incorporated in #8322, and once it's accepted we (Conda-Forge) then apply it as a hotfix to our 4.1.0 package?

bosilca commented 3 years ago

@ggouaillardet already pushed 2 or 3 patches to #8322, and I think the PR now reflects most of the discussion here. In addition, I have one pending fix plus a bunch of comments to add, and then the PR should be ready to go.

jsquyres commented 3 years ago

Merged fix in the v4.1.x branch.

carlocab commented 3 years ago

Hi there, Homebrew maintainer here. I happened to chance upon this thread and recalled we had been building Open MPI with --enable-mca-no-build=op-avx since 4.1.0. I've removed the flag now.

Thanks for your work on this and apologies for the headaches our somewhat peculiar build system caused you.

jsquyres commented 3 years ago

Hi there, Homebrew maintainer here. I happened to chance upon this thread and recalled we had been building Open MPI with --enable-mca-no-build=op-avx since 4.1.0. I've removed the flag now.

Woo hoo!

Thanks for your work on this and apologies for the headaches our somewhat peculiar build system caused you.

No worries -- it was a legit bug. Sorry for the hassle!