simd-everywhere / simde

Implementations of SIMD instruction sets for systems which don't natively support them.
https://simd-everywhere.github.io/blog/
MIT License

Build failures under Xcode 15 #1146

Closed timsutton closed 4 months ago

timsutton commented 4 months ago

👋 Hi, I work on maintaining the Homebrew core package repository. In attempting to bump SIMD Everywhere to 0.8.0, we ran into a couple of basic failures during its post-build tests. The test is a basic test.c file that looks like this:

#include <assert.h>
#include <simde/arm/neon.h>
#include <simde/x86/sse2.h>

int main() {
  int64_t a = 1, b = 2;
  assert(simde_vaddd_s64(a, b) == 3);
  simde__m128i z = simde_mm_setzero_si128();
  simde__m128i v = simde_mm_undefined_si128();
  v = simde_mm_xor_si128(v, v);
  assert(simde_mm_movemask_epi8(simde_mm_cmpeq_epi8(v, z)) == 0xFFFF);
  return 0;
}

We build that sample like this, and then execute it:

cc  -v test.c -o test
./test

But a couple errors crop up, for example:

In file included from test.c:2:
In file included from /opt/homebrew/include/simde/arm/neon.h:245:
/opt/homebrew/include/simde/arm/neon/rnd32x.h:71:12: error: call to undeclared function 'vrnd32x_f64'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
    return vrnd32x_f64(a);
           ^
/opt/homebrew/include/simde/arm/neon/rnd32x.h:71:12: error: returning 'int' from a function with incompatible result type 'simde_float64x1_t' (aka 'float64x1_t')
    return vrnd32x_f64(a);
           ^~~~~~~~~~~~~~
/opt/homebrew/include/simde/arm/neon/rnd32x.h:131:12: error: call to undeclared function 'vrnd32xq_f64'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
    return vrnd32xq_f64(a);
           ^
/opt/homebrew/include/simde/arm/neon/rnd32x.h:131:12: error: returning 'int' from a function with incompatible result type 'simde_float64x2_t' (aka 'float64x2_t')
    return vrnd32xq_f64(a);
           ^~~~~~~~~~~~~~~

While it's possible to sidestep the implicit function declaration error by setting -Wno-error=implicit-function-declaration, the incompatible result type error remains.

For what it's worth, this is on macOS Sonoma 14.4, Xcode 15.3, on an M3 Max laptop. Thanks for your time!

mr-c commented 4 months ago

Dear @timsutton ; thanks for your report.

According to https://developer.arm.com/architectures/instruction-sets/intrinsics/#q=vrnd32x_f64 , that function is part of A64 ; but LLVM/clang didn't add it until version 18: https://github.com/llvm/llvm-project/commit/dbeb3d029d8e3120668288a284d0babeb81545fd
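A version gate of the kind this implies might look roughly like the sketch below. It is only an illustration and not the change made in the fix: the `EXAMPLE_` names are hypothetical, and treating Apple clang as never providing the intrinsic is a conservative assumption, made because Apple's compiler version numbers do not follow upstream LLVM releases.

```c
#include <assert.h>

/* Hypothetical guard for illustration; these are not SIMDe's real macro names. */
#if defined(__apple_build_version__)
/* Apple clang: its version numbers do not follow upstream LLVM, so this
 * sketch conservatively never uses the native intrinsic there. */
#  define EXAMPLE_NATIVE_VRND32X_F64 0
#elif defined(__clang__) && defined(__ARM_FEATURE_FRINT) && (__clang_major__ >= 18)
/* Upstream clang 18 or later, targeting a CPU with the FRINT extension. */
#  define EXAMPLE_NATIVE_VRND32X_F64 1
#else
/* Anything else, including GCC, which this sketch does not try to detect. */
#  define EXAMPLE_NATIVE_VRND32X_F64 0
#endif

/* Names the code path a wrapper around vrnd32x_f64 would take. */
static const char *example_vrnd32x_f64_path(void) {
  assert(EXAMPLE_NATIVE_VRND32X_F64 == 0 || EXAMPLE_NATIVE_VRND32X_F64 == 1);
  return EXAMPLE_NATIVE_VRND32X_F64 ? "native intrinsic" : "portable fallback";
}
```

The preprocessor dumps later in this thread show Apple clang defining `__ARM_FEATURE_FRINT`, so checking that macro alone is evidently not enough to know the f64 intrinsics are declared.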

I can confirm the build errors by adding the new macos arm64 GitHub runners to our CI: https://github.com/simd-everywhere/simde/actions/runs/8307298324

At https://github.com/simd-everywhere/simde/pull/1148 I'm testing a fix ; can you also confirm it on your side?

Can you carry a patch, or do you need a new release?

timsutton commented 4 months ago

@mr-c Thanks for the quick reply! Cherry-picking the code fixes in #1148 seems to work for me locally, ~I'll run it through our CI now~ it passes CI as well.

Carrying a patch is no problem. If you end up landing the fix as a single commit, that makes it even more straightforward to include as part of the formula definition; we'll then remove the patch on a subsequent release.

mr-c commented 4 months ago

@timsutton Are you actually running the SIMDe tests, or just compiling them? I get test failures on arm64 macos: https://github.com/simd-everywhere/simde/actions/runs/8307482025/job/22736747964?pr=1148 (looks like #1099 again)

timsutton commented 4 months ago

@mr-c Very likely just compiling. If you're interested in seeing exactly what the Homebrew formula does, you can see that here: https://github.com/Porkepix/homebrew-core/blob/bump-simde-0.8.0/Formula/s/simde.rb#L33-L48

The compile/install step doesn't explicitly run meson test, so unless the tests happen to run indirectly as part of those tasks, the only test is the one I just linked. Homebrew formulae typically don't run a "full CI" bank of tests in their test do block; it's intended more as a basic sanity check that the installation seems functional, relying instead on the upstream project's own CI to validate on whatever platforms it can, or at least officially supports. Hope that helps answer your question!

mr-c commented 4 months ago

A "fun" discovery:

On the M1, Apple clang has different #defines for -march=native (fewer) versus -mcpu=apple-m1 (more, and including false positives for __ARM_FEATURE_SM3 and others).

cc -dM -E -fopenmp-simd -march=native - < /dev/null

```
#define _LP64 1
#define __AARCH64EL__ 1
#define __AARCH64_CMODEL_SMALL__ 1
#define __AARCH64_SIMD__ 1
#define __APPLE_CC__ 6000
#define __APPLE__ 1
#define __ARM64_ARCH_8__ 1
#define __ARM_64BIT_STATE 1
#define __ARM_ACLE 200
#define __ARM_ALIGN_MAX_STACK_PWR 4
#define __ARM_ARCH 8
#define __ARM_ARCH_8_3__ 1
#define __ARM_ARCH_8_4__ 1
#define __ARM_ARCH_8_5__ 1
#define __ARM_ARCH_ISA_A64 1
#define __ARM_ARCH_PROFILE 'A'
#define __ARM_FEATURE_ATOMICS 1
#define __ARM_FEATURE_CLZ 1
#define __ARM_FEATURE_COMPLEX 1
#define __ARM_FEATURE_CRC32 1
#define __ARM_FEATURE_DIRECTED_ROUNDING 1
#define __ARM_FEATURE_DIV 1
#define __ARM_FEATURE_FMA 1
#define __ARM_FEATURE_FRINT 1
#define __ARM_FEATURE_IDIV 1
#define __ARM_FEATURE_JCVT 1
#define __ARM_FEATURE_LDREX 0xF
#define __ARM_FEATURE_NUMERIC_MAXMIN 1
#define __ARM_FEATURE_QRDMX 1
#define __ARM_FEATURE_UNALIGNED 1
#define __ARM_FP 0xE
#define __ARM_FP16_ARGS 1
#define __ARM_FP16_FORMAT_IEEE 1
#define __ARM_NEON 1
#define __ARM_NEON_FP 0xE
#define __ARM_NEON__ 1
#define __ARM_PCS_AAPCS64 1
#define __ARM_SIZEOF_MINIMAL_ENUM 4
#define __ARM_SIZEOF_WCHAR_T 4
#define __ATOMIC_ACQUIRE 2
#define __ATOMIC_ACQ_REL 4
#define __ATOMIC_CONSUME 1
#define __ATOMIC_RELAXED 0
#define __ATOMIC_RELEASE 3
#define __UINT_FAST8_MAX__ 255
#define __UINT_FAST8_TYPE__ unsigned char
#define __UINT_LEAST16_FMTX__ "hX"
#define __UINT_LEAST16_FMTo__ "ho"
#define __UINT_LEAST16_FMTu__ "hu"
#define __UINT_LEAST16_FMTx__ "hx"
#define __UINT_LEAST16_MAX__ 65535
#define __UINT_LEAST16_TYPE__ unsigned short
#define __UINT_LEAST32_FMTX__ "X"
#define __UINT_LEAST32_FMTo__ "o"
#define __UINT_LEAST32_FMTu__ "u"
#define __UINT_LEAST32_FMTx__ "x"
#define __UINT_LEAST32_MAX__ 4294967295U
#define __UINT_LEAST32_TYPE__ unsigned int
#define __UINT_LEAST64_FMTX__ "llX"
#define __UINT_LEAST64_FMTo__ "llo"
#define __UINT_LEAST64_FMTu__ "llu"
#define __UINT_LEAST64_FMTx__ "llx"
#define __UINT_LEAST64_MAX__ 18446744073709551615ULL
#define __UINT_LEAST64_TYPE__ long long unsigned int
#define __UINT_LEAST8_FMTX__ "hhX"
#define __UINT_LEAST8_FMTo__ "hho"
#define __UINT_LEAST8_FMTu__ "hhu"
#define __UINT_LEAST8_FMTx__ "hhx"
#define __UINT_LEAST8_MAX__ 255
#define __UINT_LEAST8_TYPE__ unsigned char
#define __USER_LABEL_PREFIX__ _
#define __VERSION__ "Apple LLVM 14.0.3 (clang-1403.0.22.14.1)"
#define __WCHAR_MAX__ 2147483647
#define __WCHAR_TYPE__ int
#define __WCHAR_WIDTH__ 32
#define __WINT_MAX__ 2147483647
#define __WINT_TYPE__ int
#define __WINT_WIDTH__ 32
#define __aarch64__ 1
#define __apple_build_version__ 14030022
#define __arm64 1
#define __arm64__ 1
#define __block __attribute__((__blocks__(byref)))
#define __clang__ 1
#define __clang_literal_encoding__ "UTF-8"
#define __clang_major__ 14
#define __clang_minor__ 0
#define __clang_patchlevel__ 3
#define __clang_version__ "14.0.3 (clang-1403.0.22.14.1)"
#define __clang_wide_literal_encoding__ "UTF-32"
#define __llvm__ 1
#define __nonnull _Nonnull
#define __null_unspecified _Null_unspecified
#define __nullable _Nullable
#define __pic__ 2
#define __strong
#define __unsafe_unretained
#define __weak __attribute__((objc_gc(weak)))
```

cc -dM -E -fopenmp-simd -mcpu=apple-m1 - < /dev/null

```
#define _LP64 1
#define __AARCH64EL__ 1
#define __AARCH64_CMODEL_SMALL__ 1
#define __AARCH64_SIMD__ 1
#define __APPLE_CC__ 6000
#define __APPLE__ 1
#define __ARM64_ARCH_8__ 1
#define __ARM_64BIT_STATE 1
#define __ARM_ACLE 200
#define __ARM_ALIGN_MAX_STACK_PWR 4
#define __ARM_ARCH 8
#define __ARM_ARCH_8_3__ 1
#define __ARM_ARCH_8_4__ 1
#define __ARM_ARCH_8_5__ 1
#define __ARM_ARCH_ISA_A64 1
#define __ARM_ARCH_PROFILE 'A'
#define __ARM_FEATURE_AES 1
#define __ARM_FEATURE_ATOMICS 1
#define __ARM_FEATURE_CLZ 1
#define __ARM_FEATURE_COMPLEX 1
#define __ARM_FEATURE_CRC32 1
#define __ARM_FEATURE_CRYPTO 1
#define __ARM_FEATURE_DIRECTED_ROUNDING 1
#define __ARM_FEATURE_DIV 1
#define __ARM_FEATURE_DOTPROD 1
#define __ARM_FEATURE_FMA 1
#define __ARM_FEATURE_FP16_FML 1
#define __ARM_FEATURE_FP16_SCALAR_ARITHMETIC 1
#define __ARM_FEATURE_FP16_VECTOR_ARITHMETIC 1
#define __ARM_FEATURE_FRINT 1
#define __ARM_FEATURE_IDIV 1
#define __ARM_FEATURE_JCVT 1
#define __ARM_FEATURE_LDREX 0xF
#define __ARM_FEATURE_NUMERIC_MAXMIN 1
#define __ARM_FEATURE_QRDMX 1
#define __ARM_FEATURE_SHA2 1
#define __ARM_FEATURE_SHA3 1
#define __ARM_FEATURE_SHA512 1
#define __ARM_FEATURE_SM3 1
#define __ARM_FEATURE_SM4 1
#define __ARM_FEATURE_UNALIGNED 1
#define __ARM_FP 0xE
#define __ARM_FP16_ARGS 1
#define __ARM_FP16_FORMAT_IEEE 1
#define __clang__ 1
#define __clang_literal_encoding__ "UTF-8"
#define __clang_major__ 14
#define __clang_minor__ 0
#define __clang_patchlevel__ 3
#define __clang_version__ "14.0.3 (clang-1403.0.22.14.1)"
#define __clang_wide_literal_encoding__ "UTF-32"
#define __llvm__ 1
#define __nonnull _Nonnull
#define __null_unspecified _Null_unspecified
#define __nullable _Nullable
#define __pic__ 2
#define __strong
#define __unsafe_unretained
#define __weak __attribute__((objc_gc(weak)))
```

The situation is improved for Apple clang version 15.0.0 (clang-1500.1.0.2.5), where the native version now picks up __ARM_FEATURE_DOTPROD, __ARM_FEATURE_FP16_{FML,SCALAR_ARITHMETIC,VECTOR_ARITHMETIC}, __ARM_FEATURE_RCPC, __ARM_FEATURE_SHA2, __ARM_FEATURE_SHA3, and __ARM_FEATURE_SHA512.

cc -dM -E -fopenmp-simd -march=native - < /dev/null

```
#define _LP64 1
#define __AARCH64EL__ 1
#define __AARCH64_CMODEL_SMALL__ 1
#define __AARCH64_SIMD__ 1
#define __APPLE_CC__ 6000
#define __APPLE__ 1
#define __ARM64_ARCH_8__ 1
#define __ARM_64BIT_STATE 1
#define __ARM_ACLE 200
#define __ARM_ALIGN_MAX_STACK_PWR 4
#define __ARM_ARCH 8
#define __ARM_ARCH_8_3__ 1
#define __ARM_ARCH_8_4__ 1
#define __ARM_ARCH_8_5__ 1
#define __ARM_ARCH_ISA_A64 1
#define __ARM_ARCH_PROFILE 'A'
#define __ARM_FEATURE_ATOMICS 1
#define __ARM_FEATURE_CLZ 1
#define __ARM_FEATURE_COMPLEX 1
#define __ARM_FEATURE_CRC32 1
#define __ARM_FEATURE_DIRECTED_ROUNDING 1
#define __ARM_FEATURE_DIV 1
#define __ARM_FEATURE_DOTPROD 1
#define __ARM_FEATURE_FMA 1
#define __ARM_FEATURE_FP16_FML 1
#define __ARM_FEATURE_FP16_SCALAR_ARITHMETIC 1
#define __ARM_FEATURE_FP16_VECTOR_ARITHMETIC 1
#define __ARM_FEATURE_FRINT 1
#define __ARM_FEATURE_IDIV 1
#define __ARM_FEATURE_JCVT 1
#define __ARM_FEATURE_LDREX 0xF
#define __ARM_FEATURE_NUMERIC_MAXMIN 1
#define __ARM_FEATURE_QRDMX 1
#define __ARM_FEATURE_RCPC 1
#define __ARM_FEATURE_SHA2 1
#define __ARM_FEATURE_SHA3 1
#define __ARM_FEATURE_SHA512 1
#define __ARM_FEATURE_UNALIGNED 1
#define __ARM_FP 0xE
#define __ARM_FP16_ARGS 1
#define __ARM_FP16_FORMAT_IEEE 1
#define __ARM_NEON 1
#define __ARM_NEON_FP 0xE
#define __ARM_NEON__ 1
#define __UINT_FAST64_FMTX__ "llX"
#define __UINT_FAST64_FMTo__ "llo"
#define __UINT_FAST64_FMTu__ "llu"
#define __UINT_FAST64_FMTx__ "llx"
#define __UINT_FAST64_MAX__ 18446744073709551615ULL
#define __UINT_FAST64_TYPE__ long long unsigned int
#define __UINT_FAST8_FMTX__ "hhX"
#define __UINT_FAST8_FMTo__ "hho"
#define __UINT_FAST8_FMTu__ "hhu"
#define __UINT_FAST8_FMTx__ "hhx"
#define __UINT_FAST8_MAX__ 255
#define __UINT_FAST8_TYPE__ unsigned char
#define __UINT_LEAST16_FMTX__ "hX"
#define __UINT_LEAST16_FMTo__ "ho"
#define __UINT_LEAST16_FMTu__ "hu"
#define __UINT_LEAST16_FMTx__ "hx"
#define __UINT_LEAST16_MAX__ 65535
#define __UINT_LEAST16_TYPE__ unsigned short
#define __UINT_LEAST32_FMTX__ "X"
#define __UINT_LEAST32_FMTo__ "o"
#define __UINT_LEAST32_FMTu__ "u"
#define __UINT_LEAST32_FMTx__ "x"
#define __UINT_LEAST32_MAX__ 4294967295U
#define __UINT_LEAST32_TYPE__ unsigned int
#define __UINT_LEAST64_FMTX__ "llX"
#define __UINT_LEAST64_FMTo__ "llo"
#define __UINT_LEAST64_FMTu__ "llu"
#define __UINT_LEAST64_FMTx__ "llx"
#define __UINT_LEAST64_MAX__ 18446744073709551615ULL
#define __UINT_LEAST64_TYPE__ long long unsigned int
#define __UINT_LEAST8_FMTX__ "hhX"
#define __UINT_LEAST8_FMTo__ "hho"
#define __UINT_LEAST8_FMTu__ "hhu"
#define __UINT_LEAST8_FMTx__ "hhx"
#define __UINT_LEAST8_MAX__ 255
#define __UINT_LEAST8_TYPE__ unsigned char
#define __USER_LABEL_PREFIX__ _
#define __VERSION__ "Apple LLVM 15.0.0 (clang-1500.1.0.2.5)"
#define __WCHAR_MAX__ 2147483647
#define __WCHAR_TYPE__ int
#define __WCHAR_WIDTH__ 32
#define __WINT_MAX__ 2147483647
#define __WINT_TYPE__ int
#define __WINT_WIDTH__ 32
#define __aarch64__ 1
#define __apple_build_version__ 15000100
#define __arm64 1
#define __arm64__ 1
#define __block __attribute__((__blocks__(byref)))
#define __clang__ 1
#define __clang_literal_encoding__ "UTF-8"
#define __clang_major__ 15
#define __clang_minor__ 0
#define __clang_patchlevel__ 0
#define __clang_version__ "15.0.0 (clang-1500.1.0.2.5)"
#define __clang_wide_literal_encoding__ "UTF-32"
#define __llvm__ 1
#define __nonnull _Nonnull
#define __null_unspecified _Null_unspecified
#define __nullable _Nullable
#define __pic__ 2
#define __strong
#define __unsafe_unretained
#define __weak __attribute__((objc_gc(weak)))
```

cc -dM -E -fopenmp-simd -mcpu=apple-m1 - < /dev/null

```
#define _LP64 1
#define __AARCH64EL__ 1
#define __AARCH64_CMODEL_SMALL__ 1
#define __AARCH64_SIMD__ 1
#define __APPLE_CC__ 6000
#define __APPLE__ 1
#define __ARM64_ARCH_8__ 1
#define __ARM_64BIT_STATE 1
#define __ARM_ACLE 200
#define __ARM_ALIGN_MAX_STACK_PWR 4
#define __ARM_ARCH 8
#define __ARM_ARCH_8_3__ 1
#define __ARM_ARCH_8_4__ 1
#define __ARM_ARCH_8_5__ 1
#define __ARM_ARCH_ISA_A64 1
#define __ARM_ARCH_PROFILE 'A'
#define __ARM_FEATURE_AES 1
#define __ARM_FEATURE_ATOMICS 1
#define __ARM_FEATURE_CLZ 1
#define __ARM_FEATURE_COMPLEX 1
#define __ARM_FEATURE_CRC32 1
#define __ARM_FEATURE_CRYPTO 1
#define __ARM_FEATURE_DIRECTED_ROUNDING 1
#define __ARM_FEATURE_DIV 1
#define __ARM_FEATURE_DOTPROD 1
#define __ARM_FEATURE_FMA 1
#define __ARM_FEATURE_FP16_FML 1
#define __ARM_FEATURE_FP16_SCALAR_ARITHMETIC 1
#define __ARM_FEATURE_FP16_VECTOR_ARITHMETIC 1
#define __ARM_FEATURE_FRINT 1
#define __ARM_FEATURE_IDIV 1
#define __ARM_FEATURE_JCVT 1
#define __ARM_FEATURE_LDREX 0xF
#define __ARM_FEATURE_NUMERIC_MAXMIN 1
#define __ARM_FEATURE_QRDMX 1
#define __ARM_FEATURE_RCPC 1
#define __ARM_FEATURE_SHA2 1
#define __ARM_FEATURE_SHA3 1
#define __ARM_FEATURE_SHA512 1
#define __ARM_FEATURE_SM3 1
#define __ARM_FEATURE_SM4 1
#define __ARM_FEATURE_UNALIGNED 1
#define __ARM_FP 0xE
#define __ARM_FP16_ARGS 1
#define __UINT_FAST32_FMTu__ "u"
#define __UINT_FAST32_FMTx__ "x"
#define __UINT_FAST32_MAX__ 4294967295U
#define __UINT_FAST32_TYPE__ unsigned int
#define __UINT_FAST64_FMTX__ "llX"
#define __UINT_FAST64_FMTo__ "llo"
#define __UINT_FAST64_FMTu__ "llu"
#define __UINT_FAST64_FMTx__ "llx"
#define __UINT_FAST64_MAX__ 18446744073709551615ULL
#define __UINT_FAST64_TYPE__ long long unsigned int
#define __UINT_FAST8_FMTX__ "hhX"
#define __UINT_FAST8_FMTo__ "hho"
#define __UINT_FAST8_FMTu__ "hhu"
#define __UINT_FAST8_FMTx__ "hhx"
#define __UINT_FAST8_MAX__ 255
#define __UINT_FAST8_TYPE__ unsigned char
#define __UINT_LEAST16_FMTX__ "hX"
#define __UINT_LEAST16_FMTo__ "ho"
#define __UINT_LEAST16_FMTu__ "hu"
#define __UINT_LEAST16_FMTx__ "hx"
#define __UINT_LEAST16_MAX__ 65535
#define __UINT_LEAST16_TYPE__ unsigned short
#define __UINT_LEAST32_FMTX__ "X"
#define __UINT_LEAST32_FMTo__ "o"
#define __UINT_LEAST32_FMTu__ "u"
#define __UINT_LEAST32_FMTx__ "x"
#define __UINT_LEAST32_MAX__ 4294967295U
#define __UINT_LEAST32_TYPE__ unsigned int
#define __UINT_LEAST64_FMTX__ "llX"
#define __UINT_LEAST64_FMTo__ "llo"
#define __UINT_LEAST64_FMTu__ "llu"
#define __UINT_LEAST64_FMTx__ "llx"
#define __UINT_LEAST64_MAX__ 18446744073709551615ULL
#define __UINT_LEAST64_TYPE__ long long unsigned int
#define __UINT_LEAST8_FMTX__ "hhX"
#define __UINT_LEAST8_FMTo__ "hho"
#define __UINT_LEAST8_FMTu__ "hhu"
#define __UINT_LEAST8_FMTx__ "hhx"
#define __UINT_LEAST8_MAX__ 255
#define __UINT_LEAST8_TYPE__ unsigned char
#define __USER_LABEL_PREFIX__ _
#define __VERSION__ "Apple LLVM 15.0.0 (clang-1500.1.0.2.5)"
#define __WCHAR_MAX__ 2147483647
#define __WCHAR_TYPE__ int
#define __WCHAR_WIDTH__ 32
#define __WINT_MAX__ 2147483647
#define __WINT_TYPE__ int
#define __WINT_WIDTH__ 32
#define __aarch64__ 1
#define __apple_build_version__ 15000100
#define __arm64 1
#define __arm64__ 1
#define __block __attribute__((__blocks__(byref)))
#define __clang__ 1
#define __clang_literal_encoding__ "UTF-8"
#define __clang_major__ 15
#define __clang_minor__ 0
#define __clang_patchlevel__ 0
#define __clang_version__ "15.0.0 (clang-1500.1.0.2.5)"
#define __clang_wide_literal_encoding__ "UTF-32"
#define __llvm__ 1
#define __nonnull _Nonnull
#define __null_unspecified _Null_unspecified
#define __nullable _Nullable
#define __pic__ 2
#define __strong
#define __unsafe_unretained
#define __weak __attribute__((objc_gc(weak)))
```
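For comparing the two flag sets without reading full dumps, a small reporter for just the macros of interest may be easier to diff. The sketch below is an illustration only: the `example_` and `EXAMPLE_` names are hypothetical, and the five macros listed are simply ones that differ in the dumps above.

```c
#include <assert.h>
#include <stdio.h>

/* Turn "is this macro defined?" into a 0/1 constant for each feature. */
#ifdef __ARM_FEATURE_DOTPROD
#  define EXAMPLE_HAS_DOTPROD 1
#else
#  define EXAMPLE_HAS_DOTPROD 0
#endif
#ifdef __ARM_FEATURE_RCPC
#  define EXAMPLE_HAS_RCPC 1
#else
#  define EXAMPLE_HAS_RCPC 0
#endif
#ifdef __ARM_FEATURE_SHA3
#  define EXAMPLE_HAS_SHA3 1
#else
#  define EXAMPLE_HAS_SHA3 0
#endif
#ifdef __ARM_FEATURE_SM3
#  define EXAMPLE_HAS_SM3 1
#else
#  define EXAMPLE_HAS_SM3 0
#endif
#ifdef __ARM_FEATURE_SM4
#  define EXAMPLE_HAS_SM4 1
#else
#  define EXAMPLE_HAS_SM4 0
#endif

struct example_feature {
  const char *name; /* macro name as it appears in the dump */
  int enabled;      /* 1 if the compiler defined it for these flags */
};

static const struct example_feature example_features[] = {
  { "__ARM_FEATURE_DOTPROD", EXAMPLE_HAS_DOTPROD },
  { "__ARM_FEATURE_RCPC",    EXAMPLE_HAS_RCPC },
  { "__ARM_FEATURE_SHA3",    EXAMPLE_HAS_SHA3 },
  { "__ARM_FEATURE_SM3",     EXAMPLE_HAS_SM3 },
  { "__ARM_FEATURE_SM4",     EXAMPLE_HAS_SM4 },
};

enum { EXAMPLE_FEATURE_COUNT = sizeof(example_features) / sizeof(example_features[0]) };

/* Prints one line per macro and returns how many of them were defined. */
static int example_report_features(FILE *out) {
  int enabled = 0;
  for (int i = 0; i < EXAMPLE_FEATURE_COUNT; i++) {
    fprintf(out, "%-24s %s\n", example_features[i].name,
            example_features[i].enabled ? "defined" : "not defined");
    enabled += example_features[i].enabled;
  }
  assert(enabled >= 0 && enabled <= EXAMPLE_FEATURE_COUNT);
  return enabled;
}
```

Calling `example_report_features(stdout)` from a `main`, building once with -march=native and once with -mcpu=apple-m1, and diffing the two outputs should surface the same differences as the dumps. Note this only reports what the compiler claims, not what the hardware actually supports.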

mr-c commented 4 months ago

Build failures fixed in https://github.com/simd-everywhere/simde/pull/1148 ; I'll make a new issue to re-implement the FCVTZS/FCVTMS/FCVTPS/FCVTNS family of intrinsics that I had to disable.

timsutton commented 4 months ago

You rock! Thanks for that followup as well :)