xtensor-stack / xsimd

C++ wrappers for SIMD intrinsics and parallelized, optimized mathematical functions (SSE, AVX, AVX512, NEON, SVE)
https://xsimd.readthedocs.io/
BSD 3-Clause "New" or "Revised" License
2.15k stars 253 forks

Arch dispatch does not compile #1039

Closed: Martmists-GH closed this issue 4 weeks ago

Martmists-GH commented 1 month ago

Code:

#if defined(__x86_64__)
#define MAKE_SIMD(ret, name, ...) struct name {                                                                                \
    template <class Arch>                                                                                                      \
    ret operator()(Arch, __VA_ARGS__);                                                                                         \
};                                                                                                                             \
static auto name##_dispatcher = xsimd::dispatch<xsimd::all_x86_architectures>(name{});                                         \
extern template ret name::operator()<xsimd::avx512vnni<xsimd::avx512vbmi>>(xsimd::avx512vnni<xsimd::avx512vbmi>, __VA_ARGS__); \
extern template ret name::operator()<xsimd::avx512vbmi>(xsimd::avx512vbmi, __VA_ARGS__);                                       \
extern template ret name::operator()<xsimd::avx512ifma>(xsimd::avx512ifma, __VA_ARGS__);                                       \
extern template ret name::operator()<xsimd::avx512pf>(xsimd::avx512pf, __VA_ARGS__);                                           \
extern template ret name::operator()<xsimd::avx512vnni<xsimd::avx512bw>>(xsimd::avx512vnni<xsimd::avx512bw>, __VA_ARGS__);     \
extern template ret name::operator()<xsimd::avx512bw>(xsimd::avx512bw, __VA_ARGS__);                                           \
extern template ret name::operator()<xsimd::avx512er>(xsimd::avx512er, __VA_ARGS__);                                           \
extern template ret name::operator()<xsimd::avx512dq>(xsimd::avx512dq, __VA_ARGS__);                                           \
extern template ret name::operator()<xsimd::avx512cd>(xsimd::avx512cd, __VA_ARGS__);                                           \
extern template ret name::operator()<xsimd::avx512f>(xsimd::avx512f, __VA_ARGS__);                                             \
extern template ret name::operator()<xsimd::avxvnni>(xsimd::avxvnni, __VA_ARGS__);                                             \
extern template ret name::operator()<xsimd::fma3<xsimd::avx2>>(xsimd::fma3<xsimd::avx2>, __VA_ARGS__);                         \
extern template ret name::operator()<xsimd::avx2>(xsimd::avx2, __VA_ARGS__);                                                   \
extern template ret name::operator()<xsimd::fma3<xsimd::avx>>(xsimd::fma3<xsimd::avx>, __VA_ARGS__);                           \
extern template ret name::operator()<xsimd::avx>(xsimd::avx, __VA_ARGS__);                                                     \
extern template ret name::operator()<xsimd::fma4>(xsimd::fma4, __VA_ARGS__);                                                   \
extern template ret name::operator()<xsimd::fma3<xsimd::sse4_2>>(xsimd::fma3<xsimd::sse4_2>, __VA_ARGS__);                     \
extern template ret name::operator()<xsimd::sse4_2>(xsimd::sse4_2, __VA_ARGS__);                                               \
extern template ret name::operator()<xsimd::sse4_1>(xsimd::sse4_1, __VA_ARGS__);                                               \
extern template ret name::operator()<xsimd::ssse3>(xsimd::ssse3, __VA_ARGS__);                                                 \
extern template ret name::operator()<xsimd::sse3>(xsimd::sse3, __VA_ARGS__);                                                   \
extern template ret name::operator()<xsimd::sse2>(xsimd::sse2, __VA_ARGS__);                                                   \
template <class Arch>                                                                                                          \
ret name::operator()(Arch, __VA_ARGS__)
#elif defined(__aarch64__)
// <snip>
#endif

MAKE_SIMD(void, _vec_add_scalar, double* a, double b, int n) {
    using batch = xsimd::batch<double, Arch>;

    std::size_t size = n - n % batch::size;
    auto vb = batch(b);

    for (std::size_t i = 0; i < size; i += batch::size) {
        auto va = batch::load_unaligned(&a[i]);
        auto res = va + vb;
        xsimd::store_unaligned(&a[i], res);
    }

    for (std::size_t i = size; i < n; ++i) {
        a[i] = a[i] + b;
    }
}

The following file fails to compile with -mavx512bw, and similar errors are generated for avx512dq, avx512er, avx512ifma, avx512pf and avx512vbmi.

template void _vec_add_scalar::operator()<xsimd::avx512bw>(xsimd::avx512bw, double *, double, int);

Error(s):
In file included from /home/mart/git/kotlin/kt-ndarray-simd/src/lib/arch/avx512bw.cpp:1:
/home/mart/git/kotlin/kt-ndarray-simd/src/lib/arch/../cpp/arithmetic_priv.h:8:30: warning: remainder by zero is undefined [-Wdivision-by-zero]
    std::size_t size = n - n % batch::size;
                             ^ ~~~~~~~~~~~
/home/mart/git/kotlin/kt-ndarray-simd/src/lib/arch/avx512bw.cpp:3:32: note: in instantiation of function template specialization '_vec_add_scalar::operator()' requested here
template void _vec_add_scalar::operator()<xsimd::avx512bw>(xsimd::avx512bw, double *, double, int);
                               ^
In file included from /home/mart/git/kotlin/kt-ndarray-simd/src/lib/arch/avx512bw.cpp:1:
In file included from /home/mart/git/kotlin/kt-ndarray-simd/src/lib/arch/../cpp/arithmetic_priv.h:3:
In file included from /home/mart/git/kotlin/kt-ndarray-simd/src/lib/arch/../cpp/common.h:3:
In file included from /home/mart/git/kotlin/kt-ndarray-simd/xsimd/include/xsimd/xsimd.hpp:62:
In file included from /home/mart/git/kotlin/kt-ndarray-simd/xsimd/include/xsimd/config/../types/../types/xsimd_batch.hpp:492:
In file included from /home/mart/git/kotlin/kt-ndarray-simd/xsimd/include/xsimd/config/../types/../arch/xsimd_isa.hpp:72:
/home/mart/git/kotlin/kt-ndarray-simd/xsimd/include/xsimd/arch/./xsimd_avx512f.hpp:649:20: error: no viable conversion from returned value of type '__m512d' (vector of 8 'double' values) to function return type 'batch'
            return _mm512_set1_pd(val);
                   ^~~~~~~~~~~~~~~~~~~
/home/mart/git/kotlin/kt-ndarray-simd/xsimd/include/xsimd/config/../types/../types/xsimd_batch.hpp:504:46: note: in instantiation of function template specialization 'xsimd::kernel::broadcast' requested here
        : types::simd_register(kernel::broadcast(val, A {}))
                                             ^
/home/mart/git/kotlin/kt-ndarray-simd/src/lib/arch/../cpp/arithmetic_priv.h:9:15: note: in instantiation of member function 'xsimd::batch::batch' requested here
    auto vb = batch(b);
              ^
/home/mart/git/kotlin/kt-ndarray-simd/xsimd/include/xsimd/config/../types/../types/xsimd_batch.hpp:113:11: note: candidate constructor (the implicit copy constructor) not viable: no known conversion from '__m512d' (vector of 8 'double' values) to 'const xsimd::batch &' for 1st argument
    class batch : public types::simd_register, public types::integral_only_operators
          ^
/home/mart/git/kotlin/kt-ndarray-simd/xsimd/include/xsimd/config/../types/../types/xsimd_batch.hpp:113:11: note: candidate constructor (the implicit move constructor) not viable: no known conversion from '__m512d' (vector of 8 'double' values) to 'xsimd::batch &&' for 1st argument
/home/mart/git/kotlin/kt-ndarray-simd/xsimd/include/xsimd/config/../types/../types/xsimd_batch.hpp:127:22: note: candidate constructor not viable: no known conversion from '__m512d' (vector of 8 'double' values) to 'double' for 1st argument
        XSIMD_INLINE batch(T val) noexcept;
                     ^
/home/mart/git/kotlin/kt-ndarray-simd/xsimd/include/xsimd/config/../types/../types/xsimd_batch.hpp:131:22: note: candidate constructor not viable: no known conversion from '__m512d' (vector of 8 'double' values) to 'xsimd::batch::register_type' (aka 'xsimd::types::simd_register::register_type') for 1st argument
        XSIMD_INLINE batch(register_type reg) noexcept;
                     ^
/home/mart/git/kotlin/kt-ndarray-simd/xsimd/include/xsimd/config/../types/../types/xsimd_batch.hpp:130:31: note: explicit constructor is not a candidate
        XSIMD_INLINE explicit batch(batch_bool_type const& b) noexcept;
                              ^
In file included from /home/mart/git/kotlin/kt-ndarray-simd/src/lib/arch/avx512bw.cpp:1:
In file included from /home/mart/git/kotlin/kt-ndarray-simd/src/lib/arch/../cpp/arithmetic_priv.h:3:
In file included from /home/mart/git/kotlin/kt-ndarray-simd/src/lib/arch/../cpp/common.h:3:
In file included from /home/mart/git/kotlin/kt-ndarray-simd/xsimd/include/xsimd/xsimd.hpp:62:
In file included from /home/mart/git/kotlin/kt-ndarray-simd/xsimd/include/xsimd/config/../types/../types/xsimd_batch.hpp:494:
/home/mart/git/kotlin/kt-ndarray-simd/xsimd/include/xsimd/config/../types/../types/xsimd_traits.hpp:71:13: error: static_assert failed due to requirement '!avx512bw::supported() || xsimd::has_simd_register::value' "usage of batch type with unsupported type"
            static_assert(!A::supported() || xsimd::has_simd_register::value,
            ^             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/mart/git/kotlin/kt-ndarray-simd/xsimd/include/xsimd/config/../types/../types/xsimd_traits.hpp:91:19: note: in instantiation of template class 'xsimd::detail::static_check_supported_config_emitter' requested here
            (void)static_check_supported_config_emitter();
                  ^
/home/mart/git/kotlin/kt-ndarray-simd/xsimd/include/xsimd/config/../types/../types/xsimd_batch.hpp:506:17: note: in instantiation of function template specialization 'xsimd::detail::static_check_supported_config' requested here
        detail::static_check_supported_config();
                ^
/home/mart/git/kotlin/kt-ndarray-simd/src/lib/arch/../cpp/arithmetic_priv.h:9:15: note: in instantiation of member function 'xsimd::batch::batch' requested here
    auto vb = batch(b);
              ^
In file included from /home/mart/git/kotlin/kt-ndarray-simd/src/lib/arch/avx512bw.cpp:1:
In file included from /home/mart/git/kotlin/kt-ndarray-simd/src/lib/arch/../cpp/arithmetic_priv.h:3:
In file included from /home/mart/git/kotlin/kt-ndarray-simd/src/lib/arch/../cpp/common.h:3:
In file included from /home/mart/git/kotlin/kt-ndarray-simd/xsimd/include/xsimd/xsimd.hpp:62:
/home/mart/git/kotlin/kt-ndarray-simd/xsimd/include/xsimd/config/../types/../types/xsimd_batch.hpp:629:9: error: no matching function for call to 'static_check_supported_config'
        detail::static_check_supported_config();
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/mart/git/kotlin/kt-ndarray-simd/src/lib/arch/../cpp/arithmetic_priv.h:12:26: note: in instantiation of function template specialization 'xsimd::batch::load_unaligned' requested here
        auto va = batch::load_unaligned(&a[i]);
                         ^
/home/mart/git/kotlin/kt-ndarray-simd/xsimd/include/xsimd/config/../types/../types/xsimd_traits.hpp:89:27: note: candidate template ignored: substitution failure [with T = double, A = xsimd::avx512bw]
        XSIMD_INLINE void static_check_supported_config()
                          ^
In file included from /home/mart/git/kotlin/kt-ndarray-simd/src/lib/arch/avx512bw.cpp:1:
In file included from /home/mart/git/kotlin/kt-ndarray-simd/src/lib/arch/../cpp/arithmetic_priv.h:3:
In file included from /home/mart/git/kotlin/kt-ndarray-simd/src/lib/arch/../cpp/common.h:3:
In file included from /home/mart/git/kotlin/kt-ndarray-simd/xsimd/include/xsimd/xsimd.hpp:62:
In file included from /home/mart/git/kotlin/kt-ndarray-simd/xsimd/include/xsimd/config/../types/../types/xsimd_batch.hpp:492:
In file included from /home/mart/git/kotlin/kt-ndarray-simd/xsimd/include/xsimd/config/../types/../arch/xsimd_isa.hpp:72:
/home/mart/git/kotlin/kt-ndarray-simd/xsimd/include/xsimd/arch/./xsimd_avx512f.hpp:1169:20: error: no viable conversion from returned value of type '__m512d' (vector of 8 'double' values) to function return type 'batch'
            return _mm512_loadu_pd(mem);
                   ^~~~~~~~~~~~~~~~~~~~
/home/mart/git/kotlin/kt-ndarray-simd/xsimd/include/xsimd/config/../types/../types/xsimd_batch.hpp:630:24: note: in instantiation of function template specialization 'xsimd::kernel::load_unaligned' requested here
        return kernel::load_unaligned(mem, kernel::convert {}, A {});
                       ^
/home/mart/git/kotlin/kt-ndarray-simd/src/lib/arch/../cpp/arithmetic_priv.h:12:26: note: in instantiation of function template specialization 'xsimd::batch::load_unaligned' requested here
        auto va = batch::load_unaligned(&a[i]);
                         ^
/home/mart/git/kotlin/kt-ndarray-simd/xsimd/include/xsimd/config/../types/../types/xsimd_batch.hpp:113:11: note: candidate constructor (the implicit copy constructor) not viable: no known conversion from '__m512d' (vector of 8 'double' values) to 'const xsimd::batch &' for 1st argument
    class batch : public types::simd_register, public types::integral_only_operators
          ^
/home/mart/git/kotlin/kt-ndarray-simd/xsimd/include/xsimd/config/../types/../types/xsimd_batch.hpp:113:11: note: candidate constructor (the implicit move constructor) not viable: no known conversion from '__m512d' (vector of 8 'double' values) to 'xsimd::batch &&' for 1st argument
/home/mart/git/kotlin/kt-ndarray-simd/xsimd/include/xsimd/config/../types/../types/xsimd_batch.hpp:127:22: note: candidate constructor not viable: no known conversion from '__m512d' (vector of 8 'double' values) to 'double' for 1st argument
        XSIMD_INLINE batch(T val) noexcept;
                     ^
/home/mart/git/kotlin/kt-ndarray-simd/xsimd/include/xsimd/config/../types/../types/xsimd_batch.hpp:131:22: note: candidate constructor not viable: no known conversion from '__m512d' (vector of 8 'double' values) to 'xsimd::batch::register_type' (aka 'xsimd::types::simd_register::register_type') for 1st argument
        XSIMD_INLINE batch(register_type reg) noexcept;
                     ^
/home/mart/git/kotlin/kt-ndarray-simd/xsimd/include/xsimd/config/../types/../types/xsimd_batch.hpp:130:31: note: explicit constructor is not a candidate
        XSIMD_INLINE explicit batch(batch_bool_type const& b) noexcept;
                              ^
/home/mart/git/kotlin/kt-ndarray-simd/xsimd/include/xsimd/config/../types/../types/xsimd_batch.hpp:769:9: error: no matching function for call to 'static_check_supported_config'
        detail::static_check_supported_config();
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/mart/git/kotlin/kt-ndarray-simd/xsimd/include/xsimd/config/../types/../types/xsimd_batch.hpp:218:32: note: in instantiation of member function 'xsimd::batch::operator+=' requested here
            return batch(self) += other;
                               ^
/home/mart/git/kotlin/kt-ndarray-simd/src/lib/arch/../cpp/arithmetic_priv.h:13:23: note: in instantiation of member function 'xsimd::operator+' requested here
        auto res = va + vb;
                      ^
/home/mart/git/kotlin/kt-ndarray-simd/xsimd/include/xsimd/config/../types/../types/xsimd_traits.hpp:89:27: note: candidate template ignored: substitution failure [with T = double, A = xsimd::avx512bw]
        XSIMD_INLINE void static_check_supported_config()
                          ^
In file included from /home/mart/git/kotlin/kt-ndarray-simd/src/lib/arch/avx512bw.cpp:1:
In file included from /home/mart/git/kotlin/kt-ndarray-simd/src/lib/arch/../cpp/arithmetic_priv.h:3:
In file included from /home/mart/git/kotlin/kt-ndarray-simd/src/lib/arch/../cpp/common.h:3:
In file included from /home/mart/git/kotlin/kt-ndarray-simd/xsimd/include/xsimd/xsimd.hpp:62:
In file included from /home/mart/git/kotlin/kt-ndarray-simd/xsimd/include/xsimd/config/../types/../types/xsimd_batch.hpp:492:
In file included from /home/mart/git/kotlin/kt-ndarray-simd/xsimd/include/xsimd/config/../types/../arch/xsimd_isa.hpp:72:
/home/mart/git/kotlin/kt-ndarray-simd/xsimd/include/xsimd/arch/./xsimd_avx512f.hpp:309:20: error: no matching function for call to '_mm512_add_pd'
            return _mm512_add_pd(self, other);
                   ^~~~~~~~~~~~~
/home/mart/git/kotlin/kt-ndarray-simd/xsimd/include/xsimd/config/../types/../types/xsimd_batch.hpp:770:32: note: in instantiation of function template specialization 'xsimd::kernel::add' requested here
        return *this = kernel::add(*this, other, A {});
                               ^
/home/mart/git/kotlin/kt-ndarray-simd/xsimd/include/xsimd/config/../types/../types/xsimd_batch.hpp:218:32: note: in instantiation of member function 'xsimd::batch::operator+=' requested here
            return batch(self) += other;
                               ^
/home/mart/git/kotlin/kt-ndarray-simd/src/lib/arch/../cpp/arithmetic_priv.h:13:23: note: in instantiation of member function 'xsimd::operator+' requested here
        auto res = va + vb;
                      ^
/home/mart/.konan/dependencies/llvm-11.1.0-linux-x64-essentials/lib/clang/11.1.0/include/avx512fintrin.h:816:1: note: candidate function not viable: no known conversion from 'const batch' to '__m512d' (vector of 8 'double' values) for 1st argument
_mm512_add_pd(__m512d __a, __m512d __b)
^
In file included from /home/mart/git/kotlin/kt-ndarray-simd/src/lib/arch/avx512bw.cpp:1:
In file included from /home/mart/git/kotlin/kt-ndarray-simd/src/lib/arch/../cpp/arithmetic_priv.h:3:
In file included from /home/mart/git/kotlin/kt-ndarray-simd/src/lib/arch/../cpp/common.h:3:
In file included from /home/mart/git/kotlin/kt-ndarray-simd/xsimd/include/xsimd/xsimd.hpp:67:
/home/mart/git/kotlin/kt-ndarray-simd/xsimd/include/xsimd/types/xsimd_api.hpp:2282:9: error: no matching function for call to 'static_check_supported_config'
        detail::static_check_supported_config();
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/mart/git/kotlin/kt-ndarray-simd/xsimd/include/xsimd/types/xsimd_api.hpp:2362:9: note: in instantiation of function template specialization 'xsimd::store_as' requested here
        store_as(mem, val, unaligned_mode {});
        ^
/home/mart/git/kotlin/kt-ndarray-simd/src/lib/arch/../cpp/arithmetic_priv.h:14:16: note: in instantiation of function template specialization 'xsimd::store_unaligned' requested here
        xsimd::store_unaligned(&a[i], res);
               ^
/home/mart/git/kotlin/kt-ndarray-simd/xsimd/include/xsimd/config/../types/../types/xsimd_traits.hpp:89:27: note: candidate template ignored: substitution failure [with T = double, A = xsimd::avx512bw]
        XSIMD_INLINE void static_check_supported_config()
                          ^
In file included from /home/mart/git/kotlin/kt-ndarray-simd/src/lib/arch/avx512bw.cpp:1:
In file included from /home/mart/git/kotlin/kt-ndarray-simd/src/lib/arch/../cpp/arithmetic_priv.h:3:
In file included from /home/mart/git/kotlin/kt-ndarray-simd/src/lib/arch/../cpp/common.h:3:
In file included from /home/mart/git/kotlin/kt-ndarray-simd/xsimd/include/xsimd/xsimd.hpp:62:
In file included from /home/mart/git/kotlin/kt-ndarray-simd/xsimd/include/xsimd/config/../types/../types/xsimd_batch.hpp:492:
In file included from /home/mart/git/kotlin/kt-ndarray-simd/xsimd/include/xsimd/config/../types/../arch/xsimd_isa.hpp:72:
/home/mart/git/kotlin/kt-ndarray-simd/xsimd/include/xsimd/arch/./xsimd_avx512f.hpp:1837:20: error: no matching function for call to '_mm512_storeu_pd'
            return _mm512_storeu_pd(mem, self);
                   ^~~~~~~~~~~~~~~~
/home/mart/git/kotlin/kt-ndarray-simd/xsimd/include/xsimd/types/xsimd_api.hpp:2283:17: note: in instantiation of function template specialization 'xsimd::kernel::store_unaligned' requested here
        kernel::store_unaligned(dst, src, A {});
                ^
/home/mart/git/kotlin/kt-ndarray-simd/xsimd/include/xsimd/types/xsimd_api.hpp:2362:9: note: in instantiation of function template specialization 'xsimd::store_as' requested here
        store_as(mem, val, unaligned_mode {});
        ^
/home/mart/git/kotlin/kt-ndarray-simd/src/lib/arch/../cpp/arithmetic_priv.h:14:16: note: in instantiation of function template specialization 'xsimd::store_unaligned' requested here
        xsimd::store_unaligned(&a[i], res);
               ^
/home/mart/.konan/dependencies/llvm-11.1.0-linux-x64-essentials/lib/clang/11.1.0/include/avx512fintrin.h:4530:1: note: candidate function not viable: no known conversion from 'const batch' to '__m512d' (vector of 8 'double' values) for 2nd argument
_mm512_storeu_pd(void *__P, __m512d __A)
^
1 warning and 8 errors generated.

serge-sans-paille commented 1 month ago

I can't reproduce this: https://godbolt.org/z/cdxsaG34c Can you try to reproduce it on godbolt? Thanks!

Martmists-GH commented 1 month ago

Including the last line of code from my post (the explicit instantiation) triggers it: https://godbolt.org/z/e3Yo7cYWP
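
For context on why that single line matters: the `extern template` declarations in the macro suppress implicit instantiation, so the kernel body is only compiled for a given architecture where a matching explicit instantiation definition appears. The following is a minimal sketch of that pattern without xsimd. The names `fake_arch`, `lanes`, and `add_scalar` are hypothetical stand-ins, and the "vector" loop is plain scalar code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical architecture tag standing in for something like xsimd::avx512bw.
struct fake_arch {
    static constexpr std::size_t lanes = 4; // pretend register width, in doubles
};

struct add_scalar {
    template <class Arch>
    void operator()(Arch, double* a, double b, std::size_t n);
};

// Explicit instantiation declaration: the compiler must not instantiate this
// specialization implicitly, because a definition is promised elsewhere.
extern template void add_scalar::operator()<fake_arch>(fake_arch, double*, double, std::size_t);

template <class Arch>
void add_scalar::operator()(Arch, double* a, double b, std::size_t n) {
    assert(a != nullptr || n == 0);
    const std::size_t main_end = n - n % Arch::lanes;
    // Main part, processed in chunks of Arch::lanes elements.
    for (std::size_t i = 0; i < main_end; i += Arch::lanes)
        for (std::size_t lane = 0; lane < Arch::lanes; ++lane)
            a[i + lane] += b;
    // Scalar tail for the remaining elements.
    for (std::size_t i = main_end; i < n; ++i)
        a[i] += b;
}

// Explicit instantiation definition: this line forces the body above to be
// compiled for fake_arch. In the layout from this thread it would live in its
// own per-architecture .cpp file, built with that architecture's flags.
template void add_scalar::operator()<fake_arch>(fake_arch, double*, double, std::size_t);
```

Without the last line the body is never instantiated for `fake_arch` in this translation unit, which would be consistent with the first godbolt link compiling cleanly.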

serge-sans-paille commented 1 month ago

Thanks. There are implicit dependencies between archs; see https://godbolt.org/z/K99PfP6fo for two working scenarios.
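
One way to check which feature flags a given translation unit actually sees is to probe the macros that GCC and Clang predefine for each `-m` option. The sketch below is a diagnostic aid, not part of xsimd. The chain it tests (avx512bw, avx512dq, avx512cd, avx512f) is an assumption for illustration, and `describe_avx512bw_chain` is a hypothetical helper name.

```cpp
#include <string>

// Each constant mirrors a macro that GCC and Clang predefine when the
// corresponding -m flag (or a -march value that implies it) is active.
#if defined(__AVX512F__)
constexpr bool has_avx512f = true;
#else
constexpr bool has_avx512f = false;
#endif

#if defined(__AVX512CD__)
constexpr bool has_avx512cd = true;
#else
constexpr bool has_avx512cd = false;
#endif

#if defined(__AVX512DQ__)
constexpr bool has_avx512dq = true;
#else
constexpr bool has_avx512dq = false;
#endif

#if defined(__AVX512BW__)
constexpr bool has_avx512bw = true;
#else
constexpr bool has_avx512bw = false;
#endif

// Assumed dependency chain, for illustration only:
// avx512bw -> avx512dq -> avx512cd -> avx512f.
constexpr bool avx512bw_chain_complete =
    has_avx512f && has_avx512cd && has_avx512dq && has_avx512bw;

// Builds a one-line summary that can be printed from a per-architecture file.
inline std::string describe_avx512bw_chain() {
    auto on_off = [](bool b) { return b ? "on" : "off"; };
    std::string out;
    out += "avx512f=";   out += on_off(has_avx512f);
    out += " avx512cd="; out += on_off(has_avx512cd);
    out += " avx512dq="; out += on_off(has_avx512dq);
    out += " avx512bw="; out += on_off(has_avx512bw);
    out += avx512bw_chain_complete ? " (chain complete)" : " (chain incomplete)";
    return out;
}
```

Printing the summary from the file that fails to build shows at a glance whether a link in the chain is missing for that compiler invocation.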

Martmists-GH commented 1 month ago

Where can I see which arch depends on which other archs?

serge-sans-paille commented 1 month ago

https://github.com/xtensor-stack/xsimd/blob/6cbd5d89a7a5d77a76cf95370b3ed72e28338c23/include/xsimd/types/xsimd_avx512bw_register.hpp#L25

Here you can see that avx512bw depends on avx512dq, and so on down the chain.

This dependency is not arbitrary: to our knowledge, there is no hardware that supports one without the other.
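
The `static_assert` quoted in the error log has the shape `!A::supported() || has_simd_register`. The sketch below models that condition for avx512bw in plain C++, to show how enabling only one flag can leave the arch "supported" but without a usable register type. It is a simplified model, not xsimd's implementation: `enabled_flags` and the `bw_*` functions are hypothetical, and the chain below avx512dq (cd, then f) is assumed for illustration; the linked header is the authoritative source.

```cpp
// Hypothetical model of which AVX-512 feature flags the compiler has enabled.
struct enabled_flags {
    bool f;
    bool cd;
    bool dq;
    bool bw;
};

// Models A::supported() for the avx512bw tag: true once its own flag is on.
constexpr bool bw_supported(enabled_flags fl) { return fl.bw; }

// Models has_simd_register for avx512bw under the assumed chain
// bw -> dq -> cd -> f: the register type exists only if every link is enabled.
constexpr bool bw_has_register(enabled_flags fl) {
    return fl.bw && fl.dq && fl.cd && fl.f;
}

// Same shape as the static_assert condition quoted in the error log.
constexpr bool bw_config_ok(enabled_flags fl) {
    return !bw_supported(fl) || bw_has_register(fl);
}

// Only -mavx512bw, assuming the compiler also enables avx512f: rejected.
static_assert(!bw_config_ok(enabled_flags{true, false, false, true}),
              "a missing link in the chain should fail the check");
// All four flags enabled: accepted.
static_assert(bw_config_ok(enabled_flags{true, true, true, true}),
              "a complete chain should pass the check");
// avx512bw not requested at all: accepted, the arch is simply not used.
static_assert(bw_config_ok(enabled_flags{true, false, false, false}),
              "an unrequested arch should pass the check");
```

Under this model, the remaining conversion errors in the log would follow from the same cause: with no register type available, the batch cannot be built from or passed as `__m512d`.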

Martmists-GH commented 1 month ago

What about the avx512vnni extensions? I believe I've added all the required compiler flags: -mavx512vnni -mavx512vbmi -mavx512ifma -mavx512bw -mavx512dq -mavx512cd -mavx512f. See https://godbolt.org/z/MP3G7cndc

Also, does i8mm<neon64> (or any of the NEON instructions, for that matter) need a specific flag? I'm getting similar errors.

serge-sans-paille commented 1 month ago

Should be fixed by #1043

Martmists-GH commented 4 weeks ago

Experiencing the same problem with NEON: https://godbolt.org/z/hdMY3PTs7