google / highway

Performance-portable, length-agnostic SIMD with runtime dispatch
Apache License 2.0

hwy 1.1.0: Dynamic dispatch support for s390x #2210

Closed: malaterre closed this issue 1 month ago

malaterre commented 1 month ago

AFAIK dynamic dispatch on s390x is not set up properly. Would it be possible to turn it on by default?


[100/160] : && /usr/bin/c++ -g -O2 -ffile-prefix-map=/<<PKGBUILDDIR>>=. -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -DHWY_BROKEN_EMU128=0 -Wdate-time -D_FORTIFY_SOURCE=2 -flto=auto -ffat-lto-objects -Wl,-z,relro -Wl,-z,now     -fPIE -pie CMakeFiles/hwy_list_targets.dir/hwy/tests/list_targets.cc.o -o hwy_list_targets  -Wl,-rpath,/<<PKGBUILDDIR>>/obj-s390x-linux-gnu  libhwy.so.1.1.0 && cd /<<PKGBUILDDIR>>/obj-s390x-linux-gnu && /<<PKGBUILDDIR>>/obj-s390x-linux-gnu/hwy_list_targets || ( exit 0 )
Config: emu128:0 scalar:0 static:0 all_attain:0 is_test:0
Compiled HWY_TARGETS:   EMU128
HWY_ATTAINABLE_TARGETS: EMU128
HWY_BASELINE_TARGETS:   EMU128
HWY_STATIC_TARGET:      EMU128
HWY_BROKEN_TARGETS:    
HWY_DISABLED_TARGETS:  
Current CPU supports:   Z15 Z14 EMU128 SCALAR

ref:

jan-wassenberg commented 1 month ago

Thanks for reporting. The condition for enabling runtime dispatch is:

#elif (.. HWY_ARCH_S390X ..) && \
    (HWY_COMPILER_GCC_ACTUAL || HWY_COMPILER_CLANG >= 1700) && HWY_OS_LINUX && \
    HWY_HAVE_AUXV

Can you help figure out which of them is not set?

malaterre commented 1 month ago

I think you are staring at git/HEAD, which includes f8f8fdde4... I am using 1.1.0 for now. I'll see what I should import; otherwise I'll simply #define it to 1 unconditionally.

jan-wassenberg commented 1 month ago

FYI HEAD is very close to being released as 1.2, in case that helps?

malaterre commented 1 month ago

> FYI HEAD is very close to being released as 1.2, in case that helps?

AFAIK, git/HEAD fails to build on multiple Debian archs: i386/arm64/armhf/riscv64/powerpc:

Not sure what you mean by "very close", but could you look into some of those build failures before tagging?

As for s390x, this is PEBKAC: I forgot to use -march=z15 -mzvector. Closing as invalid.

jan-wassenberg commented 1 month ago

Thanks, yes, looking. Most of them complain about added symbols; isn't that legitimate?

malaterre commented 1 month ago

I'd appreciate comments on at least these three (I will handle the symbol issue in the next upload):

arm64:

FAILED: CMakeFiles/math_test.dir/hwy/contrib/math/math_test.cc.o 
/usr/bin/c++ -DHWY_SHARED_DEFINE -I"/<<PKGBUILDDIR>>" -g -O2 -ffile-prefix-map=/<<PKGBUILDDIR>>=. -flto=auto -ffat-lto-objects -fstack-protector-strong -fstack-clash-protection -Wformat -Werror=format-security -mbranch-protection=standard -DHWY_BROKEN_EMU128=0 -mstrict-align -Wdate-time -D_FORTIFY_SOURCE=2 -O3 -DNDEBUG -std=c++17 -fPIE -fvisibility=hidden -fvisibility-inlines-hidden -Wno-builtin-macro-redefined -D__DATE__=\"redacted\" -D__TIMESTAMP__=\"redacted\" -D__TIME__=\"redacted\" -fmerge-all-constants -Wall -Wextra -Wconversion -Wsign-conversion -Wvla -Wnon-virtual-dtor -Wcast-align -fmath-errno -fno-exceptions -Wno-psabi -Werror -DHWY_IS_TEST=1 -DGTEST_HAS_PTHREAD=1 -MD -MT CMakeFiles/math_test.dir/hwy/contrib/math/math_test.cc.o -MF CMakeFiles/math_test.dir/hwy/contrib/math/math_test.cc.o.d -o CMakeFiles/math_test.dir/hwy/contrib/math/math_test.cc.o -c '/<<PKGBUILDDIR>>/hwy/contrib/math/math_test.cc'
In file included from /<<PKGBUILDDIR>>/hwy/tests/test_util-inl.h:28,
                 from /<<PKGBUILDDIR>>/hwy/contrib/math/math_test.cc:31,
                 from /<<PKGBUILDDIR>>/hwy/foreach_target.h:163,
                 from /<<PKGBUILDDIR>>/hwy/contrib/math/math_test.cc:28:
/<<PKGBUILDDIR>>/hwy/tests/test_util.h: In function ‘hwy::TypeName<double>(double, unsigned long)std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > [clone .constprop.3]’:
/<<PKGBUILDDIR>>/hwy/tests/test_util.h:152:19: error: this operation requires the SVE ISA extension
  152 |   detail::TypeName(detail::MakeTypeInfo<T>(), N, string100);
      |   ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/<<PKGBUILDDIR>>/hwy/tests/test_util.h:152:19: note: you can enable SVE using the command-line option ‘-march’, or by using the ‘target’ attribute or pragma

armhf:

[105/180] : && /usr/bin/c++ -g -O2 -ffile-prefix-map=/<<PKGBUILDDIR>>=. -flto=auto -ffat-lto-objects -fstack-protector-strong -fstack-clash-protection -Wformat -Werror=format-security -DHWY_BROKEN_EMU128=0 -Wno-psabi -mno-unaligned-access -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_TIME_BITS=64 -Wdate-time -D_FORTIFY_SOURCE=2 -O3 -DNDEBUG -flto=auto -ffat-lto-objects -Wl,-z,relro -Wl,-z,now     -fPIE -pie CMakeFiles/hwy_list_targets.dir/hwy/tests/list_targets.cc.o -o hwy_list_targets  -Wl,-rpath,"/<<PKGBUILDDIR>>/obj-arm-linux-gnueabihf"  libhwy.so.1.1.1 && cd "/<<PKGBUILDDIR>>/obj-arm-linux-gnueabihf" && "/<<PKGBUILDDIR>>/obj-arm-linux-gnueabihf/hwy_list_targets" || ( exit 0 )
Config: emu128:0 scalar:0 static:0 all_attain:0 is_test:0
Compiled HWY_TARGETS:   NEON
HWY_ATTAINABLE_TARGETS: NEON NEON_WITHOUT_AES EMU128
HWY_BASELINE_TARGETS:   NEON NEON_WITHOUT_AES EMU128
HWY_STATIC_TARGET:      NEON
HWY_BROKEN_TARGETS:    
HWY_DISABLED_TARGETS:  
WARNING: CPU supports 0x6000000020000000, software requires 0x2000000030000000

riscv64:

FAILED: CMakeFiles/hwy.dir/hwy/per_target.cc.o 
/usr/bin/c++ -DHWY_SHARED_DEFINE -Dhwy_EXPORTS -I"/<<PKGBUILDDIR>>" -g -O2 -ffile-prefix-map=/<<PKGBUILDDIR>>=. -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -DHWY_BROKEN_EMU128=0 -mstrict-align -Wdate-time -D_FORTIFY_SOURCE=2 -O3 -DNDEBUG -std=c++17 -fPIC -fvisibility=hidden -fvisibility-inlines-hidden -Wno-builtin-macro-redefined -D__DATE__=\"redacted\" -D__TIMESTAMP__=\"redacted\" -D__TIME__=\"redacted\" -fmerge-all-constants -Wall -Wextra -Wconversion -Wsign-conversion -Wvla -Wnon-virtual-dtor -Wcast-align -fmath-errno -fno-exceptions -Wno-psabi -Werror -MD -MT CMakeFiles/hwy.dir/hwy/per_target.cc.o -MF CMakeFiles/hwy.dir/hwy/per_target.cc.o.d -o CMakeFiles/hwy.dir/hwy/per_target.cc.o -c '/<<PKGBUILDDIR>>/hwy/per_target.cc'
In file included from /<<PKGBUILDDIR>>/hwy/ops/rvv-inl.h:19,
                 from /<<PKGBUILDDIR>>/hwy/highway.h:593,
                 from /<<PKGBUILDDIR>>/hwy/per_target.cc:28,
                 from /<<PKGBUILDDIR>>/hwy/foreach_target.h:314,
                 from /<<PKGBUILDDIR>>/hwy/per_target.cc:27:
/usr/lib/gcc/riscv64-linux-gnu/13/include/riscv_vector.h:32:2: error: #error "Vector intrinsics require the vector extension."
   32 | #error "Vector intrinsics require the vector extension."
      |  ^~~~~
/<<PKGBUILDDIR>>/hwy/ops/rvv-inl.h:417:36: error: ‘vuint8mf8_t’ was not declared in this scope; did you mean ‘uint128_t’?
  417 | #define HWY_RVV_V(BASE, SEW, LMUL) v##BASE##SEW##LMUL##_t
      |                                    ^
/<<PKGBUILDDIR>>/hwy/ops/rvv-inl.h:428:19: note: in expansion of macro ‘HWY_RVV_V’
  428 |   struct DFromV_t<HWY_RVV_V(BASE, SEW, LMUL)> {                                \
      |                   ^~~~~~~~~
/<<PKGBUILDDIR>>/hwy/ops/rvv-inl.h:162:3: note: in expansion of macro ‘HWY_SPECIALIZE’

Thanks!

malaterre commented 1 month ago

The powerpc (PPC32) failure is somewhat bizarre, but I understand this is not a first-class target, so it is acceptable for it to fail for now.

FAILED: CMakeFiles/hwy.dir/hwy/targets.cc.o 
/usr/bin/c++ -DHWY_SHARED_DEFINE -DTOOLCHAIN_MISS_ASM_HWCAP_H -Dhwy_EXPORTS -I"/<<PKGBUILDDIR>>" -g -O2 -ffile-prefix-map=/<<PKGBUILDDIR>>=. -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -DHWY_BROKEN_EMU128=0 -maltivec -mcpu=power8 -mstrict-align -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_TIME_BITS=64 -Wdate-time -D_FORTIFY_SOURCE=2 -O3 -DNDEBUG -std=c++17 -fPIC -fvisibility=hidden -fvisibility-inlines-hidden -Wno-builtin-macro-redefined -D__DATE__=\"redacted\" -D__TIMESTAMP__=\"redacted\" -D__TIME__=\"redacted\" -fmerge-all-constants -Wall -Wextra -Wconversion -Wsign-conversion -Wvla -Wnon-virtual-dtor -Wcast-align -fmath-errno -fno-exceptions -Wno-psabi -Werror -MD -MT CMakeFiles/hwy.dir/hwy/targets.cc.o -MF CMakeFiles/hwy.dir/hwy/targets.cc.o.d -o CMakeFiles/hwy.dir/hwy/targets.cc.o -c '/<<PKGBUILDDIR>>/hwy/targets.cc'
In file included from /<<PKGBUILDDIR>>/hwy/highway.h:583,
                 from /<<PKGBUILDDIR>>/hwy/targets.cc:23:
/<<PKGBUILDDIR>>/hwy/ops/ppc_vsx-inl.h: In function ‘V hwy::N_PPC8::BitShuffle(V, VI)’:
/<<PKGBUILDDIR>>/hwy/ops/ppc_vsx-inl.h:6979:37: error: expected ‘;’ before ‘__int128’
 6979 |   using RawVU128 = __vector unsigned __int128;
      |                                     ^~~~~~~~~
      |                                     ;

@johnplatts

jan-wassenberg commented 1 month ago

PPC is hopefully fixed by #2215. The docs sound like unsigned __int128 and unsigned char are equivalent, but I kept the original codepath for clarity.

jan-wassenberg commented 1 month ago

For RISC-V, it seems some clang and gcc versions still require -march=rv64gcv1p0. Interestingly, riscv64-linux-gnu-g++-13 does not. I'm also fixing one missing macro in #2216.

jan-wassenberg commented 1 month ago

For armhf, the warning means that HWY_NEON is enabled, but at runtime we detect that it is not supported due to the lack of AES. That is correct on armhf. The question is why we would select HWY_NEON as a target at all, because we check for AES via a predefined macro:

#if defined(__ARM_NEON__) || defined(__ARM_NEON)
#undef HWY_BASELINE_NEON
#if defined(__ARM_FEATURE_AES) &&                    \
    defined(__ARM_FEATURE_FP16_VECTOR_ARITHMETIC) && \
    defined(__ARM_FEATURE_DOTPROD) &&                \
    defined(__ARM_FEATURE_BF16_VECTOR_ARITHMETIC)
#define HWY_BASELINE_NEON HWY_ALL_NEON
#elif defined(__ARM_FEATURE_AES)
#define HWY_BASELINE_NEON (HWY_NEON_WITHOUT_AES | HWY_NEON)
#else
#define HWY_BASELINE_NEON (HWY_NEON_WITHOUT_AES)
#endif  // __ARM_FEATURE*
#endif  // __ARM_NEON

Perhaps __ARM_FEATURE_AES is actually being set?

jan-wassenberg commented 1 month ago

Unable to repro the arm64 SVE issue on godbolt, even with the flags from the Debian builder. Any ideas?

malaterre commented 1 month ago

> Unable to repro the arm64 SVE issue on godbolt, even with the flags from the Debian builder. Any ideas?

Here is the creduced version:

Let me know if this is too aggressively reduced and you cannot recognize the original code.