google / highway

Performance-portable, length-agnostic SIMD with runtime dispatch
Apache License 2.0
4.17k stars 319 forks source link

VFPv3: Dynamic dispatch vs Debian binaries #818

Closed malaterre closed 2 years ago

malaterre commented 2 years ago

I fail to understand how to compile highway on linux/armhf. Since highway claim to support dynamic dispatch, I assumed I could build highway on linux/armhf with HWY_CMAKE_ARM7:BOOL=ON and let the dynamic dispatch does it work at runtime.

If I understand the build system mechanism:

  1. HWY_CMAKE_ARM7:BOOL=ON imply that host system must support vfpv4 (I see some other neon extension eg a vmov.i32 in the binary) otherwise the test suite will fail,
  2. Currently hwy::SupportedTargets ony works on x86 arch (trivial to fix for linux) ()

Is the above correct ?

Followup to issue #495

ref:

(gdb) r "--gtest_filter=HwyMulTestGroup/HwyMulTest.TestAllMulAdd/Emu128" "--gtest_also_run_disabled_tests"
Starting program: /home/malat/highway-0.17.0/obj-arm-linux-gnueabihf/tests/mul_test "--gtest_filter=HwyMulTestGroup/HwyMulTest.TestAllMulAdd/Emu128" "--gtest_also_run_disabled_tests"
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1".
Running main() from ./googletest/src/gtest_main.cc
Note: Google Test filter = HwyMulTestGroup/HwyMulTest.TestAllMulAdd/Emu128
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from HwyMulTestGroup/HwyMulTest
[ RUN      ] HwyMulTestGroup/HwyMulTest.TestAllMulAdd/Emu128

Program received signal SIGILL, Illegal instruction.
0x004211d8 in hwy::ZeroBytes<16, float> (to=<optimized out>) at ./hwy/base.h:778
778     ./hwy/base.h: No such file or directory.
(gdb) x/i $pc
=> 0x4211d8 <_ZN3hwy8N_EMU12810TestMulAddclIfNS0_4SimdIfLj4ELi0EEEEEvT_T0_+32>: vmov.i32        q8, #0  ; 0x00000000
(gdb) bt
#0  0x004211d8 in hwy::ZeroBytes<16, float> (to=<optimized out>) at ./hwy/base.h:778
#1  hwy::N_EMU128::Zero<float, 4> () at ./hwy/ops/emu128-inl.h:115
#2  hwy::N_EMU128::TestMulAdd::operator()<float, hwy::N_EMU128::Simd<float, 4u, 0> > (d=..., this=<optimized out>) at ./hwy/tests/mul_test.cc:283
#3  0x004245bc in hwy::N_EMU128::detail::ForeachCappedR<float, 4u, 1u, hwy::N_EMU128::TestMulAdd>::Do (min_lanes=1, max_lanes=4) at ./hwy/tests/test_util-inl.h:168
#4  hwy::N_EMU128::detail::ForeachCappedR<float, 4u, 1u, hwy::N_EMU128::TestMulAdd>::Do (max_lanes=4, min_lanes=1) at ./hwy/tests/test_util-inl.h:168
#5  hwy::N_EMU128::ForExtendableVectors<hwy::N_EMU128::TestMulAdd, 0>::operator()<float> (this=<optimized out>) at ./hwy/tests/test_util-inl.h:263
#6  hwy::N_EMU128::ForPartialVectors<hwy::N_EMU128::TestMulAdd>::operator()<float> (t=0, this=0xbefff288) at ./hwy/tests/test_util-inl.h:512
#7  hwy::N_EMU128::ForFloatTypes<hwy::N_EMU128::ForPartialVectors<hwy::N_EMU128::TestMulAdd> > (func=...) at ./hwy/tests/test_util-inl.h:547
#8  0x0041d4b0 in hwy::N_EMU128::TestAllMulAdd () at ./hwy/tests/test_util-inl.h:495
#9  0x00448ed4 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ()
#10 0x00440988 in testing::Test::Run() ()
#11 0x00440ac6 in testing::TestInfo::Run() ()
#12 0x00440fde in testing::TestSuite::Run() ()
#13 0x004414b0 in testing::internal::UnitTestImpl::RunAllTests() ()
#14 0x00440bba in testing::UnitTest::Run() ()
#15 0x00405bfa in main ()
(gdb)

(*) https://salsa.debian.org/debian-phototools-team/highway/-/blob/master/debian/patches/neon.patch

jan-wassenberg commented 2 years ago

Hi @malaterre,

HWY_CMAKE_ARM7:BOOL=ON imply that host system must support vfpv4 Currently hwy::SupportedTargets ony works on x86* arch

Yes, that's correct. I would be happy to integrate AT_HWCAP, but we have a compiler issue which must be resolved first. Clang still does not support pragma target for SVE and NEON, instead requiring -march flags. However, I recently learned that GCC does in fact support this now.

I believe you only require GCC, so we could enable runtime dispatch for you there. This will require setting HWY_TARGET_STR and changing HWY_ATTAINABLE_TARGETS, plus checking AT_HWCAP (only on GCC+Linux).

We have a patch almost ready to go - can you help test?

malaterre commented 2 years ago

We have a patch almost ready to go - can you help test?

sure ! where is it ?

jan-wassenberg commented 2 years ago

Great, thanks :D It just made it through our CI (which does not mean much, because this is effectively #if 0 in that environment), see above.

malaterre commented 2 years ago

Second issue is on armhf:

/<<PKGBUILDDIR>>/hwy/ops/arm_sve-inl.h:19:10: fatal error: arm_sve.h: No such file or directory
   19 | #include <arm_sve.h>

Do you know the function prototype you need from it ?

Here is the file list for arm64:

Here is the file list for armhf:

malaterre commented 2 years ago

And the third one is on armel:

[2/90] /usr/bin/c++ -DHWY_SHARED_DEFINE -Dhwy_contrib_EXPORTS -I/<<PKGBUILDDIR>> -g -O2 -ffile-prefix-map=/<<PKGBUILDDIR>>=. -fstack-protector-strong -Wformat -Werror=format-security  -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -fvisibility=hidden -fvisibility-inlines-hidden -Wno-builtin-macro-redefined -D__DATE__=\"redacted\" -D__TIMESTAMP__=\"redacted\" -D__TIME__=\"redacted\" -fmerge-all-constants -Wall -Wextra -Wconversion -Wsign-conversion -Wvla -Wnon-virtual-dtor -fmath-errno -fno-exceptions -Werror -MD -MT CMakeFiles/hwy_contrib.dir/hwy/contrib/sort/vqsort_128a.cc.o -MF CMakeFiles/hwy_contrib.dir/hwy/contrib/sort/vqsort_128a.cc.o.d -o CMakeFiles/hwy_contrib.dir/hwy/contrib/sort/vqsort_128a.cc.o -c /<<PKGBUILDDIR>>/hwy/contrib/sort/vqsort_128a.cc
FAILED: CMakeFiles/hwy_contrib.dir/hwy/contrib/sort/vqsort_128a.cc.o 
/usr/bin/c++ -DHWY_SHARED_DEFINE -Dhwy_contrib_EXPORTS -I/<<PKGBUILDDIR>> -g -O2 -ffile-prefix-map=/<<PKGBUILDDIR>>=. -fstack-protector-strong -Wformat -Werror=format-security  -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -fvisibility=hidden -fvisibility-inlines-hidden -Wno-builtin-macro-redefined -D__DATE__=\"redacted\" -D__TIMESTAMP__=\"redacted\" -D__TIME__=\"redacted\" -fmerge-all-constants -Wall -Wextra -Wconversion -Wsign-conversion -Wvla -Wnon-virtual-dtor -fmath-errno -fno-exceptions -Werror -MD -MT CMakeFiles/hwy_contrib.dir/hwy/contrib/sort/vqsort_128a.cc.o -MF CMakeFiles/hwy_contrib.dir/hwy/contrib/sort/vqsort_128a.cc.o.d -o CMakeFiles/hwy_contrib.dir/hwy/contrib/sort/vqsort_128a.cc.o -c /<<PKGBUILDDIR>>/hwy/contrib/sort/vqsort_128a.cc
In file included from /<<PKGBUILDDIR>>/hwy/ops/arm_neon-inl.h:22,
                 from /<<PKGBUILDDIR>>/hwy/highway.h:307,
                 from /<<PKGBUILDDIR>>/hwy/contrib/sort/traits128-inl.h:26,
                 from /<<PKGBUILDDIR>>/hwy/contrib/sort/vqsort_128a.cc:24,
                 from /<<PKGBUILDDIR>>/hwy/foreach_target.h:81,
                 from /<<PKGBUILDDIR>>/hwy/contrib/sort/vqsort_128a.cc:21:
/usr/lib/gcc/arm-linux-gnueabi/11/include/arm_neon.h:31:2: error: #error "NEON intrinsics not available with the soft-float ABI.  Please use -mfloat-abi=softfp or -mfloat-abi=hard"
   31 | #error "NEON intrinsics not available with the soft-float ABI.  Please use -mfloat-abi=softfp or -mfloat-abi=hard"
      |  ^~~~~
stefson commented 2 years ago

build errors with armv7a-unknown-linux-gnueabihf with gcc-10.4.0, its latest git checkout plus the patch

/var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999/hwy/targets.cc: In function ‘uint32_t hwy::{anonymous}::DetectTargets()’:
/var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999/hwy/targets.cc:293:34: error: ‘HWCAP_ASIMD’ was not declared in this scope
  293 |   constexpr CapBits kGroupNEON = HWCAP_ASIMD | HWCAP_AES;
      |                                  ^~~~~~~~~~~
/var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999/hwy/targets.cc:293:48: error: ‘HWCAP_AES’ was not declared in this scope; did you mean ‘HWCAP2_AES’?
  293 |   constexpr CapBits kGroupNEON = HWCAP_ASIMD | HWCAP_AES;
      |                                                ^~~~~~~~~
      |                                                HWCAP2_AES
/var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999/hwy/targets.cc:294:33: error: ‘HWCAP_SVE’ was not declared in this scope; did you mean ‘HWCAP_SWP’?
  294 |   constexpr CapBits kGroupSVE = HWCAP_SVE;
      |                                 ^~~~~~~~~
      |                                 HWCAP_SWP
/var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999/hwy/targets.cc:295:34: error: ‘HWCAP2_SVE2’ was not declared in this scope; did you mean ‘HWCAP2_SHA2’?
  295 |   constexpr CapBits kGroupSVE2 = HWCAP2_SVE2 | HWCAP2_SVEAES;
      |                                  ^~~~~~~~~~~
      |                                  HWCAP2_SHA2
/var/tmp/portage/portage/dev-cpp/highway-9999/work/highway-9999/hwy/targets.cc:295:48: error: ‘HWCAP2_SVEAES’ was not declared in this scope; did you mean ‘HWCAP2_AES’?
  295 |   constexpr CapBits kGroupSVE2 = HWCAP2_SVE2 | HWCAP2_SVEAES;
      |                                                ^~~~~~~~~~~~~
      |                                                HWCAP2_AES

full build log: build.log.zip

missing arm_sve.h header also reproduced

hope the results are helpfull :)

jan-wassenberg commented 2 years ago

Thank you both! It is difficult to develop without the target platform, I appreciate you testing.

For the HWY_ATTAINABLE issue, looks like the patch might not have been (fully) applied? This is indeed missing the HWY_ATTAINABLE macros added after 0.17.0: https://salsa.debian.org/debian-phototools-team/highway/-/blob/master/hwy/detect_targets.h

Ah, makes sense that arm_sve is missing on armv7/armhf. I've revised the code to avoid attemping to use it there.

I've also updated the HWCAP to only check for SVE+ flags on aarch64, and only check for the HWCAP if those macros are defined.

jan-wassenberg commented 2 years ago

I believe this is now done, at least for GCC. Please feel free to reopen if there are any related issues.