google / highway

Performance-portable, length-agnostic SIMD with runtime dispatch
Apache License 2.0

Test failure on s390x #2226

Closed by eclipseo 1 month ago

eclipseo commented 1 month ago

Hello Team,

The latest Highway fails one test on Fedora Rawhide (s390x, Clang 18.1.6):

 87/364 Test  #76: HwyBitPermuteTestGroup/HwyBitPermuteTest.TestAllBitShuffle/EMU128  # GetParam() = 2305843009213693952 .................................................Subprocess aborted***Exception:   0.08 sec
Running main() from /builddir/build/BUILD/googletest-1.14.0/googletest/src/gtest_main.cc
Note: Google Test filter = HwyBitPermuteTestGroup/HwyBitPermuteTest.TestAllBitShuffle/EMU128
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from HwyBitPermuteTestGroup/HwyBitPermuteTest
[ RUN      ] HwyBitPermuteTestGroup/HwyBitPermuteTest.TestAllBitShuffle/EMU128
i64x2 expect [0+ ->]:
  0x00000000000000d1,0x0000000000000002,
i64x2 actual [0+ ->]:
  0x00000000000000e9,0x00000000000000c0,
Abort at bit_permute_test.cc:68: EMU128, i64x2 lane 0 mismatch: expected '0x00000000000000d1', got '0x00000000000000e9'.

The following tests FAILED:
     76 - HwyBitPermuteTestGroup/HwyBitPermuteTest.TestAllBitShuffle/EMU128  # GetParam() = 2305843009213693952 (Subprocess aborted)
Errors while running CTest

johnplatts commented 1 month ago

Clang compiles the Z14/Z15 vec_perm intrinsic to the @llvm.s390.vperm LLVM IR intrinsic, which has a known LLVM codegen bug described at https://github.com/llvm/llvm-project/issues/92615.

This LLVM codegen bug does not occur with the Altivec vec_perm intrinsic on PPC8/PPC9/PPC10, even on big-endian PPC8/PPC9/PPC10.

__builtin_shufflevector, which is available in Clang 3 or later and GCC 12 or later, can be used to work around the Z14/Z15 bug when the indices are constant.

jan-wassenberg commented 1 month ago

Hi @johnplatts, can the compiler generate that vperm for the EMU128 target, or might there be another compiler bug lurking here?

johnplatts commented 1 month ago

I tested the EMU128 target compiled for Z14 with Clang 18.1.6. All of the tests enabled by -DHWY_ENABLE_CONTRIB=OFF passed on EMU128 on Z14 except TestAllBitShuffle. That failure is due to a bug in the BitShuffle implementation in generic_ops-inl.h, which is fixed in pull request #2232.

With the BitShuffle bug fix in pull request #2232, all of the tests enabled by -DHWY_ENABLE_CONTRIB=OFF pass on EMU128 on Z14 with Clang 18.1.6 and the -march=z14 -mzvector C++ options.

jan-wassenberg commented 1 month ago

Thank you @johnplatts for fixing this! @eclipseo, can you confirm?

eclipseo commented 1 month ago

Coming back from work, testing ASAP.

eclipseo commented 1 month ago

I confirm this is fixed by https://github.com/google/highway/pull/2232 (commit 3ce50ffa85577140bdf088d8ee7830b76ac2501c).

johnplatts commented 2 weeks ago

LLVM trunk has a fix, which has not made its way into LLVM 18 or earlier, that should resolve the vec_perm intrinsic bug seen on Z14/Z15 with Clang 18 or earlier.