Closed eclipseo closed 1 month ago
There is a known LLVM codegen bug in the @llvm.s390.vperm LLVM IR intrinsic, which is what Clang compiles the Z14/Z15 vec_perm intrinsic to. The bug is described at https://github.com/llvm/llvm-project/issues/92615.
This LLVM codegen bug does not occur with the Altivec vec_perm intrinsic on PPC8/PPC9/PPC10, even on big-endian PPC8/PPC9/PPC10.
The __builtin_shufflevector builtin, which is available with Clang 3 or later or GCC 12 or later, can be used to work around the Z14/Z15 bug when the indices are constant.
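For illustration, here is a minimal sketch of a constant-index shuffle written with __builtin_shufflevector instead of vec_perm. The names `U8x16`, `ReverseBytes`, `CheckReverseBytes` and `DEMO_HAVE_SHUFFLEVECTOR` are hypothetical and are not taken from Highway. The scalar fallback is only there so that the snippet still compiles on compilers without the builtin. This is not the actual Highway workaround, only an example of the general approach.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical 16-byte vector type using the GCC/Clang vector extension.
typedef std::uint8_t U8x16 __attribute__((vector_size(16)));

// Detect __builtin_shufflevector. The checks are nested because
// __has_builtin itself may be missing on older compilers.
#if defined(__has_builtin)
#if __has_builtin(__builtin_shufflevector)
#define DEMO_HAVE_SHUFFLEVECTOR 1
#endif
#endif

// Reverses the 16 byte lanes of v. All indices are compile-time constants,
// which is the case the workaround applies to.
static inline U8x16 ReverseBytes(U8x16 v) {
#ifdef DEMO_HAVE_SHUFFLEVECTOR
  return __builtin_shufflevector(v, v, 15, 14, 13, 12, 11, 10, 9, 8,
                                 7, 6, 5, 4, 3, 2, 1, 0);
#else
  // Fallback for compilers without the builtin: lane-by-lane copy.
  U8x16 r;
  for (int i = 0; i < 16; ++i) r[i] = v[15 - i];
  return r;
#endif
}

// Small self-check: lane i of the result must hold the value 15 - i.
static inline void CheckReverseBytes() {
  std::uint8_t in[16], out[16];
  for (int i = 0; i < 16; ++i) in[i] = static_cast<std::uint8_t>(i);
  U8x16 v;
  std::memcpy(&v, in, sizeof(v));
  const U8x16 r = ReverseBytes(v);
  std::memcpy(out, &r, sizeof(r));
  for (int i = 0; i < 16; ++i) assert(out[i] == 15 - i);
}
```

Because the indices are literals, the compiler chooses the permute instruction itself. Shuffles with runtime indices cannot be expressed this way and would still need vec_perm or another fallback.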
Hi @johnplatts , can the compiler generate that vperm for the EMU128 target, or might there be another compiler bug lurking here?
I tested the EMU128 target compiled for Z14 with Clang 18.1.6. All of the tests enabled by -DHWY_ENABLE_CONTRIB=OFF passed on EMU128 on Z14 except TestAllBitShuffle. That failure is due to a bug in the BitShuffle implementation in generic_ops-inl.h, which is fixed in pull request #2232.
With the BitShuffle bug fix from pull request #2232, all of the tests enabled by -DHWY_ENABLE_CONTRIB=OFF pass on EMU128 on Z14 with Clang 18.1.6 and the -march=z14 -mzvector C++ compiler options.
Thank you @johnplatts for fixing this! @eclipseo , can you confirm?
Coming back from work, testing ASAP.
I confirm this is fixed with https://github.com/google/highway/pull/2232 / 3ce50ffa85577140bdf088d8ee7830b76ac2501c
LLVM trunk has a fix, not yet included in LLVM 18 or earlier, that should resolve the vec_perm intrinsic bug seen on Z14/Z15 with Clang 18 or earlier.
Hello Team,
The latest Highway, on Fedora Rawhide, s390x arch, Clang 18.1.6: