llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
27.51k stars 11.3k forks source link

Relax ARM NEON literal rules #43952

Open llvmbot opened 4 years ago

llvmbot commented 4 years ago
Bugzilla Link 44607
Version 9.0
OS Linux
Reporter LLVM Bugzilla Contributor
CC @DougGregor,@efriedma-quic,@zygoloid

Extended Description

Currently, the NEON "constant" restrictions are too strict compared to SSE2 and GCC.

include

static inline uint32x4_t shift(uint32x4_t inp, const int amt) { return vshlq_n_u32(inp, amt); }

int main() { uint32x4_t val = vdupq_n_u32(2384); uint32x4_t shifted = shift(val, 3); }

shift should be constant propagated, and Clang should accept this code.

GCC accepts this code, and Clang also accepts the SSE2 equivalent:

include

static inline m128i shift(m128i val, int amt) { return _mm_slli_epi32(val, amt); }

int main() { __m128i val = _mm_set1_epi32(2384); __m128i shifted = shift(val, 3); }

However, I get this with Clang 9.0.1 on Termux aarch64:

neon.cpp:7:12: error: argument to 'builtin_neon_vshlq_n_v' must be a constant integer return vshlq_n_u32(inp, amt); ^ ~~~ /data/data/com.termux/files/usr/lib/clang/9.0.1/include/arm_neon.h:24327:24: note: expanded from macro 'vshlq_n_u32' __ret = (uint32x4_t) builtin_neon_vshlq_n_v((int8x16_t)s0, p1, 50); \ ^ ~~~~ 1 error generated.

In addition, GCC also converts some things to the non-literal forms. If I remove the static inline part, I get the following assembly:

shift: dup v1.4s, w0 sshl v0.4s, v0.4s, v1.4s ret

This strict literal requirement makes things difficult for things like C++ wrappers, and the requirements should be relaxed like GCC and SSE2.

efriedma-quic commented 4 years ago

For shifts specifically, we can probably relax the rules, sure; we can lower a shift even if the shift amount isn't constant.

Note that both clang and gcc allow you to write "inp << amt".

llvmbot commented 4 years ago

In order to do something like this, ugly workarounds are needed:

https://github.com/xtensor-stack/xsimd/blob/8dbcab3e6a7ed08b402a026bda079482b825b2b1/include/xsimd/types/xsimd_neon_int32.hpp#L478

https://github.com/google/highwayhash/blob/0aaf66bb8a1634ceee4b778df51a652bdf4e1f17/highwayhash/vector_neon.h#L422

llvmbot commented 1 month ago

@llvm/issue-subscribers-backend-arm

Author: None (llvmbot)

| | | | --- | --- | | Bugzilla Link | [44607](https://llvm.org/bz44607) | | Version | 9.0 | | OS | Linux | | Reporter | LLVM Bugzilla Contributor | | CC | @DougGregor,@efriedma-quic,@zygoloid | ## Extended Description Currently, the NEON "constant" restrictions are too strict compared to SSE2 and GCC. #include <arm_neon.h> static inline uint32x4_t shift(uint32x4_t inp, const int amt) { return vshlq_n_u32(inp, amt); } int main() { uint32x4_t val = vdupq_n_u32(2384); uint32x4_t shifted = shift(val, 3); } `shift` should be constant propagated, and Clang should accept this code. GCC accepts this code, and Clang also accepts the SSE2 equivalent: #include <emmintrin.h> static inline __m128i shift(__m128i val, int amt) { return _mm_slli_epi32(val, amt); } int main() { __m128i val = _mm_set1_epi32(2384); __m128i shifted = shift(val, 3); } However, I get this with Clang 9.0.1 on Termux aarch64: neon.cpp:7:12: error: argument to '__builtin_neon_vshlq_n_v' must be a constant integer return vshlq_n_u32(inp, amt); ^ ~~~ /data/data/com.termux/files/usr/lib/clang/9.0.1/include/arm_neon.h:24327:24: note: expanded from macro 'vshlq_n_u32' __ret = (uint32x4_t) __builtin_neon_vshlq_n_v((int8x16_t)__s0, __p1, 50); \ ^ ~~~~ 1 error generated. In addition, GCC also converts some things to the non-literal forms. If I remove the static inline part, I get the following assembly: shift: dup v1.4s, w0 sshl v0.4s, v0.4s, v1.4s ret This strict literal requirement makes things difficult for things like C++ wrappers, and the requirements should be relaxed like GCC and SSE2.