Open SamTebbs33 opened 1 year ago
Hi, the two can be combined using the function below:
#include <arm_neon.h>
uint16x8_t foo(uint32x4_t a, uint32x4_t b) {
    uint32x4_t shifted_a = vshrq_n_u32(a, 20); // right-shift each lane of a by 20
    uint32x4_t shifted_b = vshrq_n_u32(b, 20); // right-shift each lane of b by 20
    uint32x4x2_t combined = vzipq_u32(shifted_a, shifted_b); // interleave the lanes of a and b into a pair of vectors
    uint16x8_t result = vreinterpretq_u16_u32(combined.val[0]); // reinterpret the first interleaved vector as 16-bit lanes
    return result;
}
The idea is for the compiler to emit the instructions in the comment above bar when given the function foo.
@SamTebbs33 I think it can be done like this:
#include <arm_neon.h>
uint16x8_t foo(uint32x4_t a, uint32x4_t b) {
    uint16x8_t r = vcombine_u16(vshrn_n_u32(a, 16), vshrn_n_u32(b, 16));
    return vshrq_n_u16(r, 4);
}
It will emit the following instructions:
Is it correct?
@SamTebbs33 Please assign this issue to me.
@ayushi-8102, can we work on this together as well?
Actually, if my approach is correct, there is no need to work together, as it is nearly resolved. @Unique-Usman
The idea is for the compiler to emit the same instructions for foo as it does for bar, not for us to rewrite the functions, so this is still unresolved.
Hi @SamTebbs33, can we discuss this problem? First, let's break the question down:
foo performs a right shift by 20 bits on each element of the input vectors and then combines them into a single vector of 16-bit integers.
bar converts the input vectors to 16-bit integers, combines them, and then performs a right shift by 4 bits on each element of the combined vector.
The ask here is to generate the same output/instructions. What is meant by that?
https://godbolt.org/z/hfT7h7vjP
The two shifts and the combine in foo can be compiled to the same code as bar. GCC also lacks this optimisation.
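
To illustrate why the two forms are interchangeable, here is a minimal scalar sketch of what happens to a single 32-bit lane. It is not the code from the issue: foo_lane, bar_lane, lanes_agree and self_check are hypothetical names, and the models assume foo is "shift right by 20, then narrow to 16 bits" and bar is "shift-right-narrow by 16, then shift right by 4", as described earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed per-lane model of foo: shift the 32-bit lane right by 20,
   then keep the low 16 bits (the narrowing step). */
uint16_t foo_lane(uint32_t x) {
    return (uint16_t)(x >> 20);
}

/* Assumed per-lane model of bar: shift right by 16 and narrow to 16 bits
   (what vshrn_n_u32(x, 16) does per lane), then shift the 16-bit lane
   right by 4 (what vshrq_n_u16(r, 4) does per lane). */
uint16_t bar_lane(uint32_t x) {
    uint16_t narrowed = (uint16_t)(x >> 16);
    return (uint16_t)(narrowed >> 4);
}

/* Returns 1 if both models give the same result for every sample, else 0. */
int lanes_agree(const uint32_t *samples, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        if (foo_lane(samples[i]) != bar_lane(samples[i])) {
            return 0;
        }
    }
    return 1;
}

/* Spot-checks a few lanes, including values around bit 20. */
void self_check(void) {
    const uint32_t samples[] = {0u, 0x000FFFFFu, 0x00100000u,
                                0x12345678u, 0xFFFFFFFFu};
    assert(lanes_agree(samples, sizeof samples / sizeof samples[0]));
}
```

Both models keep bits 20..31 of the input lane, so they agree for every input. That is why the shift-by-20 followed by a narrow could, in principle, be rewritten by the compiler into the narrowing shift by 16 followed by a 16-bit shift by 4.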