Open joyhou-hw opened 1 year ago
@llvm/issue-subscribers-backend-aarch64
This isn't anything to do with the QC flag as such, as there's nothing in the inline asm constraints saying it's written to (and I don't think there's any way to do that either), but rather it looks like the output constraint for vector_res_int64x2 is being ignored. Simpler example:
#include <arm_neon.h>
int64x2_t fn(int64x2_t x, int64x2_t y) {
unsigned long fpsr = 0;
int64x2_t ret;
asm volatile ("msr fpsr, %1" : "=X"(ret), "=r"(fpsr));
ret = vqsubq_s64(x, y);
return ret;
}
The "=X"(ret)
constraint means that the compiler must assume ret
is written to (gcc mentions this kind of artificial dependency in https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Volatile-1), but at -O1 clang will move the sqsub above the msr.
this is the data dependency between volatile asm and neon intrinsics be ignored ?
it may add dependency for x 是OK? asm volatile ("msr fpsr, %1" : "=X"(ret), "=r"(fpsr)); ---> asm volatile ("msr fpsr, %1" : "=X"(x), "=r"(fpsr)); ret = vqsubq_s64(x, y);
sice ret was write write. x is parms for vqsubq_s64
Note the testcase in GCC is only compiled/run at -O0 explicitly because the inline-asm does not have a data dependency between them and the intrinsics.
As code is: https://godbolt.org/z/KM5z873Yz
This case is from the gnu deja test gcc.target/aarch64/advsimd-intrinsics/vqsub.c
I simplify the case for only test one small case. The case call the neon vqsubq_s64() intrinsics and test the fpsr.QC bit
Set FPSR, then sqsbu, then got FPSR.QC From the asm, it seems that "sqsub" is move up "msr fpsr" inst.