llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.65k stars 11.84k forks source link

[aarch64] Is data dependency between volatile asm and neon intrinsics vqsubq_s64() be ignored? #67654

Open joyhou-hw opened 1 year ago

joyhou-hw commented 1 year ago

As code is: https://godbolt.org/z/KM5z873Yz

This case is from the gnu deja test gcc.target/aarch64/advsimd-intrinsics/vqsub.c

I simplify the case for only test one small case. The case call the neon vqsubq_s64() intrinsics and test the fpsr.QC bit

Set FPSR, then sqsbu, then got FPSR.QC From the asm, it seems that "sqsub" is move up "msr fpsr" inst.

llvmbot commented 1 year ago

@llvm/issue-subscribers-backend-aarch64

As code is: https://godbolt.org/z/KM5z873Yz This case is from the gnu deja test gcc.target/aarch64/advsimd-intrinsics/vqsub.c I simplify the case for only test one small case. The case call the neon vqsubq_s64() intrinsics and test the fpsr.QC bit Set FPSR, then sqsbu, then got FPSR.QC From the asm, it seems that "sqsub" is move up "msr fpsr" inst.
john-brawn-arm commented 1 year ago

This isn't anything to do with the QC flag as such, as there's nothing in the inline asm constraints saying it's written to (and I don't think there's any way to do that either), but rather it looks like the output constraint for vector_res_int64x2 is being ignored. Simpler example:

#include <arm_neon.h>
int64x2_t fn(int64x2_t x, int64x2_t y) {
  unsigned long fpsr = 0;
  int64x2_t ret;
  asm volatile ("msr fpsr, %1" : "=X"(ret), "=r"(fpsr));
  ret = vqsubq_s64(x, y);
  return ret;
}

The "=X"(ret) constraint means that the compiler must assume ret is written to (gcc mentions this kind of artificial dependency in https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Volatile-1), but at -O1 clang will move the sqsub above the msr.

joyhou-hw commented 1 year ago

this is the data dependency between volatile asm and neon intrinsics be ignored ?

joyhou-hw commented 1 year ago

it may add dependency for x 是OK? asm volatile ("msr fpsr, %1" : "=X"(ret), "=r"(fpsr)); ---> asm volatile ("msr fpsr, %1" : "=X"(x), "=r"(fpsr)); ret = vqsubq_s64(x, y);

sice ret was write write. x is parms for vqsubq_s64

pinskia commented 11 months ago

Note the testcase in GCC is only compiled/run at -O0 explicitly because the inline-asm does not have a data dependency between them and the intrinsics.