llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

[AArch64] Different assembly generated for code which doesn't use SVE registers #97821

Closed: vfdff closed this 9 hours ago

vfdff commented 5 days ago

Test case: https://gcc.godbolt.org/z/Mn6ob7efq

```
#define difference_type long long
void foo1();
extern long long __begin_, __first_;

int foo(long long a[]) {
  if (__begin_ > __first_) {
    difference_type __d = __begin_ - __first_;
    __d = (__d + 1) / 2;
    __begin_ -= __d;
  }
  return 0;
}
```

* The above code is reduced from **llvm-project/libcxx/include/__split_buffer**. The final assembly doesn't touch any SVE registers, yet it differs depending on whether **+sve** is enabled. The SVE version even uses an **sdiv** instruction for **__d = (__d + 1) / 2;**.

> with sve

```
foo:                                    // @foo
        adrp    x8, :got:__begin_
        adrp    x10, :got:__first_
        ldr     x8, [x8, :got_lo12:__begin_]
        ldr     x10, [x10, :got_lo12:__first_]
        ldr     x9, [x8]
        ldr     x10, [x10]
        subs    x10, x9, x10
        b.le    .LBB0_2
        add     x10, x10, #1
        mov     x11, #-2                        // =0xfffffffffffffffe
        sdiv    x10, x10, x11
        add     x9, x10, x9
        str     x9, [x8]
.LBB0_2:
        mov     w0, wzr
        ret
```
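Note the divide by -2 here: the final subtraction `__begin_ -= __d` appears to have been folded into the division, so this path computes `__begin_ + (diff + 1) / (-2)` and materializes the quotient with a real `sdiv`, rather than using the cheap shift-based expansion shown below.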

> without sve

```
foo:                                    // @foo
        adrp    x8, :got:__begin_
        adrp    x10, :got:__first_
        ldr     x8, [x8, :got_lo12:__begin_]
        ldr     x10, [x10, :got_lo12:__first_]
        ldr     x9, [x8]
        ldr     x10, [x10]
        subs    x10, x9, x10
        b.le    .LBB0_2
        add     x11, x10, #1
        add     x12, x10, #2
        cmp     x11, #0
        csinc   x10, x12, x10, lt
        sub     x9, x9, x10, asr #1
        str     x9, [x8]
.LBB0_2:
        mov     w0, wzr
        ret
```
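For context, the add/cmp/csinc/asr sequence above is the standard strength reduction for a signed divide by a power of two. A minimal sketch of the identity, as an illustrative C++ helper rather than anything from LLVM:

```
#include <cstdint>

// Signed division by 2^k with C's round-toward-zero semantics, without a
// divide instruction. An arithmetic shift rounds toward negative infinity,
// so negative inputs get a bias of 2^k - 1 added before the shift.
int64_t sdiv_pow2(int64_t n, unsigned k) {
  int64_t bias = (n < 0) ? ((int64_t{1} << k) - 1) : 0;
  return (n + bias) >> k;
}
```

For k = 1 this is exactly the csinc-then-asr pattern in the non-SVE output.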

vfdff commented 5 days ago

The difference is introduced by the option -msve-vector-bits=256: https://gcc.godbolt.org/z/asqPvab95
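For reference, the scalar-only test above can be reproduced with a compile line along the lines of `clang -O2 --target=aarch64 -march=armv8-a+sve -msve-vector-bits=256 -S` (flags inferred from the godbolt link; the exact invocation there may differ).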

Related to AArch64TargetLowering::BuildSDIVPow2, the target hook the DAG combiner calls to expand a signed divide by a power of two.

llvmbot commented 5 days ago

@llvm/issue-subscribers-backend-aarch64

Author: Allen (vfdff)

david-arm commented 5 days ago

Yes, I agree. This doesn't look right:

```
  // For scalable and fixed types, mark them as cheap so we can handle it much
  // later. This allows us to handle larger than legal types.
  if (VT.isScalableVector() || Subtarget->useSVEForFixedLengthVectors())
    return SDValue(N, 0);
```

I think this code is assuming that VT must always be a scalable or fixed-width vector, but in fact we're dealing with a scalar type here.
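If the intent is to take that early return only for genuine vector types, the guard might look something like this sketch (an illustration of the idea being discussed, not the actual patch):

```
  // Sketch only: defer the combine for vector types, but let a scalar
  // sdiv by a power of two fall through to the usual cheap expansion.
  if (VT.isScalableVector() ||
      (VT.isFixedLengthVector() && Subtarget->useSVEForFixedLengthVectors()))
    return SDValue(N, 0);
```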

david-arm commented 5 days ago

If you want, I'm happy to fix this.

vfdff commented 4 days ago

Thanks @david-arm, I already have a PR up for it; please help review it if you have time.

A slightly smarter test to reproduce this issue: https://gcc.godbolt.org/z/1eqbsdd8P. Note that floating-point types don't have this issue: https://gcc.godbolt.org/z/EW3x8d35r