llvm / llvm-project


[AArch64] Avoid dependent FSQRT and FDIV where possible #60816

Open SamTebbs33 opened 1 year ago

SamTebbs33 commented 1 year ago

With the -freciprocal-math (and -funsafe-math-optimizations) flags, the compiler can try harder to avoid dependent FSQRT and FDIV operations. For example:

double res, res2, tmp;
void foo (double a, double b, int c, int d) {
  tmp = 1.0 / __builtin_sqrt (a);
  res = tmp * tmp;

  if (d)
    res2 = a * tmp;
}

With -Ofast, LLVM for AArch64 generates:

foo(double, double, int, int):                             // @foo(double, double, int, int)
        fsqrt   d1, d0
        fmov    d2, #1.00000000
        adrp    x8, tmp
        fdiv    d2, d2, d1
        str     d2, [x8, :lo12:tmp]
        fmul    d2, d2, d2
        adrp    x8, res
        str     d2, [x8, :lo12:res]
        cbz     w1, .LBB0_2
        fdiv    d0, d0, d1
        adrp    x8, res2
        str     d0, [x8, :lo12:res2]
.LBB0_2:
        ret

GCC at -Ofast generates:

foo(double, double, int, int):
        fmov    d1, 1.0e+0
        adrp    x0, .LANCHOR0
        fsqrt   d2, d0
        add     x2, x0, :lo12:.LANCHOR0
        fdiv    d0, d1, d0
        fmul    d1, d2, d0
        str     d0, [x2, 8]
        str     d1, [x0, #:lo12:.LANCHOR0]
        cbz     w1, .L1
        str     d2, [x2, 16]
.L1:
        ret

https://godbolt.org/z/717f54Teo

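Notice how, in the GCC output, the expensive FSQRT and FDIV are independent and can execute in parallel: GCC computes 1.0 / a directly and multiplies it by sqrt(a), instead of dividing by the FSQRT result as LLVM does. A write-up of the transformation can be found in the GCC commit: http://gcc.gnu.org/g:24c49431499bcb462aeee41e027a3dac25e934b3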

llvmbot commented 1 year ago

@llvm/issue-subscribers-backend-aarch64