Open diggerlin opened 3 hours ago
@llvm/issue-subscribers-backend-powerpc
Author: zhijian lin (diggerlin)
The PowerPC VSX FMA Mutation pass converts a COPY followed by an XSMADDADP instruction into a single XSMADDMDP instruction, as described in the comment at https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/PowerPC/PPCVSXFMAMutate.cpp#L90:
// %5 = COPY %9; VSLRC:%5,%9
// %5<def,tied1> = XSMADDADP %5<tied0>, %17, %16,
// implicit %rm; VSLRC:%5,%17,%16
// ...
// %9<def,tied1> = XSMADDADP %9<tied0>, %17, %19,
// implicit %rm; VSLRC:%9,%17,%19
// ...
// Where we can eliminate the copy by changing from the A-type to the
// M-type instruction. Specifically, for this example, this means:
// %5<def,tied1> = XSMADDADP %5<tied0>, %17, %16,
// implicit %rm; VSLRC:%5,%17,%16
// is replaced by:
// %16<def,tied1> = XSMADDMDP %16<tied0>, %18, %9,
// implicit %rm; VSLRC:%16,%18,%9
// and we remove: %5 = COPY %9; VSLRC:%5,%9
However, the Register Coalescer pass, which eliminates COPY instructions, can prevent the PowerPC VSX FMA Mutation pass from converting a COPY followed by an XSMADDADP into a single XSMADDMDP instruction, as in the following case.
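For reference, here is a minimal sketch of why the A-type to M-type rewrite is legal. It models the scalar semantics as I understand them from the ISA (A-type: `XT = XA * XB + XT`, M-type: `XT = XA * XT + XB`). The function names are hypothetical, and plain Python arithmetic stands in for the fused operation, so rounding behaviour is not modelled.

```python
def fma_a_type(tied, a, b):
    """A-type (e.g. XSMADDADP): the tied register is the addend."""
    return a * b + tied


def fma_m_type(tied, a, b):
    """M-type (e.g. XSMADDMDP): the tied register is a multiplicand."""
    return a * tied + b


def copy_then_a_type(x, y, addend):
    """COPY the addend into a fresh register, then overwrite it with A-type."""
    t = addend                    # %t = COPY %addend
    return fma_a_type(t, x, y)    # %t = x * y + %t


def m_type_only(x, y, addend):
    """Overwrite x in place with M-type; the addend register stays intact."""
    return fma_m_type(x, y, addend)  # %x = y * %x + %addend


print(copy_then_a_type(2.0, 3.0, 4.0), m_type_only(2.0, 3.0, 4.0))
```

Both forms compute the same value. The M-type form is only usable when the multiplicand being overwritten is dead afterwards, which is the situation the pass looks for.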
The Register Coalescer pass converts the following machine IR
bb.1.for.body.preheader:
; predecessors: %bb.0
successors: %bb.2(0x80000000); %bb.2(100.00%)
MTCTR8loop killed %0:g8rc, implicit-def dead $ctr8
%8:g8rc = LI8 0
%10:vsrc = XXSPLTIW 1069066811
%11:vsrc = XXSPLTIW 1170469888 (this line is moved into the loop body, bb.2.for.body)
%14:g8rc_and_g8rc_nox0 = COPY killed %8:g8rc
bb.2.for.body:
; predecessors: %bb.1, %bb.2
successors: %bb.2(0x7c000000), %bb.3(0x04000000); %bb.2(96.88%), %bb.3(3.12%)
%1:g8rc_and_g8rc_nox0 = COPY killed %14:g8rc_and_g8rc_nox0
%9:vsrc = LXVX %4:g8rc_and_g8rc_nox0, %1:g8rc_and_g8rc_nox0 :: (load (s128) from %ir.scevgep1, align 1)
%12:vsrc = COPY %11:vsrc (this line is replaced with %12:vsrc = XXSPLTIW 1170469888)
%12:vsrc = contract nofpexcept XVMADDASP %12:vsrc(tied-def 0), killed %9:vsrc, %10:vsrc, implicit $rm
STXVX killed %12:vsrc, %3:g8rc_and_g8rc_nox0, %1:g8rc_and_g8rc_nox0 :: (store (s128) into %ir.scevgep, align 1)
%2:g8rc = nuw nsw ADDI8 killed %1:g8rc_and_g8rc_nox0, 16
%14:g8rc_and_g8rc_nox0 = COPY killed %2:g8rc
BDNZ8 %bb.2, implicit-def $ctr8, implicit $ctr8
B %bb.3
to
bb.1.for.body.preheader:
; predecessors: %bb.0
successors: %bb.2(0x80000000); %bb.2(100.00%)
MTCTR8loop %0:g8rc, implicit-def dead $ctr8
%14:g8rc_and_g8rc_nox0 = LI8 0
%10:vsrc = XXSPLTIW 1069066811
bb.2.for.body:
; predecessors: %bb.1, %bb.2
successors: %bb.2(0x7c000000), %bb.3(0x04000000); %bb.2(96.88%), %bb.3(3.12%)
%9:vsrc = LXVX %4:g8rc_and_g8rc_nox0, %14:g8rc_and_g8rc_nox0 :: (load (s128) from %ir.scevgep1, align 1)
%12:vsrc = XXSPLTIW 1170469888 (this line is moved from bb.1.for.body.preheader into the loop, and the COPY is eliminated)
%12:vsrc = contract nofpexcept XVMADDASP %12:vsrc(tied-def 0), %9:vsrc, %10:vsrc, implicit $rm
STXVX %12:vsrc, %3:g8rc_and_g8rc_nox0, %14:g8rc_and_g8rc_nox0 :: (store (s128) into %ir.scevgep, align 1)
%14:g8rc_and_g8rc_nox0 = nuw nsw ADDI8 %14:g8rc_and_g8rc_nox0, 16
BDNZ8 %bb.2, implicit-def $ctr8, implicit $ctr8
B %bb.3
This prevents the PowerPC VSX FMA Mutation (ppc-vsx-fma-mutate) pass, when run on vsexp, from converting
%12:vsrc = COPY %11:vsrc
%12:vsrc = contract nofpexcept XVMADDASP %12:vsrc(tied-def 0), killed %9:vsrc, %10:vsrc, implicit $rm
to
%9:vsrc = contract nofpexcept XVMADDMSP %9:vsrc(tied-def 0), %10:vsrc, %11:vsrc, implicit $rm
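To illustrate why the mutation no longer fires, here is a toy model of the pattern match. It is not the real pass, which works on live intervals and checks further conditions. Instructions are hypothetical `(dest, opcode, operands)` tuples rather than real MIR, and the matcher only looks for an A-type FMA whose tied addend was defined by a COPY earlier in the same block.

```python
def find_mutation_candidates(block):
    """Return indices of XVMADDASP instructions whose tied addend comes from a COPY."""
    copies = {}  # dest register -> index of the COPY that defines it
    hits = []
    for i, (dest, opcode, operands) in enumerate(block):
        if opcode == "COPY":
            copies[dest] = i
            continue
        if opcode == "XVMADDASP" and operands[0] == dest and dest in copies:
            hits.append(i)
        # Any non-COPY definition of dest invalidates a previously seen COPY.
        copies.pop(dest, None)
    return hits


# Loop body before register coalescing: the addend arrives via a COPY.
before = [
    ("%9", "LXVX", ("%4", "%1")),
    ("%12", "COPY", ("%11",)),
    ("%12", "XVMADDASP", ("%12", "%9", "%10")),
    (None, "STXVX", ("%12", "%3", "%1")),
]

# Loop body after register coalescing: the constant is rematerialized in place.
after = [
    ("%9", "LXVX", ("%4", "%14")),
    ("%12", "XXSPLTIW", ("1170469888",)),
    ("%12", "XVMADDASP", ("%12", "%9", "%10")),
    (None, "STXVX", ("%12", "%3", "%14")),
]

print(find_mutation_candidates(before), find_mutation_candidates(after))
```

In this model the pre-coalescing block yields one candidate and the post-coalescing block yields none, because the XXSPLTIW that replaced the COPY is not something the matcher recognizes.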
Description:
bash> cat test.c
When compiled with
-mllvm -disable-ppc-vsx-fma-mutation=false -mllvm -schedule-ppc-vsx-fma-mutation-early
it generates the following asm (the loop has 6 instructions):
Clearly, more efficient code is possible: move
xxspltiw vs2, 1170469888
out of the loop and change the `xvmaddasp`
to `xvmaddmsp`, giving the following asm (the loop has only 5 instructions):