llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
26.85k stars 11.01k forks source link

AArch64 backend: miscompile of testcase of a funnel shift of freeze poison #60429

Open regehr opened 1 year ago

regehr commented 1 year ago

This function:

target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
target triple = "aarch64-linux-gnu"

define i8 @f(i32 %0) {
  %2 = freeze i32 poison
  %3 = call i32 @llvm.fshr.i32(i32 1, i32 %0, i32 %2)
  %4 = trunc i32 %3 to i8
  ret i8 %4
}

; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)
declare i32 @llvm.fshr.i32(i32, i32, i32) #0

attributes #0 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }

has a set of possible output for every input value, due to using the frozen poison. let's choose 1079001153 as the input parameter. in this case, the possible outputs do not include zero, and hopefully this will convince you:

https://alive2.llvm.org/ce/z/nJQcJ9

The AArch64 backend lowers our function to:

f:
    mvn w8, w8
    mov w9, #2
    lsr w10, w0, w8
    lsl w8, w9, w8
    orr w0, w8, w10
    ret

if we assume wlog that w8 == 0 and pass the same 1079001153 to this code in w0, we get 0 being returned.

Sorry for the not-fun explanation, it's the best I can do (and hopefully it's correct)

cc @nunoplopes

llvmbot commented 1 year ago

@llvm/issue-subscribers-backend-aarch64

topperc commented 1 year ago

This might come down to FREEZE not existing in MachineIR. We end up with a copy from undef. That undef then gets propagated everywhere.

void SelectionDAGISel::Select_FREEZE(SDNode *N) {
  // TODO: We don't have FREEZE pseudo-instruction in MachineInstr-level now.
  // If FREEZE instruction is added later, the code below must be changed as
  // well.
  CurDAG->SelectNodeTo(N, TargetOpcode::COPY, N->getValueType(0),
                       N->getOperand(0));
}

cc @aqjune

regehr commented 1 year ago

it doesn't work for arbitrary freezes, but it's always safe to freeze a literal undef or poison to zero, would that be too pessimizing?

aqjune commented 1 year ago

Other possible solutions would be: (1) add FREEZE in MachineIR, or (2) Make IMPLICIT_DEF (which is a MachineIR-version of undef) yield a consistent value even if it is read twice. (I did not dump MachineIR intermediate code from this particular input function f, but I expect it is highly likely to be related to the characteristics of IMPLICIT_DEF)

In the past I tried writing a patch that implements (2), but this was pretty hard because the undef-iness of IMPLICIT_DEF was so heavily used by backends.

regehr commented 1 year ago

here's a particularly small test triggering what I expect is the same issue

target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
target triple = "aarch64-linux-gnu"

define i32 @f() {
  %1 = freeze i32 poison
  %2 = urem i32 %1, 1348415390
  ret i32 %2
}

I'm happy to work through the details of this one if it's helpful to folks, let me know

MatzeB commented 1 year ago

For what it's worth: In MIR we usually mark instructions inputs with the "undef" flag which takes them out of liveness processing and allows usage of any vreg (and even "invalid" vregs without any definition). That is cleaner because it doesn't break referential transparency which is a nice property of SSA that holds everywhere except for pesky undef or IMPLICIT_DEF...

As far as I know IMPLICIT_DEF was originally only introduced to be used to solve the problem of lowering PHI instructions with an "undef" operand (meaning a control flow join joins a dead with a live value) which cannot be otherwise modeled without information loss after removing the PHI instructions. So originally IMPLICIT_DEF (non-)liveranges only spanned from end of basic blocks to a join at the beginning of the next block. Unfortunately it seems the construct has taken on a live of its own lately and people use it for more things it wasn't intended to...

Anyway long-story short: In MIR use undef operands not IMPLCIT_DEF!

topperc commented 1 year ago

We don't want undef or IMPLICIT_DEF here. We want a register to be allocated, but not written and then used in multiple places without changing the value. We don't care what's in the register so long as its the same value for every use.

Maybe we need a pseudo instruction that writes a register, but gets deleted at ExpandPostRAPseudos?

aqjune commented 1 year ago

Maybe we need a pseudo instruction that writes a register, but gets deleted at ExpandPostRAPseudos?

I think this would work, but I am wondering there is a lightweight solution in terms of code diff. I ran llc -print-after-all with the (first) example of this issue and got this log:

 730 # End machine code for function f.
 731
 732 # *** IR Dump After Early Machine Loop Invariant Code Motion (early-machinelicm) ***:
 733 # Machine code for function f: IsSSA, TracksLiveness
 734 Function Live Ins: $w0 in %0
 735
 736 bb.0 (%ir-block.1):
 737   liveins: $w0
 738   %0:gpr32 = COPY $w0
 739   %2:gpr32all = IMPLICIT_DEF
 740   %1:gpr32 = COPY killed %2:gpr32all <-- This is the FREEZE(UNDEF)
 741   %4:gpr64all = IMPLICIT_DEF
 742   %3:gpr64all = INSERT_SUBREG %4:gpr64all(tied-def 0), %1:gpr32, %subreg.sub_32 <-- USED BY HERE
 743   %5:gpr32 = COPY %3.sub_32:gpr64all
 744   %6:gpr32 = LSRVWr %0:gpr32, killed %5:gpr32
 745   %7:gpr32 = ORNWrr $wzr, %1:gpr32             <-- AND HERE
 746   %9:gpr64all = IMPLICIT_DEF
 747   %8:gpr64all = INSERT_SUBREG %9:gpr64all(tied-def 0), killed %7:gpr32, %subreg.sub_32
 748   %10:gpr32 = COPY %8.sub_32:gpr64all
 749   %11:gpr32 = MOVi32imm 2
 750   %12:gpr32 = LSLVWr killed %11:gpr32, killed %10:gpr32
 751   %13:gpr32 = ORRWrr killed %12:gpr32, killed %6:gpr32
 752   $w0 = COPY %13:gpr32
 753   RET_ReallyLR implicit $w0
 754
 755 # End machine code for function f.

 =>

  757 # *** IR Dump After Machine Common Subexpression Elimination (machine-cse) ***:
 758 # Machine code for function f: IsSSA, TracksLiveness
 759 Function Live Ins: $w0 in %0
 760
 761 bb.0 (%ir-block.1):
 762   liveins: $w0
 763   %0:gpr32 = COPY $w0
 764   %2:gpr32 = IMPLICIT_DEF  <-- THE COPY WAS REMOVED
 765   %4:gpr64all = IMPLICIT_DEF
 766   %3:gpr64all = INSERT_SUBREG %4:gpr64all(tied-def 0), %2:gpr32, %subreg.sub_32 <-- USES IMPLICIT_DEF HERE
 767   %5:gpr32 = COPY %3.sub_32:gpr64all
 768   %6:gpr32 = LSRVWr %0:gpr32, killed %5:gpr32
 769   %7:gpr32 = ORNWrr $wzr, %2:gpr32
 770   %9:gpr64all = IMPLICIT_DEF       <-- HERE AS WELL
 771   %8:gpr64all = INSERT_SUBREG %9:gpr64all(tied-def 0), killed %7:gpr32, %subreg.sub_32
 772   %10:gpr32 = COPY %8.sub_32:gpr64all
 773   %11:gpr32 = MOVi32imm 2
 774   %12:gpr32 = LSLVWr killed %11:gpr32, killed %10:gpr32
 775   %13:gpr32 = ORRWrr killed %12:gpr32, killed %6:gpr32
 776   $w0 = COPY %13:gpr32
 777   RET_ReallyLR implicit $w0

 =>

 ...

 =>

  901 # *** IR Dump After Detect Dead Lanes (detect-dead-lanes) ***:
 902 # Machine code for function f: IsSSA, TracksLiveness
 903 Function Live Ins: $w0 in %0
 904
 905 bb.0 (%ir-block.1):
 906   liveins: $w0
 907   %0:gpr32 = COPY $w0
 908   %2:gpr32 = IMPLICIT_DEF    <-- STILL HERE (NO CHANGE, WILL BE CHANGED IN THE NEXT PASS)
 909   %4:gpr64all = IMPLICIT_DEF
 910   %3:gpr64all = INSERT_SUBREG %4:gpr64all(tied-def 0), %2:gpr32, %subreg.sub_32
 911   %5:gpr32 = COPY %3.sub_32:gpr64all
 912   %6:gpr32 = LSRVWr %0:gpr32, killed %5:gpr32
 913   %7:gpr32 = ORNWrr $wzr, %2:gpr32
 914   %9:gpr64all = IMPLICIT_DEF
 915   %8:gpr64all = SUBREG_TO_REG 0, killed %7:gpr32, %subreg.sub_32
 916   %10:gpr32 = COPY %8.sub_32:gpr64all
 917   %11:gpr32 = MOVi32imm 2
 918   %12:gpr32 = LSLVWr killed %11:gpr32, killed %10:gpr32
 919   %13:gpr32 = ORRWrr killed %12:gpr32, killed %6:gpr32
 920   $w0 = COPY %13:gpr32
 921   RET_ReallyLR implicit $w0
 922
 923 # End machine code for function f.

 =>

 925 # *** IR Dump After Process Implicit Definitions (processimpdefs) ***:
 926 # Machine code for function f: IsSSA, TracksLiveness
 927 Function Live Ins: $w0 in %0
 928
 929 bb.0 (%ir-block.1):
 930   liveins: $w0
 931   %0:gpr32 = COPY $w0
       <--- %3 WAS REMOVED
 932   %6:gpr32 = LSRVWr %0:gpr32, killed undef %5:gpr32
 933   %7:gpr32 = ORNWrr $wzr, undef %2:gpr32              <-- THE undef attribute IS HERE
 934   %8:gpr64all = SUBREG_TO_REG 0, killed %7:gpr32, %subreg.sub_32
 935   %10:gpr32 = COPY %8.sub_32:gpr64all
 936   %11:gpr32 = MOVi32imm 2
 937   %12:gpr32 = LSLVWr killed %11:gpr32, killed %10:gpr32
 938   %13:gpr32 = ORRWrr killed %12:gpr32, killed %6:gpr32
 939   $w0 = COPY %13:gpr32
 940   RET_ReallyLR implicit $w0

It seems the last pass (processimpdefs) is responsible for this problem. If the initial purpose of IMPLICIT_DEF was to fill in the undef operand of PHI and not do more optimizations, what about making processimpdefs be more conservative in optimizing away IMPLICIT_DEF?

aqjune commented 1 year ago

In the past I tried writing a patch that implements (2), but this was pretty hard because the undef-iness of IMPLICIT_DEF was so heavily used by backends.

After looking into the log I recalled that the Process Implicit Definitions pass was the major usage of this. I think that defining IMPLICIT_DEF as simply a nondeterministic value rather than undef will make people less surprise.