Open regehr opened 1 year ago
@llvm/issue-subscribers-backend-aarch64
This might come down to FREEZE not existing in MachineIR. We end up with a copy from undef. That undef then gets propagated everywhere.
void SelectionDAGISel::Select_FREEZE(SDNode *N) {
// TODO: We don't have FREEZE pseudo-instruction in MachineInstr-level now.
// If FREEZE instruction is added later, the code below must be changed as
// well.
CurDAG->SelectNodeTo(N, TargetOpcode::COPY, N->getValueType(0),
N->getOperand(0));
}
cc @aqjune
it doesn't work for arbitrary freezes, but it's always safe to freeze a literal undef or poison to zero, would that be too pessimizing?
Other possible solutions would be: (1) add FREEZE in MachineIR, or (2) Make IMPLICIT_DEF (which is a MachineIR-version of undef) yield a consistent value even if it is read twice.
(I did not dump MachineIR intermediate code from this particular input function f
, but I expect it is highly likely to be related to the characteristics of IMPLICIT_DEF)
In the past I tried writing a patch that implements (2), but this was pretty hard because the undef-iness of IMPLICIT_DEF was so heavily used by backends.
here's a particularly small test triggering what I expect is the same issue
target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
target triple = "aarch64-linux-gnu"
define i32 @f() {
%1 = freeze i32 poison
%2 = urem i32 %1, 1348415390
ret i32 %2
}
I'm happy to work through the details of this one if it's helpful to folks, let me know
For what it's worth: In MIR we usually mark instructions inputs with the "undef" flag which takes them out of liveness processing and allows usage of any vreg (and even "invalid" vregs without any definition). That is cleaner because it doesn't break referential transparency which is a nice property of SSA that holds everywhere except for pesky undef or IMPLICIT_DEF...
As far as I know IMPLICIT_DEF was originally only introduced to be used to solve the problem of lowering PHI instructions with an "undef" operand (meaning a control flow join joins a dead with a live value) which cannot be otherwise modeled without information loss after removing the PHI instructions. So originally IMPLICIT_DEF (non-)liveranges only spanned from end of basic blocks to a join at the beginning of the next block. Unfortunately it seems the construct has taken on a live of its own lately and people use it for more things it wasn't intended to...
Anyway long-story short: In MIR use undef operands not IMPLCIT_DEF!
We don't want undef or IMPLICIT_DEF here. We want a register to be allocated, but not written and then used in multiple places without changing the value. We don't care what's in the register so long as its the same value for every use.
Maybe we need a pseudo instruction that writes a register, but gets deleted at ExpandPostRAPseudos?
Maybe we need a pseudo instruction that writes a register, but gets deleted at ExpandPostRAPseudos?
I think this would work, but I am wondering there is a lightweight solution in terms of code diff. I ran llc -print-after-all with the (first) example of this issue and got this log:
730 # End machine code for function f.
731
732 # *** IR Dump After Early Machine Loop Invariant Code Motion (early-machinelicm) ***:
733 # Machine code for function f: IsSSA, TracksLiveness
734 Function Live Ins: $w0 in %0
735
736 bb.0 (%ir-block.1):
737 liveins: $w0
738 %0:gpr32 = COPY $w0
739 %2:gpr32all = IMPLICIT_DEF
740 %1:gpr32 = COPY killed %2:gpr32all <-- This is the FREEZE(UNDEF)
741 %4:gpr64all = IMPLICIT_DEF
742 %3:gpr64all = INSERT_SUBREG %4:gpr64all(tied-def 0), %1:gpr32, %subreg.sub_32 <-- USED BY HERE
743 %5:gpr32 = COPY %3.sub_32:gpr64all
744 %6:gpr32 = LSRVWr %0:gpr32, killed %5:gpr32
745 %7:gpr32 = ORNWrr $wzr, %1:gpr32 <-- AND HERE
746 %9:gpr64all = IMPLICIT_DEF
747 %8:gpr64all = INSERT_SUBREG %9:gpr64all(tied-def 0), killed %7:gpr32, %subreg.sub_32
748 %10:gpr32 = COPY %8.sub_32:gpr64all
749 %11:gpr32 = MOVi32imm 2
750 %12:gpr32 = LSLVWr killed %11:gpr32, killed %10:gpr32
751 %13:gpr32 = ORRWrr killed %12:gpr32, killed %6:gpr32
752 $w0 = COPY %13:gpr32
753 RET_ReallyLR implicit $w0
754
755 # End machine code for function f.
=>
757 # *** IR Dump After Machine Common Subexpression Elimination (machine-cse) ***:
758 # Machine code for function f: IsSSA, TracksLiveness
759 Function Live Ins: $w0 in %0
760
761 bb.0 (%ir-block.1):
762 liveins: $w0
763 %0:gpr32 = COPY $w0
764 %2:gpr32 = IMPLICIT_DEF <-- THE COPY WAS REMOVED
765 %4:gpr64all = IMPLICIT_DEF
766 %3:gpr64all = INSERT_SUBREG %4:gpr64all(tied-def 0), %2:gpr32, %subreg.sub_32 <-- USES IMPLICIT_DEF HERE
767 %5:gpr32 = COPY %3.sub_32:gpr64all
768 %6:gpr32 = LSRVWr %0:gpr32, killed %5:gpr32
769 %7:gpr32 = ORNWrr $wzr, %2:gpr32
770 %9:gpr64all = IMPLICIT_DEF <-- HERE AS WELL
771 %8:gpr64all = INSERT_SUBREG %9:gpr64all(tied-def 0), killed %7:gpr32, %subreg.sub_32
772 %10:gpr32 = COPY %8.sub_32:gpr64all
773 %11:gpr32 = MOVi32imm 2
774 %12:gpr32 = LSLVWr killed %11:gpr32, killed %10:gpr32
775 %13:gpr32 = ORRWrr killed %12:gpr32, killed %6:gpr32
776 $w0 = COPY %13:gpr32
777 RET_ReallyLR implicit $w0
=>
...
=>
901 # *** IR Dump After Detect Dead Lanes (detect-dead-lanes) ***:
902 # Machine code for function f: IsSSA, TracksLiveness
903 Function Live Ins: $w0 in %0
904
905 bb.0 (%ir-block.1):
906 liveins: $w0
907 %0:gpr32 = COPY $w0
908 %2:gpr32 = IMPLICIT_DEF <-- STILL HERE (NO CHANGE, WILL BE CHANGED IN THE NEXT PASS)
909 %4:gpr64all = IMPLICIT_DEF
910 %3:gpr64all = INSERT_SUBREG %4:gpr64all(tied-def 0), %2:gpr32, %subreg.sub_32
911 %5:gpr32 = COPY %3.sub_32:gpr64all
912 %6:gpr32 = LSRVWr %0:gpr32, killed %5:gpr32
913 %7:gpr32 = ORNWrr $wzr, %2:gpr32
914 %9:gpr64all = IMPLICIT_DEF
915 %8:gpr64all = SUBREG_TO_REG 0, killed %7:gpr32, %subreg.sub_32
916 %10:gpr32 = COPY %8.sub_32:gpr64all
917 %11:gpr32 = MOVi32imm 2
918 %12:gpr32 = LSLVWr killed %11:gpr32, killed %10:gpr32
919 %13:gpr32 = ORRWrr killed %12:gpr32, killed %6:gpr32
920 $w0 = COPY %13:gpr32
921 RET_ReallyLR implicit $w0
922
923 # End machine code for function f.
=>
925 # *** IR Dump After Process Implicit Definitions (processimpdefs) ***:
926 # Machine code for function f: IsSSA, TracksLiveness
927 Function Live Ins: $w0 in %0
928
929 bb.0 (%ir-block.1):
930 liveins: $w0
931 %0:gpr32 = COPY $w0
<--- %3 WAS REMOVED
932 %6:gpr32 = LSRVWr %0:gpr32, killed undef %5:gpr32
933 %7:gpr32 = ORNWrr $wzr, undef %2:gpr32 <-- THE undef attribute IS HERE
934 %8:gpr64all = SUBREG_TO_REG 0, killed %7:gpr32, %subreg.sub_32
935 %10:gpr32 = COPY %8.sub_32:gpr64all
936 %11:gpr32 = MOVi32imm 2
937 %12:gpr32 = LSLVWr killed %11:gpr32, killed %10:gpr32
938 %13:gpr32 = ORRWrr killed %12:gpr32, killed %6:gpr32
939 $w0 = COPY %13:gpr32
940 RET_ReallyLR implicit $w0
It seems the last pass (processimpdefs) is responsible for this problem.
If the initial purpose of IMPLICIT_DEF was to fill in the undef operand of PHI and not do more optimizations, what about making processimpdefs
be more conservative in optimizing away IMPLICIT_DEF?
In the past I tried writing a patch that implements (2), but this was pretty hard because the undef-iness of IMPLICIT_DEF was so heavily used by backends.
After looking into the log I recalled that the Process Implicit Definitions pass was the major usage of this. I think that defining IMPLICIT_DEF as simply a nondeterministic value rather than undef will make people less surprise.
This function:
has a set of possible output for every input value, due to using the frozen poison. let's choose 1079001153 as the input parameter. in this case, the possible outputs do not include zero, and hopefully this will convince you:
https://alive2.llvm.org/ce/z/nJQcJ9
The AArch64 backend lowers our function to:
if we assume wlog that
w8 == 0
and pass the same 1079001153 to this code in w0, we get 0 being returned.Sorry for the not-fun explanation, it's the best I can do (and hopefully it's correct)
cc @nunoplopes