Open 470e7f89-017b-4dca-8bb4-aa88c7fd5a01 opened 3 years ago
ConvertTwoCaseSwitch on switches with unreachable default loses value range information that could be mined from the switch. A switch with more than two cases is transformed by ForwardSwitchConditionToPHI which loses value range information too. It seems to be a common problem and the only way to preserve the information is to produce a clunky @llvm.assume.
IR Dump After SROA on _Z3fooi ; Function Attrs: mustprogress nounwind uwtable define dso_local i32 @_Z3fooi(i32 %0) #0 { %2 = icmp ult i32 %0, 1 br i1 %2, label %3, label %4 3: ; preds = %1 br label %5 4: ; preds = %1 br label %5 5: ; preds = %4, %3 %6 = phi i32 [ 0, %3 ], [ 1, %4 ] ret i32 %6 } IR Dump After SimplifyCFGPass on _Z3fooi ; Function Attrs: mustprogress nounwind uwtable define dso_local i32 @_Z3fooi(i32 %0) local_unnamed_addr #0 { %2 = icmp eq i32 %0, 0 %3 = select i1 %2, i32 0, i32 1 ret i32 %3 }
After the SimplifyCFGPass here, the information required regarding undefined paths to perform the fold on foo is lost and the transformation cannot be made correctly later.
There is already not enough information remains even before SimplifyCFGPass.
I've reported a bug on regarding a similar issue before. I think the underlying cause is different here though.
LLVM 15+ produces optimal code for both cases.
Extended Description
LLVM fails to optimize
int foo(int x) { switch (x) { case 0: return 0; case 1: return 1; } __builtin_unreachable(); }
into
int foo(int x) { return x; }
https://godbolt.org/z/M9YzaWexo
GCC misses this too.
Both LLVM and GCC fold well when another case is added, e.g. case 2: return 2;:
int bar(int x) { switch (x) { case 0: return 0; case 1: return 1; case 2: return 2; } __builtin_unreachable(); }
IR Dump After SROA on _Z3fooi ; Function Attrs: mustprogress nounwind uwtable define dso_local i32 @_Z3fooi(i32 %0) #0 { %2 = icmp ult i32 %0, 1 br i1 %2, label %3, label %4 3: ; preds = %1 br label %5 4: ; preds = %1 br label %5 5: ; preds = %4, %3 %6 = phi i32 [ 0, %3 ], [ 1, %4 ] ret i32 %6 } IR Dump After SimplifyCFGPass on _Z3fooi ; Function Attrs: mustprogress nounwind uwtable define dso_local i32 @_Z3fooi(i32 %0) local_unnamed_addr #0 { %2 = icmp eq i32 %0, 0 %3 = select i1 %2, i32 0, i32 1 ret i32 %3 }
After the SimplifyCFGPass here, the information required regarding undefined paths to perform the fold on foo is lost and the transformation cannot be made correctly later.
For bar, the optimization is actually done in the SimplifyCFGPass:
IR Dump After InstCombinePass on _Z3fooi ; Function Attrs: mustprogress nofree norecurse nosync nounwind readnone uwtable willreturn define dso_local i32 @_Z3fooi(i32 %0) local_unnamed_addr #0 { switch i32 %0, label %4 [ i32 0, label %5 i32 1, label %2 i32 2, label %3 ] 2: ; preds = %1 br label %5 3: ; preds = %1 br label %5 4: ; preds = %1 unreachable 5: ; preds = %1, %3, %2 %6 = phi i32 [ 2, %3 ], [ 1, %2 ], [ 0, %1 ] ret i32 %6 } IR Dump After SimplifyCFGPass on _Z3fooi ; Function Attrs: mustprogress nofree norecurse nosync nounwind readnone uwtable willreturn define dso_local i32 @_Z3fooi(i32 %0) local_unnamed_addr #0 { ret i32 %0 }