llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
27.92k stars 11.52k forks source link

Missed fold for switch between two values #51127

Open 470e7f89-017b-4dca-8bb4-aa88c7fd5a01 opened 3 years ago

470e7f89-017b-4dca-8bb4-aa88c7fd5a01 commented 3 years ago
Bugzilla Link 51785
Version trunk
OS Windows NT
CC @Kojoley,@rotateright

Extended Description

LLVM fails to optimize

int foo(int x) { switch (x) { case 0: return 0; case 1: return 1; } __builtin_unreachable(); }

into

int foo(int x) { return x; }

https://godbolt.org/z/M9YzaWexo

GCC misses this too.

Both LLVM and GCC fold well when another case is added, e.g. case 2: return 2;:

int bar(int x) { switch (x) { case 0: return 0; case 1: return 1; case 2: return 2; } __builtin_unreachable(); }

IR Dump After SROA on _Z3fooi ; Function Attrs: mustprogress nounwind uwtable define dso_local i32 @​_Z3fooi(i32 %0) #​0 { %2 = icmp ult i32 %0, 1 br i1 %2, label %3, label %4 3: ; preds = %1 br label %5 4: ; preds = %1 br label %5 5: ; preds = %4, %3 %6 = phi i32 [ 0, %3 ], [ 1, %4 ] ret i32 %6 } IR Dump After SimplifyCFGPass on _Z3fooi ; Function Attrs: mustprogress nounwind uwtable define dso_local i32 @​_Z3fooi(i32 %0) local_unnamed_addr #​0 { %2 = icmp eq i32 %0, 0 %3 = select i1 %2, i32 0, i32 1 ret i32 %3 }

After the SimplifyCFGPass here, the information required regarding undefined paths to perform the fold on foo is lost and the transformation cannot be made correctly later.

For bar, the optimization is actually done in the SimplifyCFGPass:

IR Dump After InstCombinePass on _Z3fooi ; Function Attrs: mustprogress nofree norecurse nosync nounwind readnone uwtable willreturn define dso_local i32 @​_Z3fooi(i32 %0) local_unnamed_addr #​0 { switch i32 %0, label %4 [ i32 0, label %5 i32 1, label %2 i32 2, label %3 ] 2: ; preds = %1 br label %5 3: ; preds = %1 br label %5 4: ; preds = %1 unreachable 5: ; preds = %1, %3, %2 %6 = phi i32 [ 2, %3 ], [ 1, %2 ], [ 0, %1 ] ret i32 %6 } IR Dump After SimplifyCFGPass on _Z3fooi ; Function Attrs: mustprogress nofree norecurse nosync nounwind readnone uwtable willreturn define dso_local i32 @​_Z3fooi(i32 %0) local_unnamed_addr #​0 { ret i32 %0 }

Kojoley commented 3 years ago

ConvertTwoCaseSwitch on switches with unreachable default loses value range information that could be mined from the switch. A switch with more than two cases is transformed by ForwardSwitchConditionToPHI which loses value range information too. It seems to be a common problem and the only way to preserve the information is to produce a clunky @​llvm.assume.

IR Dump After SROA on _Z3fooi ; Function Attrs: mustprogress nounwind uwtable define dso_local i32 @​_Z3fooi(i32 %0) #​0 { %2 = icmp ult i32 %0, 1 br i1 %2, label %3, label %4 3: ; preds = %1 br label %5 4: ; preds = %1 br label %5 5: ; preds = %4, %3 %6 = phi i32 [ 0, %3 ], [ 1, %4 ] ret i32 %6 } IR Dump After SimplifyCFGPass on _Z3fooi ; Function Attrs: mustprogress nounwind uwtable define dso_local i32 @​_Z3fooi(i32 %0) local_unnamed_addr #​0 { %2 = icmp eq i32 %0, 0 %3 = select i1 %2, i32 0, i32 1 ret i32 %3 }

After the SimplifyCFGPass here, the information required regarding undefined paths to perform the fold on foo is lost and the transformation cannot be made correctly later.

There is already not enough information remains even before SimplifyCFGPass.

470e7f89-017b-4dca-8bb4-aa88c7fd5a01 commented 3 years ago

I've reported a bug on regarding a similar issue before. I think the underlying cause is different here though.

Kojoley commented 3 months ago

LLVM 15+ produces optimal code for both cases.