llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
29.37k stars 12.14k forks source link

NewGVN asserts "new class for instruction should not be dominated by instruction" #31030

Closed llvmbot closed 7 years ago

llvmbot commented 7 years ago
Bugzilla Link 31682
Resolution FIXED
Resolved on Jan 20, 2017 00:42
Version trunk
OS Windows NT
Blocks llvm/llvm-project#30343
Reporter LLVM Bugzilla Contributor

Extended Description

Reduced:

%struct.widget = type { i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, float, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i8, i8, i32, i32*, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, [9 x [16 x [16 x i16]]], [5 x [16 x [16 x i16]]], [9 x [8 x [8 x i16]]], [2 x [4 x [16 x [16 x i16]]]], [16 x [16 x i16]], [16 x [16 x i32]], i32***, i32, i32, i32, i32, i32, %struct.baz, %struct.snork, %struct.eggs, i32, i32*, i32, i32, i32, i32, [4 x [4 x i32]], i32, i32, i32, i32, i32, double, i32, i32, i32, i32, i16**, i16**, i16**, i16**, [15 x i16], i32, i32, i32, i32, i32, i32, i32, i32, [6 x [32 x i32]], i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, [1 x i32], i32, i32, [2 x i32], i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, %struct.wobble.1, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, double, double, i32, double, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, [3 x [2 x i32]], [2 x i32], i32, i32, i16, i32, i32, i32, i32, i32 } %struct.baz = type { i32, i32, [100 x %struct.snork], i32, float, float, float } %struct.snork = type { i32, i32, i32, i32, i32, i32, %struct.spam, %struct.wombat.0, %struct.wobble, i32, i32, i32, i32, i32, i32, i32, i32, i32 (i32), [3 x [2 x i32]] } %struct.spam = type { %struct.zot, %struct.wombat, %struct.wombat } %struct.zot = type { i32, i32, i8, i32, i32, i8, i8, i32, i32, i8, i32 } %struct.wombat = type { i32, i32, i32, i32, i32, i8, i32, i32, i32 } %struct.wombat.0 = type { [3 x [11 x %struct.quux]], [2 x [9 x %struct.quux]], [2 x [10 x %struct.quux]], [2 x [6 x %struct.quux]], [4 x %struct.quux], [4 x %struct.quux], [3 x %struct.quux] } %struct.quux = type { i16, i8, i64 } %struct.wobble = type { [2 x %struct.quux], [4 x %struct.quux], [3 x [4 x %struct.quux]], [10 x [4 x %struct.quux]], [10 x [15 x %struct.quux]], [10 x [15 x %struct.quux]], [10 x [5 x %struct.quux]], [10 x [5 x %struct.quux]], [10 x [15 x %struct.quux]], [10 x [15 x %struct.quux]] } %struct.eggs = type { i32, i32, i32, [2 x i32], i32, [8 x i32], %struct.eggs, %struct.eggs, i32, [2 x [4 x [4 x [2 x i32]]]], [16 x i8], [16 x i8], i32, i64, [4 x i32], [4 x i32], i64, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i16, double, i32, i32, i32, i32, i32, i32, i32, i32, i32 } %struct.wobble.1 = type { i32, i32, i32, i32, i32, %struct.wobble.1* }

@​global = external local_unnamed_addr global %struct.widget*, align 8

define void @​patatino() { bb: %tmp = load %struct.widget*, %struct.widget* @​global, align 8 %tmp1 = getelementptr inbounds %struct.widget, %struct.widget %tmp, i64 0, i32 5 br label %bb2

bb2: ; preds = %bb2, %bb %tmp3 = phi %struct.widget [ undef, %bb ], [ %tmp6, %bb2 ] %tmp4 = getelementptr inbounds %struct.widget, %struct.widget %tmp3, i64 0, i32 39 %tmp5 = load i32, i32 %tmp4, align 8 %tmp6 = load %struct.widget, %struct.widget** @​global, align 8 br i1 undef, label %bb2, label %bb7

bb7: ; preds = %bb2 %tmp8 = phi %struct.widget [ %tmp6, %bb2 ] %tmp9 = getelementptr inbounds %struct.widget, %struct.widget %tmp8, i64 0, i32 39 br label %bb10

bb10: ; preds = %bb10, %bb7 %tmp11 = load i32, i32* %tmp9, align 8 br label %bb10 }

llvmbot commented 7 years ago

Sure, i can convert it to a statistic.

I still want to assert on the phi node case, because i suspect if that happens we may hit a correctness issue somewhere.

Yes, a statistic + assert is great to me.

llvmbot commented 7 years ago

Sure, i can convert it to a statistic.

I still want to assert on the phi node case, because i suspect if that happens we may hit a correctness issue somewhere.

llvmbot commented 7 years ago

Can we keep the diagnostic in (just to get an idea of how many times it fires?)

llvmbot commented 7 years ago

Yeah, it's slightly worse in general, even though it helps with this testcase.

So, the assert should only hurt us optimization wise, i believe -because we aren't using the most dominating leader, we may not always find the most congruences in some weird cases. But this is true in general anyway, because what is affected by conditionals and what way is a crapshoot, and you may not always choose the most optimal set of congruences.

If we change the leader when we hit this case, we will cost at least another iteration. Not sure if it's worth it.

The only case i'm really concerned for correctness about is phi node naming, so i'm going to restrict the assert to phi nodes, because i want to understand cases we have with phi nodes in the same situation, to make sure they are okay.

llvmbot commented 7 years ago

The RPO for the instructions is:

RPO visit order: br label %bb10 RPO visit order: br label %bb10 RPO visit order: br i1 undef, label %bb2, label %bb7 RPO visit order: %tmp6 = load %struct.foo*, %struct.foo* @​global RPO visit order: %tmp8 = phi %struct.foo [ %tmp6, %bb2 ] RPO visit order: %tmp9 = getelementptr %struct.foo, %struct.foo %tmp8, i64 0, i32 1 RPO visit order: %tmp11 = load i32, i32 %tmp9 RPO visit order: %tmp3 = phi %struct.foo [ undef, %bb ], [ %tmp6, %bb2 ] RPO visit order: %tmp4 = getelementptr %struct.foo, %struct.foo %tmp3, i64 0, i32 1 RPO visit order: %tmp5 = load i32, i32 %tmp4 RPO visit order: br label %bb2 RPO visit order: %tmp = load %struct.foo, %struct.foo* @​global RPO visit order: %tmp1 = getelementptr %struct.foo, %struct.foo %tmp

Accounting for the different blocks, it would require less iterations to resolve this case.

It definitely is a win for this case. However, interestingly, overall, it's a lose, which doesn't make sense to me (i don't think it should ever take more iterations). I"m investigating further to make sure i didn't screw it up.

llvmbot commented 7 years ago

This actually is a valid case of a more dominating result being discovered to be part of a congruence class with a less dominating leader, given our current iteration order.

This would cause one extra iteration to resolve.

Here, interestingly, it is possible to order the instructions to avoid the extra iteration by ordering them in RPO of the def-use graph.

Here, that would cause us to resolve tmp6 before tmp3, and get the right answer for tmp3 the first time.

I'm not sure if it is worth it, i'll check.

llvmbot commented 7 years ago

Slightly smaller: source_filename = "pr31682.ll"

%struct.quux = type { i32, i32, [2 x [4 x [6 x [6 x i16]]]] }

@​global = external global %struct.quux*

define void @​pluto() { bb: %tmp = load %struct.quux*, %struct.quux* @​global %tmp1 = getelementptr %struct.quux, %struct.quux %tmp br label %bb2

bb2: ; preds = %bb2, %bb %tmp3 = phi %struct.quux [ undef, %bb ], [ %tmp6, %bb2 ] %tmp4 = getelementptr %struct.quux, %struct.quux %tmp3, i64 0, i32 1 %tmp5 = load i32, i32 %tmp4 %tmp6 = load %struct.quux, %struct.quux** @​global br i1 undef, label %bb2, label %bb7

bb7: ; preds = %bb2 %tmp8 = phi %struct.quux [ %tmp6, %bb2 ] %tmp9 = getelementptr %struct.quux, %struct.quux %tmp8, i64 0, i32 1 br label %bb10

bb10: ; preds = %bb10, %bb7 %tmp11 = load i32, i32* %tmp9 br label %bb10 }