llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
26.77k stars 10.97k forks source link

Wrong canonicalization by InstCombine #96130

Open bonjune opened 2 weeks ago

bonjune commented 2 weeks ago

https://alive2.llvm.org/ce/z/Fv6zpv

https://github.com/llvm/llvm-project/blob/da0e5359fc1a5bf1749306440f9dad089046d772/llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp#L1656-L1665


----------------------------------------
define i10 @src(i10 %val0) {
entry:
  %val1 = urem i10 %val0, 42
  %val2 = sub i10 0, %val1
  %val3 = and i10 %val1, %val2
  %val4 = add i10 %val3, 1023
  ret i10 %val4
}
=>
define i10 @src(i10 %val0) nofree willreturn memory(none) {
entry:
  %val1 = urem i10 %val0, 42
  %#0 = add nsw i10 %val1, 1023
  %#1 = xor i10 %val1, 1023
  %val4 = and i10 %#0, %#1
  %#range_%val4 = !range i10 %val4, i10 1023, i10 41
  ret i10 %#range_%val4
}
Transformation doesn't verify!

ERROR: Target is more poisonous than source

Example:
i10 %val0 = undef

Source:
i10 %val1 = #x000 (0)   [based on undef value]
i10 %val2 = #x000 (0)
i10 %val3 = #x000 (0)
i10 %val4 = #x3ff (1023, -1)

Target:
i10 %val1 = #x000 (0)
i10 %#0 = #x3ff (1023, -1)
i10 %#1 = #x3fd (1021, -3)
i10 %val4 = #x3fd (1021, -3)
i10 %#range_%val4 = poison
Source value: #x3ff (1023, -1)
Target value: poison

Summary:
  0 correct transformations
  1 incorrect transformations
  0 failed-to-prove transformations
  0 Alive2 errors

The transformation does not properly handle undef as a result of urem i10 %val0, 42. undef propagates to intermediate instructions.

dtcxzyw commented 2 weeks ago

Target: i10 %val1 = #x000 (0) i10 %#0 = #x3ff (1023, -1) i10 %#1 = #x3fd (1021, -3)

IIRC %#1 should be 1023. It looks like a bug in alive2. cc @nunoplopes @regehr @zhengyang92

bonjune commented 2 weeks ago

Target: i10 %val1 = #x000 (0) i10 %#0 = #x3ff (1023, -1) i10 %#1 = #x3fd (1021, -3)

IIRC %#1 should be 1023. It looks like a bug in alive2. cc @nunoplopes @regehr @zhengyang92

Discussed in https://github.com/AliveToolkit/alive2/issues/1056. As far as I can tell, since it xor two undef values and thus can produce -3 as a result. Am I correct?

dtcxzyw commented 2 weeks ago

But the core transformation here is correct: https://alive2.llvm.org/ce/z/PwKRUa

define i10 @src(i10 %val1) {
entry:
  %val2 = sub i10 0, %val1
  %val3 = and i10 %val1, %val2
  %val4 = add i10 %val3, -1
  ret i10 %val4
}

define i10 @tgt(i10 %val1) {
entry:
  %val2 = add i10 %val1, -1
  %val3 = xor i10 %val1, -1
  %val4 = and i10 %val2, %val3
  ret i10 %val4
}

----------------------------------------
define i10 @src(i10 %val1) {
entry:
  %val2 = sub i10 0, %val1
  %val3 = and i10 %val1, %val2
  %val4 = add i10 %val3, 1023
  ret i10 %val4
}
=>
define i10 @tgt(i10 %val1) {
entry:
  %val2 = add i10 %val1, 1023
  %val3 = xor i10 %val1, 1023
  %val4 = and i10 %val2, %val3
  ret i10 %val4
}
Transformation seems to be correct!

If it is an LLVM bug, it is easy to fix this by adding an extra isGuaranteedNotToBeUndef check. But I think many places in InstCombine don't handle undef correctly since Alive2 doesn't complain about this without the urem instruction.

dtcxzyw commented 2 weeks ago

cc @nikic @goldsteinn

nikic commented 2 weeks ago

It seems like there has to be some kind of alive2 bug here, it shouldn't report a miscompilation only with the urem present. @nunoplopes Could it be that alive2 only considers full undef values are function inputs, but not partial undef?

nunoplopes commented 2 weeks ago

It seems like there has to be some kind of alive2 bug here, it shouldn't report a miscompilation only with the urem present. @nunoplopes Could it be that alive2 only considers full undef values are function inputs, but not partial undef?

That's correct. Alive2 only considers poison, fully undef, and regular inputs. We didn't enable partial undefs by default (it's implemented, but hidden)

nunoplopes commented 2 weeks ago

With the partial undef mode turned on, Alive2 says:

----------------------------------------
define i10 @src(i10 %val1) {
entry:
  %val2 = sub i10 0, %val1
  %val3 = and i10 %val1, %val2
  %val4 = add i10 %val3, 1023
  ret i10 %val4
}
=>
define i10 @tgt(i10 %val1) {
entry:
  %val2 = add i10 %val1, 1023
  %val3 = xor i10 %val1, 1023
  %val4 = and i10 %val2, %val3
  ret i10 %val4
}
Transformation doesn't verify!

ERROR: Value mismatch

Example:
i10 %val1 = #x000 (0)   [based on undef value]

Source:
i10 %val2 = #x000 (0)
i10 %val3 = #x000 (0)
i10 %val4 = #x3ff (1023, -1)

Target:
i10 %val2 = #x3ff (1023, -1)
i10 %val3 = #x3fd (1021, -3)
i10 %val4 = #x3fd (1021, -3)
Source value: #x3ff (1023, -1)
Target value: #x3fd (1021, -3)
nunoplopes commented 2 weeks ago

FWIW, this is the trick to enable it:

diff --git a/ir/value.cpp b/ir/value.cpp
index a8f1fa94..c045aa91 100644
--- a/ir/value.cpp
+++ b/ir/value.cpp
@@ -282,7 +282,7 @@ void Input::merge(const ParamAttrs &other) {

 expr Input::getUndefVar(const Type &ty, unsigned child) const {
   string tyname = "isundef_" + getSMTName(child);
-  //return expr::mkVar(tyname.c_str(), ty.getDummyValue(false).value);
+  return expr::mkVar(tyname.c_str(), ty.getDummyValue(false).value);
   // FIXME: only whole value undef or non-undef for now
   return expr::mkVar(tyname.c_str(), expr::mkUInt(0, 1));
 }
diff --git a/tools/transform.cpp b/tools/transform.cpp
index aaa1bb9e..dac6c38a 100644
--- a/tools/transform.cpp
+++ b/tools/transform.cpp
@@ -330,7 +330,7 @@ static void instantiate_undef(const Input *in, map<expr, expr> &instances,
     return;

   auto var = in->getUndefVar(ty, child);
-  if (!var.isValid())
+  if (1||!var.isValid())
     return;

   // TODO: add support for per-bit input undef
nikic commented 2 weeks ago

Thanks! Would it be viable to do a run of alive2 in partial undef mode over the test suite, or is it too slow? I'd like to have an idea of how common this issue is -- is this just an isolated case where partial undef is a problem but full undef isn't, or is this a widespread problem?

nunoplopes commented 2 weeks ago

Thanks! Would it be viable to do a run of alive2 in partial undef mode over the test suite, or is it too slow? I'd like to have an idea of how common this issue is -- is this just an isolated case where partial undef is a problem but full undef isn't, or is this a widespread problem?

Yep, I can do that once our server vacates. It may take a day, but that's fine.

nunoplopes commented 2 weeks ago

Took only 6.5 hours (instead of ~5h). The result is not pretty.. New failures:

+  LLVM :: Transforms/AggressiveInstCombine/lower-table-based-cttz-basics.ll
+  LLVM :: Transforms/AggressiveInstCombine/lower-table-based-cttz-zero-element.ll
+  LLVM :: Transforms/CodeGenPrepare/NVPTX/bypass-slow-div-constant-numerator.ll
+  LLVM :: Transforms/CorrelatedValuePropagation/phi-common-val.ll
+  LLVM :: Transforms/IndVarSimplify/AArch64/widen-loop-comp.ll
+  LLVM :: Transforms/IndVarSimplify/ARM/code-size.ll
+  LLVM :: Transforms/IndVarSimplify/exit_value_test3.ll
+  LLVM :: Transforms/IndVarSimplify/post-inc-range.ll
+  LLVM :: Transforms/IndVarSimplify/replace-loop-exit-folds.ll
+  LLVM :: Transforms/Inline/devirtualize.ll
+  LLVM :: Transforms/InstCombine/add-mask-neg.ll
+  LLVM :: Transforms/InstCombine/add.ll
+  LLVM :: Transforms/InstCombine/and-or-icmps.ll
+  LLVM :: Transforms/InstCombine/ashr-lshr.ll
+  LLVM :: Transforms/InstCombine/canonicalize-clamp-like-pattern-between-negative-and-positive-thresholds.ll
+  LLVM :: Transforms/InstCombine/canonicalize-clamp-like-pattern-between-zero-and-positive-threshold.ll
+  LLVM :: Transforms/InstCombine/ffs-i16.ll
+  LLVM :: Transforms/InstCombine/gep-inbounds-null.ll
+  LLVM :: Transforms/InstCombine/getelementptr.ll
+  LLVM :: Transforms/InstCombine/icmp-and-lowbit-mask.ll
+  LLVM :: Transforms/InstCombine/icmp-gep.ll
+  LLVM :: Transforms/InstCombine/minmax-of-xor-x.ll
+  LLVM :: Transforms/InstCombine/onehot_merge.ll
+  LLVM :: Transforms/InstCombine/partally-redundant-left-shift-input-masking-variant-e.ll
+  LLVM :: Transforms/InstCombine/redundant-left-shift-input-masking-after-truncation-variant-e.ll
+  LLVM :: Transforms/InstCombine/redundant-left-shift-input-masking-after-truncation-variant-f.ll
+  LLVM :: Transforms/InstCombine/redundant-left-shift-input-masking-variant-e.ll
+  LLVM :: Transforms/InstCombine/redundant-left-shift-input-masking-variant-f.ll
+  LLVM :: Transforms/InstCombine/rem.ll
+  LLVM :: Transforms/InstCombine/shift-by-signext.ll
+  LLVM :: Transforms/InstCombine/shift.ll
+  LLVM :: Transforms/InstCombine/sqrt.ll
+  LLVM :: Transforms/InstCombine/sub.ll
+  LLVM :: Transforms/InstCombine/switch-select.ll
+  LLVM :: Transforms/InstCombine/truncating-saturate.ll
+  LLVM :: Transforms/InstCombine/vector_insertelt_shuffle-inseltpoison.ll
+  LLVM :: Transforms/InstCombine/vector_insertelt_shuffle.ll
+  LLVM :: Transforms/InstSimplify/and-or-implied-cond.ll
+  LLVM :: Transforms/LoopFlatten/loop-flatten-version.ll
+  LLVM :: Transforms/LoopVectorize/first-order-recurrence.ll
+  LLVM :: Transforms/LoopVectorize/reduction-small-size.ll
+  LLVM :: Transforms/PhaseOrdering/X86/merge-functions.ll
+  LLVM :: Transforms/PhaseOrdering/two-shifts-by-sext.ll
+  LLVM :: Transforms/Scalarizer/variable-extractelement.ll
+  LLVM :: Transforms/Scalarizer/variable-insertelement.ll
+  LLVM :: Transforms/SimplifyCFG/ARM/switch-to-lookup-table.ll
+  LLVM :: Transforms/SimplifyCFG/nomerge.ll
+  LLVM :: Transforms/SimplifyCFG/preserve-branchweights-switch-create.ll
+  LLVM :: Transforms/SimplifyCFG/switch_create-custom-dl.ll
+  LLVM :: Transforms/SimplifyCFG/switch_create.ll
+  LLVM :: Transforms/StraightLineStrengthReduce/slsr-gep.ll
+  LLVM :: Transforms/VectorCombine/AArch64/load-extractelement-scalarization.ll
+  LLVM :: Transforms/VectorCombine/load-insert-store.ll

A few random examples:

; Transforms/InstCombine/shift.ll
define i32 @shl1_cttz(i32 %x) {
  %tz = cttz i32 %x, 1
  %shl = shl i32 1, %tz
  ret i32 %shl
}
=>
define i32 @shl1_cttz(i32 %x) {
  %neg = sub i32 0, %x
  %shl = and i32 %neg, %x
  ret i32 %shl
}
Transformation doesn't verify! (unsound)
ERROR: Target's return value is more undefined

Example:
i32 %x = #x00000006 (6) [based on undef value]

Source:
i32 %tz = #x00000001 (1)
i32 %shl = #x00000002 (2)

Target:
i32 %neg = #xfffffffa (4294967290, -6)
i32 %shl = #x0000000a (10)
Source value: #x00000002 (2)
Target value: #x0000000a (10)
; Transforms/InstCombine/sub.ll
define i8 @diff_of_squares(i8 %x, i8 %y) {
  %x2 = mul i8 %x, %x
  %y2 = mul i8 %y, %y
  %r = sub i8 %x2, %y2
  ret i8 %r
}
=>
define i8 @diff_of_squares(i8 %x, i8 %y) {
  %add = add i8 %x, %y
  %sub = sub i8 %x, %y
  %r = mul i8 %add, %sub
  ret i8 %r
}
Transformation doesn't verify! (unsound)
ERROR: Target's return value is more undefined

Example:
i8 %x = #x09 (9)
i8 %y = #x00 (0)        [based on undef value]

Source:
i8 %x2 = #x51 (81)
i8 %y2 = #x00 (0)
i8 %r = #x51 (81)

Target:
i8 %add = #x19 (25)
i8 %sub = #x09 (9)
i8 %r = #xe1 (225, -31)
Source value: #x51 (81)
Target value: #xe1 (225, -31)
; Transforms/InstCombine/gep-inbounds-null.ll
define <2 x i1> @test_vector_base(<2 x ptr> %base, i64 %idx) {
  %gep = gep inbounds <2 x ptr> %base, 1 x i64 %idx
  %cnd = icmp eq <2 x ptr> %gep, { null, null }
  ret <2 x i1> %cnd
}
=>
define <2 x i1> @test_vector_base(<2 x ptr> %base, i64 %idx) {
  %cnd = icmp eq <2 x ptr> %base, { null, null }
  ret <2 x i1> %cnd
}
Transformation doesn't verify! (unsound)
ERROR: Target's return value is more undefined

Example:
<2 x ptr> %base = < null, pointer(non-local, block_id=1, offset=0)      [based on undef value] >
i64 %idx = #x0000000000000004 (4)

Source:
<2 x ptr> %gep = < poison, pointer(non-local, block_id=1, offset=4) >
<2 x i1> %cnd = < poison, #x0 (0) >

SOURCE MEMORY STATE
===================
NON-LOCAL BLOCKS:
Block 0 >       size: 0 align: 1        alloc type: 0   alive: false    address: 0
Block 1 >       size: 8 align: 1        alloc type: 0   alive: true     address: 1
Block 2 >       size: 3 align: 4        alloc type: 0   alive: true     address: 12

Target:
<2 x i1> %cnd = < #x1 (1), #x1 (1) >
Source value: < poison, #x0 (0) >
Target value: < #x1 (1), #x1 (1) >

// Transforms/InstCombine/getelementptr.ll -- seems like a bug in Alive2

dtcxzyw commented 2 weeks ago

I am not willing to fix these bugs since adding a bunch of isGuaranteedNotToBeUndef checks may be expensive and a burden of maintenance. BTW, have we encountered these bugs in real-world cases?

nikic commented 2 weeks ago

@dtcxzyw We have encountered a few multi-use undef miscompiles related to select conditions in the wild. Not sure whether we hit any that did not involve selects.

dtcxzyw commented 2 weeks ago

@dtcxzyw We have encountered a few multi-use undef miscompiles related to select conditions in the wild. Not sure whether we hit any that did not involve selects.

IMO, as this is not a serious issue, we should spend more time removing the undef support. We have given up on correctness of handling undef in the backend.

goldsteinn commented 1 week ago

IMO, as this is not a serious issue, we should spend more time removing the undef support. We have given up on correctness of handling undef in the backend.

Is there a roadmap to entirely drop undef support?

nunoplopes commented 1 week ago

IMO, as this is not a serious issue, we should spend more time removing the undef support. We have given up on correctness of handling undef in the backend.

Is there a roadmap to entirely drop undef support?

There isn't an official plan nor any concerted effort (but there could be). A few of us have been slowly doing bits and pieces. What we have done so far is:

What's left?

dtcxzyw commented 1 week ago

Add an auto-migration path to replace undef with zero

It looks ok as it even improves both performance and code-size :)