google / souper

A superoptimizer for LLVM IR
Apache License 2.0
2.18k stars 171 forks source link

blockpc is unsound #166

Open regehr opened 9 years ago

regehr commented 9 years ago

The behavior of the test case below is changed by Souper even with UB turned off. Of course it could be an LLVM bug exposed by Souper, but we need to take a look.

$ cat small.c
int printf(const char *, ...);
int x0, x1, x2, x3, x4, x5;
int main() {
  if (x1 <= 0 == 0) {
    x2 = x5 = x0 >= 2 ?: 0 < x0;
    if (x1 >= x5)
      x3 = 0;
  }
  x4 = x2;
  x1 || printf("x\n");
  return 0;
}
$ /home/regehr/f/souper-regehr/third_party/llvm/Release/bin/clang -O3 small.c -o small2 -Xclang -load -Xclang /home/regehr/f/souper-regehr/build/libsouperPass.so -mllvm -z3-path=/usr/bin/z3 -mllvm -solver-timeout=15 -mllvm -souper-debug -mllvm -souper-external-cache -mllvm -souper-exploit-ub=false 

; Listing applied replacements for main
; Using solver: Z3 + external cache + internal cache

; Listing applied replacements for main
; Using solver: Z3 + external cache + internal cache

; Listing applied replacements for main
; Using solver: Z3 + external cache + internal cache

; Replacing "  %12 = icmp slt i32 %1, %11"
; from ""
; with "i1 false" in:
%0 = block 2
%1:i32 = var
%2:i1 = slt %1, 1:i32
blockpc %0 0 %2 0:i1
%3:i32 = var
%4:i1 = slt 1:i32, %3
blockpc %0 0 %4 0:i1
blockpc %0 1 %2 0:i1
%5:i1 = slt 0:i32, %3
%6:i32 = zext %5
%7:i32 = zext %4
%8:i32 = phi %0, %6, %7
%9:i1 = slt %1, %8
cand %9 0:i1

; Replacing "  %15 = icmp eq i32 %1, 0"
; from ""
; with "i1 false" in:
%0 = block 3
%1:i32 = var
%2:i1 = slt %1, 1:i32
blockpc %0 0 %2 1:i1
%3 = block 2
blockpc %3 0 %2 0:i1
blockpc %3 1 %2 0:i1
%4:i32 = var
%5:i1 = slt 0:i32, %4
%6:i32 = zext %5
%7:i1 = slt 1:i32, %4
%8:i32 = zext %7
%9:i32 = phi %3, %6, %8
%10:i1 = slt %1, %9
blockpc %0 2 %10 0:i1
%11:i1 = eq 0:i32, %1
cand %11 0:i1

; Listing applied replacements for main
; Using solver: Z3 + external cache + internal cache

; Listing applied replacements for main
; Using solver: Z3 + external cache + internal cache
$ ./small2
$ clang small.c -o small2
$ ./small2
x
jeroenk commented 9 years ago

I tried looking at this, but I have a very hard time relating

%0 = block 2
%1:i32 = var
%2:i1 = slt %1, 1:i32
blockpc %0 0 %2 0:i1
%3:i32 = var
%4:i1 = slt 1:i32, %3
blockpc %0 0 %4 0:i1
blockpc %0 1 %2 0:i1
%5:i1 = slt 0:i32, %3
%6:i32 = zext %5
%7:i32 = zext %4
%8:i32 = phi %0, %6, %7
%9:i1 = slt %1, %8

to the only fragment of IR it seems to map to:

entry:
  %0 = load i32* @x1, align 4, !tbaa !2
  %cmp = icmp slt i32 %0, 1
  br i1 %cmp, label %entry.if.end10_crit_edge, label %if.then

if.then:                                          ; preds = %entry
  %1 = load i32* @x0, align 4, !tbaa !2
  %cmp3 = icmp sgt i32 %1, 1
  %conv4 = zext i1 %cmp3 to i32
  br i1 %cmp3, label %cond.end, label %cond.false

cond.false:                                       ; preds = %if.then
  %cmp5 = icmp sgt i32 %1, 0
  %conv6 = zext i1 %cmp5 to i32
  br label %cond.end

cond.end:                                         ; preds = %cond.false, %if.then
  %cond = phi i32 [ %conv6, %cond.false ], [ %conv4, %if.then ]
  store i32 %cond, i32* @x5, align 4, !tbaa !2
  store i32 %cond, i32* @x2, align 4, !tbaa !2
  %cmp7 = icmp slt i32 %0, %cond

The phi node corresponds to the branch in the if.then block, which check the value of %cmp3. This can only correspond to %4:i1 = slt 1:i32, %3 in the souper code. Yet, only one of the blockpc's that follows %4 refers to this value. I would actually expect something like:

%0 = block 2
%1:i32 = var
%2:i1 = slt %1, 1:i32
pc %2 1:i1
%3:i32 = var
%4:i1 = slt 1:i32, %3
blockpc %0 0 %4 0:i1
blockpc %0 1 %4 1:i1
%5:i1 = slt 0:i32, %3
%6:i32 = zext %5
%7:i32 = zext %4
%8:i32 = phi %0, %6, %7
%9:i1 = slt %1, %8

But maybe I'm wrong and simply don't sufficiently understand blockpc's.

regehr commented 9 years ago

Here's the critical transformation. I also have a hard time with blockpc, but it's a really valuable thing.

*** IR Dump After Combine redundant instructions ***
; Function Attrs: nounwind uwtable
define i32 @main() #0 {
  %1 = load i32* @x1, align 4, !tbaa !1
  %2 = icmp slt i32 %1, 1
  br i1 %2, label %._crit_edge, label %3

._crit_edge:                                      ; preds = %0
  %.pre = load i32* @x2, align 4, !tbaa !1
  br label %14

; <label>:3                                       ; preds = %0
  %4 = load i32* @x0, align 4, !tbaa !1
  %5 = icmp sgt i32 %4, 1
  %6 = zext i1 %5 to i32
  br i1 %5, label %10, label %7

; <label>:7                                       ; preds = %3
  %8 = icmp sgt i32 %4, 0
  %9 = zext i1 %8 to i32
  br label %10

; <label>:10                                      ; preds = %3, %7
  %11 = phi i32 [ %9, %7 ], [ %6, %3 ]
  store i32 %11, i32* @x5, align 4, !tbaa !1
  store i32 %11, i32* @x2, align 4, !tbaa !1
  %12 = icmp slt i32 %1, %11
  br i1 %12, label %14, label %13

; <label>:13                                      ; preds = %10
  store i32 0, i32* @x3, align 4, !tbaa !1
  br label %14

; <label>:14                                      ; preds = %._crit_edge, %10, %13
  %15 = phi i32 [ %11, %10 ], [ %.pre, %._crit_edge ], [ %11, %13 ]
  store i32 %15, i32* @x4, align 4, !tbaa !1
  %16 = icmp eq i32 %1, 0
  br i1 %16, label %17, label %18

; <label>:17                                      ; preds = %14
  %puts = tail call i32 @puts(i8* getelementptr inbounds ([2 x i8]* @str, i64 0, i64 0))
  br label %18

; <label>:18                                      ; preds = %14, %17
  ret i32 0
}

; Listing applied replacements for main
; Using solver: Z3 + external cache + internal cache

; Replacing "  %12 = icmp slt i32 %1, %11"
; from ""
; with "i1 false" in:
%0 = block 2
%1:i32 = var
%2:i1 = slt %1, 1:i32
blockpc %0 0 %2 0:i1
%3:i32 = var
%4:i1 = slt 1:i32, %3
blockpc %0 0 %4 0:i1
blockpc %0 1 %2 0:i1
%5:i1 = slt 0:i32, %3
%6:i32 = zext %5
%7:i32 = zext %4
%8:i32 = phi %0, %6, %7
%9:i1 = slt %1, %8
cand %9 0:i1

; Replacing "  %15 = icmp eq i32 %1, 0"
; from ""
; with "i1 false" in:
%0 = block 2
%1:i32 = var
%2:i1 = slt %1, 1:i32
blockpc %0 0 %2 0:i1
blockpc %0 1 %2 0:i1
%3 = block 3
blockpc %3 1 %2 1:i1
%4:i32 = var
%5:i1 = slt 0:i32, %4
%6:i32 = zext %5
%7:i1 = slt 1:i32, %4
%8:i32 = zext %7
%9:i32 = phi %0, %6, %8
%10:i1 = slt %1, %9
blockpc %3 2 %10 0:i1
%11:i1 = eq 0:i32, %1
cand %11 0:i1
*** IR Dump After Souper super-optimizer pass ***
; Function Attrs: nounwind uwtable
define i32 @main() #0 {
  %1 = load i32* @x1, align 4, !tbaa !1
  %2 = icmp slt i32 %1, 1
  br i1 %2, label %._crit_edge, label %3

._crit_edge:                                      ; preds = %0
  %.pre = load i32* @x2, align 4, !tbaa !1
  br label %13

; <label>:3                                       ; preds = %0
  %4 = load i32* @x0, align 4, !tbaa !1
  %5 = icmp sgt i32 %4, 1
  %6 = zext i1 %5 to i32
  br i1 %5, label %10, label %7

; <label>:7                                       ; preds = %3
  %8 = icmp sgt i32 %4, 0
  %9 = zext i1 %8 to i32
  br label %10

; <label>:10                                      ; preds = %3, %7
  %11 = phi i32 [ %9, %7 ], [ %6, %3 ]
  store i32 %11, i32* @x5, align 4, !tbaa !1
  store i32 %11, i32* @x2, align 4, !tbaa !1
  br i1 false, label %13, label %12

; <label>:12                                      ; preds = %10
  store i32 0, i32* @x3, align 4, !tbaa !1
  br label %13

; <label>:13                                      ; preds = %._crit_edge, %10, %12
  %14 = phi i32 [ %11, %10 ], [ %.pre, %._crit_edge ], [ %11, %12 ]
  store i32 %14, i32* @x4, align 4, !tbaa !1
  br i1 false, label %15, label %16

; <label>:15                                      ; preds = %13
  %puts = tail call i32 @puts(i8* getelementptr inbounds ([2 x i8]* @str, i64 0, i64 0))
  br label %16

; <label>:16                                      ; preds = %13, %15
  ret i32 0
}
chenyang78 commented 9 years ago

blockpc asserts path conditions that must be true for a given branch. For example, given the following llvm bitcode:

entry:
    %0 = load i32* @x1, align 4, !tbaa !2
    %cmp = icmp slt i32 %0, 1
    br i1 %cmp, label %entry.if.end10_crit_edge, label %if.then

if.then:                                          ; preds = %entry
    %1 = load i32* @x0, align 4, !tbaa !2
    %cmp3 = icmp sgt i32 %1, 1
    %conv4 = zext i1 %cmp3 to i32
    br i1 %cmp3, label %cond.end, label %cond.false

cond.false:                                       ; preds = %if.then
    %cmp5 = icmp sgt i32 %1, 0
    %conv6 = zext i1 %cmp5 to i32
    br label %cond.end

cond.end:                                         ; preds = %cond.false, %if.then
    %cond = phi i32 [ %conv6, %cond.false ], [ %conv4, %if.then ]
    store i32 %cond, i32* @x5, align 4, !tbaa !2
    store i32 %cond, i32* @x2, align 4, !tbaa !2
    %cmp7 = icmp slt i32 %0, %cond

We start collecting blockpc(s) at the phi instruction %cond. We first look at the first incoming block, i.e., %cond.false. Because %cond.false has single predecessor %if.then and this predecessor's terminator is a conditional instruction, we add one blockpc as blockpc 0 %0 %cmp3 0. Then we examine the predecessor of %if.then. Similarly, because %if.then has single predecessor %entry, which has a conditional terminator, we add another blockpc blockpc %0 0 %cmp 0.

Now we finish the path starting from %cond.false, and we look at %cond's second incoming block %if.then. Similarly, we add blockpc %0 1 %cmp 0.

regehr commented 9 years ago

Does the optimization look correct to you Yang? If so, then LLVM must be wrong somewhere else. I'm not sure how useful bugpoint is for wrong code bugs, anyone know? Otherwise we can do something less smart like cut-and-pasting stuff from the print-after-all output into lli.

chenyang78 commented 9 years ago

@jeroenk For the souper IR that you gave, we cannot have pc, because we don't utilize context-sensitive control-flow information. In other words, we don't know the fact that %cond.false and %if.then come from the same predecessor %entry with the same branch condition. IMHO, this is probably one of the limitations of our current pc/blockpc harvesting strategy.

chenyang78 commented 9 years ago

@regehr, I will investigate it tonight.

regehr commented 9 years ago

Thanks Yang. Full optimization log is here if that's helpful.

http://pastebin.com/kHLxjv2A

chenyang78 commented 9 years ago

The second optimization looks invalid to me.

%0 = block 2
%1:i32 = var
%2:i1 = slt %1, 1:i32
blockpc %0 0 %2 0:i1
blockpc %0 1 %2 0:i1
%3 = block 3
blockpc %3 1 %2 1:i1
%4:i32 = var
%5:i1 = slt 0:i32, %4
%6:i32 = zext %5
%7:i1 = slt 1:i32, %4
%8:i32 = zext %7
%9:i32 = phi %0, %6, %8
%10:i1 = slt %1, %9
blockpc %3 2 %10 0:i1
%11:i1 = eq 0:i32, %1
cand %11 0:i1

Here we have %10 == 0, which means that %1 < %9 is false, i.e., %1 >= %9. Because %9 can be either 0 or 1, %1 may equal to 0. So, there is some bug in generating blockpc-related queries. Investigating.

chenyang78 commented 9 years ago

Generated KLEE expression. Pondering if it's correct.

  (Or (Eq false
          (And (Or (Eq false (And N0:(Read 0 blockpred) N0))
                   N1:(Eq false (Slt N2:(Read w32 0 a1) 1)))
               (Or N0 N1)))
      (Eq false (Eq 0 N2)))
jeroenk commented 9 years ago

It's correct, but not sufficient. The problem is that the N0 expression is not being guarded by anything. Hence, if we let

(Slt (Read w32 0 a1) 1)

evaluate to true, which happens when you follow the control flow from entry -> ._crit_edge -> 14 (referring to John's dump above), then you essentially end up with

(And (Eq false N0) N0)

for the (And ...) sub-expression, which can never be satisfied. This is perfectly fine, as we know that the branch on which N0 is needed is not the one being taken. However, this latter fact is not being encoded in the formula.

What exposes the problem here is that in addition to this we have that if

(Slt (Read w32 0 a1) 1)

evaluates to false, then we have a rightful simplification (which is essentially the first simplification souper spots).

regehr commented 9 years ago

I just want to add that if there is a Souper bug here, it's a really hard one to find using Csmith, coming up only once every 12 hours or so on a fast quad core. Next time we might not be so lucky, so let's be really careful about this code.

chenyang78 commented 9 years ago

@jeroenk , Thanks for the explanation! I don't quite understand this part: how can we get

(And (Eq false N0) N0)

by assuming

(Slt (Read w32 0 a1) 1)

to be true. Can you elaborate on that? Thanks.

chenyang78 commented 9 years ago

OK, I see the problem now. Actually the bug relates to the part where we generate souper IR. The phi node relevant to %3 = block is not generated. If we look at the second IR in John's example, we only have %3 = block and blockpc %3, but the corresponding phi instruction doesn't exist. Looks like the way that we compute the equivalence classes for blockpcs has some problem. Working on a fix.

jeroenk commented 9 years ago

@chenyang78 I assume you figured it out by now, but just for completeness, we have:

(And (Or (Eq false (And N0:(Read 0 blockpred) N0))
                   N1:(Eq false (Slt N2:(Read w32 0 a1) 1)))
               (Or N0 N1)))

By (And x x) = x this is equal to:

(And (Or (Eq false N0:(Read 0 blockpred) )
                   N1:(Eq false (Slt N2:(Read w32 0 a1) 1)))
               (Or N0 N1)))

then assuming (Slt (Read w32 0 a1) 1) we get:

(And (Or (Eq false N0:(Read 0 blockpred) )
                   N1:(Eq false true))
               (Or N0 N1)))

and by (Eq false true) = false:

(And (Or (Eq false N0:(Read 0 blockpred) )
                   N1:false)
               (Or N0 N1)))

Substituting N1:

(And (Or (Eq false N0:(Read 0 blockpred) )
                   false)
               (Or N0 false)))

and simplifying (Or x false) to x we get:

(And (Eq false N0:(Read 0 blockpred) )
               N0))
chenyang78 commented 9 years ago

@jeroenk Thanks!

regehr commented 9 years ago

First of 5 problems found by Csmith in a 24-hour run using Yang's issue_166 branch.

$ SOUPER_NO_SOUPER=1 /home/regehr/f/souper-yang/build/sclang -O2 small.c -o small1 ; ./small1
y
$ gcc -O2 small.c -o small1 ; ./small1
y
$ /home/regehr/f/souper-yang/build/sclang -O2 small.c -o small1 ; ./small1
x
$ cat small.c
int printf(const char *, ...);
int x0, x1, x2, x3;
int main() {
  if (x3) {
    x1 = x3;
    for (; x0 <= 0; x0++)
      ;
  }
  x2 = x1 |= x0;
  if (x2)
    printf("x\n");
  else
    printf("y\n");
  return 0;
}
regehr commented 9 years ago
$ cat small.c
int printf(const char *, ...);
char x0;
int x1 = 1;
int x2() { return x0 ?: 1 % x0; }

int main() {
  if ((x1 || x2()) ^ 0) {
    if (x1 < 1) {
      ;
    } else {
      printf("x\n");
    }
  }
  return 0;
}
$ /home/regehr/f/souper-yang/build/sclang -O2 small.c -o small1 ; ./small1
$ /home/regehr/f/souper-yang/build/sclang -O0 small.c -o small1 ; ./small1
x
$ gcc -O2 small.c -o small1 ; ./small1
x
regehr commented 9 years ago

Program from the previous comment is a good one since it results in only one souper optimization.

%0 = block 2
%1:i32 = var
%2:i1 = eq 0:i32, %1
blockpc %0 0 %2 1:i1
%3:i8 = var
%4:i1 = eq 0:i8, %3
blockpc %0 0 %4 1:i1
blockpc %0 1 %2 1:i1
%5 = block 2
%6:i32 = sext %3
%7:i32 = srem 1:i32, %6
%8:i32 = phi %0, %7, %6
%9:i1 = eq 0:i32, %8
%10:i1 = phi %5, 0:i1, %9
%11:i1 = slt %1, 1:i32
%12:i1 = or %10, %11
cand %12 1:i1
regehr commented 9 years ago
int printf(const char *, ...);
int x0, x1, x2, x3;
volatile int x4;
void x5(x6) {
  x1 = 0;
  for (; x1 >= 0; x1--)
    if (x4) {
      x6 = x0;
      if (x6)
        x2 = 0;
    }
  x3 = x6 >= 1;
}

int main() {
  x5(0);
  printf("%x", x3);
  return 0;
}
regehr commented 9 years ago
int printf(const char *, ...);
int x0, x1, x2, x3, x4, x5;
char x6;
int main() {
  x6 = x4;
  if (x6) {
    x5 = x1 == 0 ?: x4 % x1;
    if (x5)
      x4 = 0;
  } else {
    if (x0)
      x4 = 9;
    x3--;
  }
  x2 = 1 <= x4;
  printf("%x", x2);
  return 0;
}
regehr commented 9 years ago
int printf(const char *, ...);
int x0, x1, x2, x3, x4, x5;
short x6;
void x7() {
  x2 = 5;
  for (; x2; x2--) {
    x6 = x4 ?: x1 > x4;
    x5 = x6 == 0 ?: x0 / x6;
    if (x5)
      break;
  }
}

int main() {
  x7();
  if (x4)
    for (; x3;)
      printf("x\n");
  return 0;
}
chenyang78 commented 9 years ago

I will investigate those. Thanks, John.

chenyang78 commented 9 years ago

OK, I see Souper's issue for your second program, i.e,

int printf(const char *, ...);
char x0;
int x1 = 1;
int x2() { return x0 ?: 1 % x0; }

int main() {
  if ((x1 || x2()) ^ 0) {
    if (x1 < 1) {
      ;
    } else {
      printf("x\n");
    }
  }
  return 0;
}

Here Souper wrongfully exploited the "UB" in the function x2. The program is UB-free, but Souper is not aware of the fact that x2() will never get invoked. The following simplified souper IR illustrates the problem:

%0 = block 2
%3:i8 = var
%4:i1 = eq 0:i8, %3
blockpc %0 0 %4 1:i1
%6:i32 = sext %3
%7:i32 = srem 1:i32, %6
%8:i32 = phi %0, %7, %6
...

blockpc %0 0 %4 1:i1 asserts that %3 is 0, and therefore %6 is 0. Consequently, we get UB for instruction %7.

regehr commented 9 years ago

I believe that all of these problems occur when -souper-exploit-ub=true but please double check.

regehr commented 9 years ago

Ok I double checked and this optimization does happen when UB is turned off:

$ ~/f/souper-yang/build/souper -souper-exploit-ub=false -stp-path=/usr/local/bin/stp small.bc
; Listing valid replacements.
; Using solver: STP + internal cache

; Static profile 1
; Function: main
%0 = block 2
%1:i32 = var
%2:i1 = eq 0:i32, %1
blockpc %0 0 %2 1:i1
%3:i8 = var
%4:i1 = eq 0:i8, %3
blockpc %0 0 %4 1:i1
blockpc %0 1 %2 1:i1
%5 = block 2
%6:i32 = sext %3
%7:i32 = srem 1:i32, %6
%8:i32 = phi %0, %7, %6
%9:i1 = eq 0:i32, %8
%10:i1 = phi %5, 0:i1, %9
%11:i1 = slt %1, 1:i32
%12:i1 = or %10, %11
cand %12 1:i1
regehr commented 9 years ago

Here's the llvm (which looks correct):

; ModuleID = 'small.c'
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

@x1 = global i32 1, align 4
@x0 = common global i8 0, align 1
@str = private unnamed_addr constant [2 x i8] c"x\00"

; Function Attrs: nounwind readonly uwtable
define i32 @x2() #0 {
  %1 = load i8* @x0, align 1, !tbaa !1
  %2 = sext i8 %1 to i32
  %3 = icmp eq i8 %1, 0
  br i1 %3, label %4, label %6

; <label>:4                                       ; preds = %0
  %5 = srem i32 1, %2
  br label %6

; <label>:6                                       ; preds = %0, %4
  %7 = phi i32 [ %5, %4 ], [ %2, %0 ]
  ret i32 %7
}

; Function Attrs: nounwind uwtable
define i32 @main() #1 {
  %1 = load i32* @x1, align 4, !tbaa !4
  %2 = icmp eq i32 %1, 0
  br i1 %2, label %3, label %10

; <label>:3                                       ; preds = %0
  %4 = load i8* @x0, align 1, !tbaa !1
  %5 = sext i8 %4 to i32
  %6 = icmp eq i8 %4, 0
  br i1 %6, label %7, label %x2.exit

; <label>:7                                       ; preds = %3
  %8 = srem i32 1, %5
  br label %x2.exit

x2.exit:                                          ; preds = %3, %7
  %9 = phi i32 [ %8, %7 ], [ %5, %3 ]
  %phitmp = icmp eq i32 %9, 0
  br label %10

; <label>:10                                      ; preds = %0, %x2.exit
  %11 = phi i1 [ false, %0 ], [ %phitmp, %x2.exit ]
  %12 = icmp slt i32 %1, 1
  %or.cond = or i1 %11, %12
  br i1 %or.cond, label %14, label %13

; <label>:13                                      ; preds = %10
  %puts = tail call i32 @puts(i8* getelementptr inbounds ([2 x i8]* @str, i64 0, i64 0))
  br label %14

; <label>:14                                      ; preds = %10, %13
  ret i32 0
}

; Function Attrs: nounwind
declare i32 @puts(i8* nocapture readonly) #2

attributes #0 = { nounwind readonly uwtable "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { nounwind uwtable "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #2 = { nounwind }

!llvm.ident = !{!0}

!0 = metadata !{metadata !"clang version 3.6.0 (trunk 223586)"}
!1 = metadata !{metadata !2, metadata !2, i64 0}
!2 = metadata !{metadata !"omnipotent char", metadata !3, i64 0}
!3 = metadata !{metadata !"Simple C/C++ TBAA"}
!4 = metadata !{metadata !5, metadata !5, i64 0}
!5 = metadata !{metadata !"int", metadata !2, i64 0}
regehr commented 9 years ago

Below is a partial annotation of the Souper. It looks to me like the blockpcs are telling us that x1 == 0 regardless of which branch was taken. Is that correct, Yang?

%0 = block 2
%5 = block 2

%1:i32 = var               ; x1
%2:i1 = eq 0:i32, %1       ; x1 == 0
%3:i8 = var                ; x0
%4:i1 = eq 0:i8, %3        ; x0 == 0
%6:i32 = sext %3           ; x0 promoted to int
%7:i32 = srem 1:i32, %6    ; 1 % x0

blockpc %0 0 %2 1:i1       ; x1 == 0 if pred 0
blockpc %0 1 %2 1:i1       ; x1 == 0 if pred 1
blockpc %0 0 %4 1:i1       ; x0 == 0 if pred 0
%8:i32 = phi %0, %7, %6    ;

%9:i1 = eq 0:i32, %8       ;
%10:i1 = phi %5, 0:i1, %9
%11:i1 = slt %1, 1:i32     ; x1 < 1
%12:i1 = or %10, %11       ; 
cand %12 1:i1
jeroenk commented 9 years ago

Going by Yang's earlier explanation, I think this says that for %8:i32 = phi %0, %7, %6, we may assume that:

regehr commented 9 years ago

Jeroen that is my understanding as well. The problem here, of course, is that x1 != 0.

jeroenk commented 9 years ago

But, %8:i32 = phi %0, %7, %6 corresponds to %9 = phi i32 [ %8, %7 ], [ %5, %3 ] in the bitcode, which is only reachable in case x1 == 0. Or am I missing something?

regehr commented 9 years ago

Jeroen, I think you're right. So I'm still not sure what's going wrong here.

One thing we should consider is what the no-UB mode does when there is a divide by zero. My understanding is that division by zero is permitted in SMTLIB2 code but that the result is unspecified. Would that be contributing to the problem here?

chenyang78 commented 9 years ago

Thanks, John and Jeroen. I will take a close look at it tonight.

regehr commented 9 years ago

I'm pretty sure the divide by zero isn't the problem...

rsas commented 9 years ago

Div by zero records UB if the second op is a constant 0. In turn, KLEEBuilder returns constant 0 as the result. If any path conditions constrain the second op to be 0, solver doesn't complain. I did the following test:

%0:i32 = var
pc %0 0:i32
%1:i32 = udiv 1:i32, %0
cand %1 1:i32

$ souper-check $SOUPER_SOLVER -souper-exploit-ub=1 divbyzero.opt 
; LGTM

$ souper-check $SOUPER_SOLVER -souper-exploit-ub=0 divbyzero.opt 
Invalid, e.g.

%0 = 0
jeroenk commented 9 years ago

@regehr Bitvector division in SMTLib is a total function, so anything divided by zero will have a well-defined answer. However, the solver is not allowed to make any assumptions about the result of the function in case the divisor is 0.

For a program in which the division by zero cannot actually be executed, I would expect the SMT query that is generated to be such that the value of the division does not matter for the satisfiability of the query.

rsas commented 9 years ago

Thanks @jeroenk. I ran the test above with z3. stp says in both cases LGTM.

jeroenk commented 9 years ago

The KLEE expression is as follows (writing A ==> B for (Or (Eq false A) B)) and with some trivial simplification:

(And (N0:(Read 0 blockpred0) ==> (And N1:(Eq 0 N2:(Read w32 0 a1)) (Eq 0 N3:(Read w8 0 a3))))
     (Or N0 N1))

    ==>

(Or (And (Eq false (Read 0 blockpred)) (Eq 0 (Select w32 N0 (SRem w32 1 N4:(SExt w32 N3)) N4)))
    (Slt N2 1))

Here, N0 represents the blockpc, N2 represents %1 (or x1 in the original C code), and N3 represents %3 (or x0 in the original). For the optimisation to hold the expression must be valid, i.e., whatever values I pick for N0, N1, and N2, the result must be true. Of course, this is not what we want in this case.

Let's concentrate on the case that N2 != 0 (I believe that the N2 == 0 case is properly encoded by the expression). If N2 != 0, the N1 == false, and the expression simplifies to:

(And (N0:(Read 0 blockpred0) ==> false)
        N0)

    ==>

(Or (And (Eq false (Read 0 blockpred)) (Eq 0 (Select w32 N0 (SRem w32 1 N4:(SExt w32 N3)) N4)))
    (Slt N2 1))

The second half of the formula simplifies depending on N2 >= 1, or N2 < 0, but this doesn't really matter, as

(And (N0:(Read 0 blockpred0) ==> false) N0)

is always false. And, hence, the whole formula is true if N2 != 0. So, N0 affects the validity of the expression even in case N2 != 0, and this should not happen, as it's related to the execution path we didn't choose in this case! It seems that there's something missing from the generated KLEE expression, but I'm not totally sure what that is (needs more thought).

rsas commented 9 years ago

I think I'm missing here something, because for me the souper optimization output looks correct. Regardless the branch of %8:i32 = phi %0, %7, %6, %1 is always constrained to 0 (blockpcs are equal for both branches). Consequently, %11 is always 1 and herewith cand %12 1:i1 is correct. In other words, the optimization is equivalent to the opt with all blockpcs replaced with a single pc %2 1:i1. What is wrong with my reasoning?

regehr commented 9 years ago

Raimondas, if you can't find a bug in the Souper optimization, there must be a harvesting bug, right?

jeroenk commented 9 years ago

@rsas Naive question: wouldn't constraining %1 to 0 require blockpc's that refer to %5?

rsas commented 9 years ago

Yes, if others prove me wrong. I don't see a bug in the Souper output and I see a problem in the LLVM as follows:

The first branching instruction br i1 %2, label %3, label %10 can jump to label %10 if %1 != 0. I hope we all agree here.

At label 10, the execution of %or.cond = or i1 %11, %12 is constrained by blockpcs (collected through %11) which in turn constrain %1. This is not ok, because there are no constraints on %1 when coming from the initial false branch and jumping to label 10.

rsas commented 9 years ago

In summary, %12 can be 0 or 1 when coming from the initial false branch. On that path, there are no constraints on %1!

chenyang78 commented 9 years ago

Thanks guys. @rsas, I think you are right. Looks like it relates to issue #157. We shouldn't skip constraints on branches, which can result in both missing-optimization and mis-optimization.

rsas commented 9 years ago

@jeroenk can you please rephrase the question?

jeroenk commented 9 years ago

@rsas Rephrasing: I think I'm fundamentally confused about what a list of souper instructions is. From the documentation and all previous discussions I got the impression that there's some importance in the order of the instructions, but following your reasoning above, it seems that I should view as a set and not as a list. We write it as a list to conveniently express some non-circular dependencies that may occur.

rsas commented 9 years ago

Correct. You always start from the bottom cand instruction, collect all constraints recursively and add them at the top level.

chenyang78 commented 9 years ago

OK, I think the fundamental problem here is that we are not taking control-flow facts into account. For the given example, two blockpcs on %2 cannot be viewed as a pure "union". Thinking about a way to encode control-flow info...

vinodgro commented 9 years ago

I think so too. I suspect you'll need to encode both data flow and control flow information into blockpc.

On Mon, Feb 9, 2015 at 6:27 PM, Yang Chen notifications@github.com wrote:

OK, I think the fundamental problem here is that we are not taking control-flow facts into account. For the given example, two blockpcs on %2 cannot be viewed as a pure "union". Thinking about a way to encode control-flow info...

— Reply to this email directly or view it on GitHub https://github.com/google/souper/issues/166#issuecomment-73633209.

rsas commented 9 years ago

@chenyang78 I think it's time for us to rethink the design of pcs/blockpcs/UBs/Phis. They all have things in common and the code is not trivial at all.

rsas commented 9 years ago

At the same time, we have to encode poison values too.