llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
27.85k stars 11.47k forks source link

sanitize-coverage: incorrectly instruments logical operations #33785

Open a8b4922b-9ceb-405e-9794-fa0271302cc2 opened 7 years ago

a8b4922b-9ceb-405e-9794-fa0271302cc2 commented 7 years ago
Bugzilla Link 34437
Version trunk
OS Linux
CC @KernelAddress,@ramosian-glider

Extended Description

clang version 5.0.0 (trunk 303084)

Program:

void bad(); void foo(int x, int y) { if (x == 0xdead && y == 0xbeef) bad(); } int bar(int x, int y) { return x == 0xdead && y == 0xbeef; }

$ clang test.c -O2 -c -S -o - -fsanitize-coverage=trace-pc

foo:

    movl    %esi, %ebx
    movl    %edi, %ebp
    callq   __sanitizer_cov_trace_pc
    #APP
    #NO_APP
    cmpl    $57005, %ebp            # imm = 0xDEAD
    jne     .LBB0_2

BB#1: # %entry

    cmpl    $48879, %ebx            # imm = 0xBEEF
    jne     .LBB0_2

BB#3: # %if.then

    callq   __sanitizer_cov_trace_pc
    #APP
    #NO_APP
    xorl    %eax, %eax
    addq    $8, %rsp
    popq    %rbx
    popq    %rbp
    jmp     bad                     # TAILCALL

.LBB0_2: # %if.end callq __sanitizer_cov_trace_pc

APP

    #NO_APP
    addq    $8, %rsp
    popq    %rbx
    popq    %rbp
    retq

.Lfunc_end0: .size foo, .Lfunc_end0-foo .cfi_endproc

bar:

    movl    %esi, %ebx
    movl    %edi, %ebp
    callq   __sanitizer_cov_trace_pc
    #APP
    #NO_APP
    xorl    $57005, %ebp            # imm = 0xDEAD
    xorl    $48879, %ebx            # imm = 0xBEEF
    xorl    %eax, %eax
    orl     %ebp, %ebx
    sete    %al
    addq    $8, %rsp
    popq    %rbx
    popq    %rbp
    retq

For foo there must be an additional callback at BB#1, for bar there must be an additional callback somewhere.

This has negative effect on coverage-guided fuzzers as they need to guess N values simultaneously in order to make progress, rather then guess values one-by-one and persist progress.

kcc commented 6 years ago

But my measurements were never "scientific" -- I am waiting for our A/B testing infra.

Will it be able to account for difference in performance between -O2 and -O0?

I hope so. There are two major metrics: time and # of iterations. When comparing O2 vs O0, time is less useful, but we still have # of iterations

If we get better coverage, but make it 10x slower, it's unsurprising that it will be worse in a limited-time run. Fixed-time run between -O2 and -O0 won't be apples to apples.

a8b4922b-9ceb-405e-9794-fa0271302cc2 commented 6 years ago

I would be surprised if short circuiting for code like if (p && p->x) happens frequently. (roughly speaking) it may only happen if the operands are side-effect-free.

That's exactly what I mean. These additional cases are not if (p && p->x), that's something more like if (x && y).

But my measurements were never "scientific" -- I am waiting for our A/B testing infra.

Will it be able to account for difference in performance between -O2 and -O0? If we get better coverage, but make it 10x slower, it's unsurprising that it will be worse in a limited-time run. Fixed-time run between -O2 and -O0 won't be apples to apples.

kcc commented 6 years ago

I've applied this change (which seems to be the minimal change that fixes the reproducer):

If you can make change with a proper flag (and if you think it's useful) we can enable such flag under -fsanitize-coverage=*

Index: lib/Transforms/Utils/SimplifyCFG.cpp

--- lib/Transforms/Utils/SimplifyCFG.cpp (revision 312156) +++ lib/Transforms/Utils/SimplifyCFG.cpp (working copy) @@ -298,7 +298,7 @@ const TargetTransformInfo &TTI) { assert(isSafeToSpeculativelyExecute(I) && "Instruction is not safe to speculatively execute!");

  • return TTI.getUserCost(I);
  • return TargetTransformInfo::TCC_Expensive; }

    /// If we have a merge point of an "if condition" as accepted above, @@ -5744,6 +5744,8 @@ }

    bool SimplifyCFGOpt::SimplifyCondBranch(BranchInst *BI, IRBuilder<> &Builder) {

  • return false;
  • BasicBlock *BB = BI->getParent();

For foo it changes number of coverage callbacks from 3 to 4, and for bar -- from 1 to 3.

For Linux kernel it adds 6.8% of coverage callbacks (401279->428496). And I would expect most of that is very useful signal for fuzzing: complex, speculatable logical conditions (i.e. not if (p && p->x)).

I would be surprised if short circuiting for code like if (p && p->x) happens frequently. (roughly speaking) it may only happen if the operands are side-effect-free.

which is why I was experimenting with clang coverage recently.

Have you considered:

  1. Disabling/enabling some transformations in llvm (like the change above)

I tried running with -O0 a few times to see if this helps (precisely for this reason). On small puzzles it definitely helps, but on larger benchmarks and on longer runs I did not see any benefit. But my measurements were never "scientific" -- I am waiting for our A/B testing infra.

  1. Running before SimplifyCFG

No tried.

?

Both look like reasonable options (no need to write another instrumentation on AST level, still have access to powerful llvm analysis).

a8b4922b-9ceb-405e-9794-fa0271302cc2 commented 6 years ago

I've applied this change (which seems to be the minimal change that fixes the reproducer):

Index: lib/Transforms/Utils/SimplifyCFG.cpp

--- lib/Transforms/Utils/SimplifyCFG.cpp (revision 312156) +++ lib/Transforms/Utils/SimplifyCFG.cpp (working copy) @@ -298,7 +298,7 @@ const TargetTransformInfo &TTI) { assert(isSafeToSpeculativelyExecute(I) && "Instruction is not safe to speculatively execute!");

For foo it changes number of coverage callbacks from 3 to 4, and for bar -- from 1 to 3.

For Linux kernel it adds 6.8% of coverage callbacks (401279->428496). And I would expect most of that is very useful signal for fuzzing: complex, speculatable logical conditions (i.e. not if (p && p->x)).

which is why I was experimenting with clang coverage recently.

Have you considered:

  1. Disabling/enabling some transformations in llvm (like the change above)
  2. Running before SimplifyCFG ?

Both look like reasonable options (no need to write another instrumentation on AST level, still have access to powerful llvm analysis).

kcc commented 7 years ago

Is it possible to easily disable the merging pass if coverage is enabled?

If I modify lib/Transforms/Scalar/SimplifyCFGPass.cpp by returning early in iterativelySimplifyCFG I get no short-circuiting. I don't know off the top of my head if there is a flag to do it, or what consequences of this will be.

Is it possible to run the splitting pass before the coverage?

Unlikely.

This severe hurts coverage guiding (and almost makes everything we tell about coverage-guided fuzzing a lie...).

Only to some extent. In real code you usually don't get too much short-circuiting. But surely, I agree, this hurts some of the cases, which is why I was experimenting with clang coverage recently.

a8b4922b-9ceb-405e-9794-fa0271302cc2 commented 7 years ago

Is it possible to easily disable the merging pass if coverage is enabled? Is it possible to run the splitting pass before the coverage? This severe hurts coverage guiding (and almost makes everything we tell about coverage-guided fuzzing a lie...).

kcc commented 7 years ago

This is the IR that SanitizerCoverage gets: %cmp = icmp eq i32 %x, 57005 %cmp1 = icmp eq i32 %y, 48879 %or.cond = and i1 %cmp, %cmp1 br i1 %or.cond, label %if.then, label %if.end

So, there is no BB here, and we can't insert the BB callback (we do insert __sanitizer_cov_trace_const_cmp4, but that's not what you are asking about)

This happens very early in the optimization pipeline, at the first "Simplify the CFG". So, essentially the only place where we can insert the callbacks as you want is Clang -- and that's precisely what clang coverage will give you.

Later this BB gets unsplit back into two BBs (after "X86 DAG->DAG Instruction Selection") but it's too late.

If we insert callback at BB#1 we will be able to understand when we guessed x correctly.

Yes, you are right.

a8b4922b-9ceb-405e-9794-fa0271302cc2 commented 7 years ago

So, in "foo" we have BB#1 with two successors: LBB0_2 and BB#3 Both successors are instrumented, so also instrumenting BB#1 won't give us any new signal.

Nope. It clearly gives us new signal. If we insert callback at BB#1 we will be able to understand when we guessed x correctly.

kcc commented 7 years ago

Yep. The only option you have right now is to use -O0, where the compiler doesn't perform short-circuiting (I know this sucks).

You mean where is does perform short-circuiting?

short-circuiting is a transformation a&&b => a&b. At >= -O1 llvm does it sometimes at -O0 it never does.

But foo is short-circuited and there is even a basic block (BB#1) between evaluation of the conditions. Why don't we insert the callback there?

Ah, that's a different question.

So, in "foo" we have BB#1 with two successors: LBB0_2 and BB#3 Both successors are instrumented, so also instrumenting BB#1 won't give us any new signal.

Strange thing is that we don't instrument this BB even with -fsanitize-coverage=trace-pc,no-prune where pruning is disabled. Looking.

a8b4922b-9ceb-405e-9794-fa0271302cc2 commented 7 years ago

Yep. The only option you have right now is to use -O0, where the compiler doesn't perform short-circuiting (I know this sucks).

You mean where is does perform short-circuiting? But foo is short-circuited and there is even a basic block (BB#1) between evaluation of the conditions. Why don't we insert the callback there?

kcc commented 7 years ago

Yep. The only option you have right now is to use -O0, where the compiler doesn't perform short-circuiting (I know this sucks).

Another option for us is to move the coverage instrumentation to the point before short-circuiting happens.

There is already the Clang-based coverage instrumentation that works in clang, i.e. before any optimization: http://clang.llvm.org/docs/SourceBasedCodeCoverage.html

I've briefly tried using that one for libFuzzer (see projects/compiler-rt/lib/fuzzer/FuzzerClangCounters.cpp) and the results were worse than with SanitizerCoverage, but I've only tried one fuzzing benchmark.

Once we have a proper A/B testing (https://github.com/google/fuzzer-test-suite/tree/master/engine-comparison), we'll do a better experiment. (And the next experiment is to use both at the same time: clang coverage and sanitizer coverage).

a8b4922b-9ceb-405e-9794-fa0271302cc2 commented 7 years ago

assigned to @kcc