Open a8b4922b-9ceb-405e-9794-fa0271302cc2 opened 7 years ago
> But my measurements were never "scientific" -- I am waiting for our A/B testing infra.
> Will it be able to account for the difference in performance between -O2 and -O0?
I hope so. There are two major metrics: time and # of iterations. When comparing -O2 vs -O0, time is less useful, but we still have # of iterations.
> If we get better coverage, but make it 10x slower, it's unsurprising that it will be worse in a limited-time run. A fixed-time run between -O2 and -O0 won't be apples to apples.
> I would be surprised if short-circuiting for code like if (p && p->x) happens frequently. (Roughly speaking) it may only happen if the operands are side-effect-free.
That's exactly what I mean. These additional cases are not if (p && p->x); they are something more like if (x && y).
> But my measurements were never "scientific" -- I am waiting for our A/B testing infra.
Will it be able to account for the difference in performance between -O2 and -O0? If we get better coverage, but make it 10x slower, it's unsurprising that it will be worse in a limited-time run. A fixed-time run between -O2 and -O0 won't be apples to apples.
> I've applied this change (which seems to be the minimal change that fixes the reproducer):
If you can make the change with a proper flag (and if you think it's useful) we can enable such a flag under -fsanitize-coverage=*.
```diff
Index: lib/Transforms/Utils/SimplifyCFG.cpp
--- lib/Transforms/Utils/SimplifyCFG.cpp	(revision 312156)
+++ lib/Transforms/Utils/SimplifyCFG.cpp	(working copy)
@@ -298,7 +298,7 @@
                                  const TargetTransformInfo &TTI) {
   assert(isSafeToSpeculativelyExecute(I) &&
          "Instruction is not safe to speculatively execute!");
-  return TTI.getUserCost(I);
+  return TargetTransformInfo::TCC_Expensive;
 }
 
 /// If we have a merge point of an "if condition" as accepted above,
@@ -5744,6 +5744,8 @@
 }
 
 bool SimplifyCFGOpt::SimplifyCondBranch(BranchInst *BI, IRBuilder<> &Builder) {
+  return false;
+
   BasicBlock *BB = BI->getParent();
```
> For foo it changes the number of coverage callbacks from 3 to 4, and for bar -- from 1 to 3.
> For the Linux kernel it adds 6.8% more coverage callbacks (401279 -> 428496). And I would expect most of that is very useful signal for fuzzing: complex, speculatable logical conditions (i.e. not if (p && p->x)).
I would be surprised if short-circuiting for code like if (p && p->x) happens frequently. (Roughly speaking) it may only happen if the operands are side-effect-free.
> which is why I was experimenting with clang coverage recently.
> Have you considered:
> - Disabling/enabling some transformations in llvm (like the change above)
I tried running with -O0 a few times to see if this helps (precisely for this reason). On small puzzles it definitely helps, but on larger benchmarks and on longer runs I did not see any benefit. But my measurements were never "scientific" -- I am waiting for our A/B testing infra.
> - Running before SimplifyCFG?
Not tried.
> Both look like reasonable options (no need to write another instrumentation at the AST level, still have access to powerful llvm analyses).
I've applied this change (which seems to be the minimal change that fixes the reproducer):
```diff
--- lib/Transforms/Utils/SimplifyCFG.cpp	(revision 312156)
+++ lib/Transforms/Utils/SimplifyCFG.cpp	(working copy)
@@ -298,7 +298,7 @@
                                  const TargetTransformInfo &TTI) {
   assert(isSafeToSpeculativelyExecute(I) &&
          "Instruction is not safe to speculatively execute!");
-  return TTI.getUserCost(I);
+  return TargetTransformInfo::TCC_Expensive;
 }
 
 /// If we have a merge point of an "if condition" as accepted above,
@@ -5744,6 +5744,8 @@
 }
 
 bool SimplifyCFGOpt::SimplifyCondBranch(BranchInst *BI, IRBuilder<> &Builder) {
+  return false;
+
   BasicBlock *BB = BI->getParent();
```
For foo it changes the number of coverage callbacks from 3 to 4, and for bar -- from 1 to 3.
For the Linux kernel it adds 6.8% more coverage callbacks (401279 -> 428496). And I would expect most of that is very useful signal for fuzzing: complex, speculatable logical conditions (i.e. not if (p && p->x)).
> which is why I was experimenting with clang coverage recently.
Have you considered:
- Disabling/enabling some transformations in llvm (like the change above)
- Running before SimplifyCFG?

Both look like reasonable options (no need to write another instrumentation at the AST level, still have access to powerful llvm analyses).
> Is it possible to easily disable the merging pass if coverage is enabled?
If I modify lib/Transforms/Scalar/SimplifyCFGPass.cpp to return early in iterativelySimplifyCFG, I get no short-circuiting. I don't know off the top of my head whether there is a flag to do this, or what the consequences would be.
> Is it possible to run the splitting pass before the coverage?
Unlikely.
> This severely hurts coverage guidance (and almost makes everything we say about coverage-guided fuzzing a lie...).
Only to some extent. In real code you usually don't get too much short-circuiting. But sure, I agree, this hurts some cases, which is why I was experimenting with clang coverage recently.
Is it possible to easily disable the merging pass if coverage is enabled? Is it possible to run the splitting pass before the coverage? This severely hurts coverage guidance (and almost makes everything we say about coverage-guided fuzzing a lie...).
This is the IR that SanitizerCoverage gets:

```llvm
%cmp = icmp eq i32 %x, 57005
%cmp1 = icmp eq i32 %y, 48879
%or.cond = and i1 %cmp, %cmp1
br i1 %or.cond, label %if.then, label %if.end
```
So, there is no intermediate BB here, and we can't insert the BB callback (we do insert __sanitizer_cov_trace_const_cmp4, but that's not what you are asking about).
This happens very early in the optimization pipeline, at the first "Simplify the CFG". So, essentially the only place where we can insert the callbacks as you want is Clang -- and that's precisely what clang coverage will give you.
Later this BB gets unsplit back into two BBs (after "X86 DAG->DAG Instruction Selection") but it's too late.
> If we insert a callback at BB#1, we will be able to understand when we guessed x correctly.
Yes, you are right.
> So, in "foo" we have BB#1 with two successors: LBB0_2 and BB#3. Both successors are instrumented, so also instrumenting BB#1 won't give us any new signal.
Nope. It clearly gives us new signal. If we insert a callback at BB#1, we will be able to understand when we guessed x correctly.
> Yep. The only option you have right now is to use -O0, where the compiler doesn't perform short-circuiting (I know this sucks).
> You mean where it does perform short-circuiting?
Short-circuiting is the transformation a&&b => a&b. At -O1 and above llvm sometimes does it; at -O0 it never does.
> But foo is short-circuited and there is even a basic block (BB#1) between the evaluation of the conditions. Why don't we insert the callback there?
Ah, that's a different question.
So, in "foo" we have BB#1 with two successors: LBB0_2 and BB#3. Both successors are instrumented, so also instrumenting BB#1 won't give us any new signal.
The strange thing is that we don't instrument this BB even with -fsanitize-coverage=trace-pc,no-prune, where pruning is disabled. Looking.
> Yep. The only option you have right now is to use -O0, where the compiler doesn't perform short-circuiting (I know this sucks).
You mean where it does perform short-circuiting? But foo is short-circuited and there is even a basic block (BB#1) between the evaluation of the conditions. Why don't we insert the callback there?
Yep. The only option you have right now is to use -O0, where the compiler doesn't perform short-circuiting (I know this sucks).
Another option for us is to move the coverage instrumentation to the point before short-circuiting happens.
There is already the Clang-based coverage instrumentation that works in clang, i.e. before any optimization: http://clang.llvm.org/docs/SourceBasedCodeCoverage.html
I've briefly tried using that one for libFuzzer (see projects/compiler-rt/lib/fuzzer/FuzzerClangCounters.cpp) and the results were worse than with SanitizerCoverage, but I've only tried one fuzzing benchmark.
Once we have a proper A/B testing (https://github.com/google/fuzzer-test-suite/tree/master/engine-comparison), we'll do a better experiment. (And the next experiment is to use both at the same time: clang coverage and sanitizer coverage).
assigned to @kcc
Extended Description
clang version 5.0.0 (trunk 303084)
Program:
```c
void bad();

void foo(int x, int y) {
  if (x == 0xdead && y == 0xbeef)
    bad();
}

int bar(int x, int y) {
  return x == 0xdead && y == 0xbeef;
}
```
$ clang test.c -O2 -c -S -o - -fsanitize-coverage=trace-pc
```asm
foo:
# BB#1:                         # %entry
# BB#3:                         # %if.then
.LBB0_2:                        # %if.end
        callq   __sanitizer_cov_trace_pc
.Lfunc_end0:
        .size   foo, .Lfunc_end0-foo
        .cfi_endproc

bar:
```

[assembly listing truncated; only the block labels and one callback site survive]
For foo there must be an additional callback at BB#1; for bar there must be an additional callback somewhere.
This has a negative effect on coverage-guided fuzzers, as they need to guess N values simultaneously in order to make progress, rather than guessing values one by one and persisting progress.