oss-fuzz is of dubious value for our use-cases

abadams commented 3 months ago

I wanted to know how much value we were getting from oss-fuzz. It's coverage guided, so it should be helpful in cases where coverage is hard to achieve. The simplifier has over a thousand rules, and we need to test corner cases for each rule, so the simplifier fuzz tester seemed like a good candidate.

I injected bugs into the simplifier, one at a time, and tested how long it took the oss-fuzz fuzzer to find the bug, vs the same test, but replacing all the oss-fuzz calls with calls to a random number generator. The bugs are intended to be representative of the kind of bugs we have found in the simplifier in the past. I focused more heavily on the bounds and alignment computation, because the rules themselves have mostly been formally verified (for infinite ints).

This is actually unfair in favor of oss-fuzz, because letting it run for a long time lets it exploit coverage information to stop testing the same pieces of code multiple times, whereas in practice we run it for a brief time and don't record state across runs. Here were my bugs:

1) Wildly incorrect rhs on very commonly used rule:

rewrite(c0 + c1, fold(c0 + c1 + 1))

oss-fuzz: 4.125s
     rng: 0.473s

2) incorrect predicate:

rewrite(max(x, y + c0) + c1, max(x + c1, y), c0 > c1) // Should be c0 == c1

oss-fuzz: 2m39s
     rng: 11m31s

3) Incorrect (too narrow) bounds on common op:

int64_t v4 = saturating_mul(a_bounds.max, b_bounds.min); // <- should be .max

oss-fuzz: 14m33s
     rng: 48m21s

4) Wrong alignment calculation in div visitor:

bounds->alignment = a_bounds.alignment + b_bounds.alignment;

oss-fuzz: 5m2s
     rng: 0m45s

5) Incorrect predicate on very-unlikely-to-trigger rule:

rewrite(min(x*c0, c1) - min(x, c2)*c0, min(c1 - min(x, c2)*c0, 0), c1 <= c2*c0) || // c0 > 0 is missing

oss-fuzz: 75m27s
     rng: 47m39s

6) Bad condition on double-widening-cast simplification: Allow removing inner cast even if it goes signed -> unsigned -> signed

oss-fuzz: 7m36s
     rng: 44m28s

7) Off-by-one in trim_bounds_using_alignment:

no_overflow &= sub_with_overflow(64, max, adjustment + 1, &new_max); // +1 is wrong

oss-fuzz: 1m40s
     rng: 24.3s

8) Slightly incorrect predicate in Simplify_And

rewrite(c0 <= x && x <= c1, false, c1 <= c0) || // Should be c1 < c0

oss-fuzz: 42m7s
     rng: 46m32s

9) Incorrect condition on bounds in LT visitor:
if (a_bounds.max_defined && b_bounds.min_defined && a_bounds.max <= b_bounds.min) { // Should be <
...
}

oss-fuzz: 12.901s
     rng: 1.088s

10) Bad alignment computation in select visitor:
    bounds->alignment = t_bounds.alignment; // Should union with f_bounds.alignment

oss-fuzz: 1m46s
     rng: 2.236s

Conclusion:

It's basically a wash. rng-based fuzzing found the bug faster 6 times, and oss-fuzz found the bug faster 4 times.

We should probably roll back oss-fuzz support. It complicates the build, and bugs found often fail to repro in different environments (See #8156, #7994, #7984, #7962). rng-based fuzzers are adequate.

steven-johnson commented 3 months ago

SGTM

TH3CHARLie commented 3 months ago

this is somewhat surprising, and I went to check the simplify.cpp implementation. My (very rough) hypothesis is that the implementation uses bitstream input directly from OSS-fuzz (via the FuzzedDataProvider), furthermore, the coverage-guided mutation can only mutate the input bitstream, so making very limited changes in terms of the expressions generated. Therefore sometimes random fuzzing can jump out of the box quicker. I wonder if using the fuzzer in a structure-aware way helps.

halide / Halide

oss-fuzz is of dubious value for our use-cases #8171