jonathanmetzman opened 1 year ago
Yes! I was about to create an issue for the same reason before I saw this :) There are two things on my mind: a major problem and a minor improvement.
The major problem is that CI tests include too many benchmarks to finish within the time limit (300 minutes).
This does not scale because the code attempts to run every benchmark in each category (`oss-fuzz`, `standard`, `bug`), while the time limit stays fixed.
OSS-Fuzz's trial build is the best solution to this (very glad you proposed this).
If we cannot use trial builds (e.g., there is not enough time to implement them, or competition participants cannot test their fuzzers), a less optimal alternative is to limit the number of benchmarks in CI, e.g., testing only the 20 most commonly supported benchmarks. More specifically, every benchmark in CI tests should be supported by the core fuzzers (and by the new fuzzer added in the PR, if any):
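# Print the name of every benchmark that no core fuzzer lists as unsupported.
for benchmark in ./benchmarks/*/
do
  # Keep the benchmark only if no core fuzzer name appears in its
  # benchmark.yaml (i.e., none is mentioned in `unsupported_fuzzers`).
  if ! git --no-pager grep -qEw 'afl|aflfast|aflplusplus|aflsmart|eclipser|fairfuzz|honggfuzz|libfuzzer|mopt|libafl|centipede' -- "${benchmark}benchmark.yaml"
  then
    basename "$benchmark"
  fi
done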
The number 20 is purely empirical, based on observations of current CI runs that timed out.
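The loop above prints every qualifying benchmark, but it neither adds the PR's new fuzzer to the check nor applies the cap of 20. Below is a minimal sketch that does both. It assumes the same `<benchmarks_dir>/<name>/benchmark.yaml` layout. `list_ci_benchmarks` and its arguments are hypothetical, and plain `grep` stands in for `git grep` so the sketch also works outside a Git checkout:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print at most MAX benchmarks whose benchmark.yaml
# mentions none of the core fuzzers, nor the fuzzer added by the PR (if any).
# As in the loop above, a fuzzer name appearing anywhere in benchmark.yaml is
# treated as a proxy for it being listed under `unsupported_fuzzers`.
list_ci_benchmarks() {
  local benchmarks_dir="$1"        # e.g. ./benchmarks
  local new_fuzzer="${2:-}"        # fuzzer added by the PR (may be empty)
  local max_benchmarks="${3:-20}"  # the empirical cap discussed above
  local fuzzers='afl|aflfast|aflplusplus|aflsmart|eclipser|fairfuzz|honggfuzz|libfuzzer|mopt|libafl|centipede'
  if [ -n "$new_fuzzer" ]; then
    fuzzers="${fuzzers}|${new_fuzzer}"
  fi
  local benchmark config
  for benchmark in "$benchmarks_dir"/*/; do
    config="${benchmark}benchmark.yaml"
    # Skip directories that have no benchmark.yaml.
    [ -f "$config" ] || continue
    # -w matches whole words only, so "afl" does not match "libafl".
    if ! grep -qEw "$fuzzers" "$config"; then
      basename "$benchmark"
    fi
  done | head -n "$max_benchmarks"
}
```

For example, `list_ci_benchmarks ./benchmarks mynewfuzzer` would print up to 20 names. The cap keeps the first N benchmarks in glob (alphabetical) order. This is a simplification: ranking benchmarks by how many fuzzers support them would follow the "most commonly supported" wording more closely.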
As for the minor improvement: currently, we test-run all fuzzers on all benchmarks even if only one fuzzer or benchmark changes, which wastes a lot of time and computing resources. Instead, we could automatically detect the fuzzers and benchmarks changed in a PR and:
The current system doesn't scale well to this many projects, and it makes it difficult to tell what went wrong.
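Automatically detecting which fuzzers and benchmarks a PR changes could start from the PR's list of changed files. Below is a minimal sketch. It assumes fuzzers live under `fuzzers/<name>/` and benchmarks under `benchmarks/<name>/`; the benchmark layout matches the loop above, and the fuzzer layout is an assumption. `changed_components` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read changed file paths (one per line) on stdin and
# print each distinct component name of the given kind, assuming the layout
# <kind>/<name>/..., e.g. fuzzers/afl/fuzzer.py -> afl.
changed_components() {
  local kind="$1"  # "fuzzers" or "benchmarks"
  # Keep only paths nested inside <kind>/<name>/ and print the <name> segment;
  # files directly under <kind>/ (NF == 2) or elsewhere are ignored.
  awk -F/ -v kind="$kind" '$1 == kind && NF > 2 { print $2 }' | sort -u
}
```

In CI, the input could come from `git diff --name-only origin/master...HEAD`, with `origin/master` standing in for the PR's base branch (an assumption). The resulting names would then restrict which fuzzer/benchmark pairs get test-run.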