Dor1s opened this issue 5 years ago
@mbarbella-chromium @mukundv-chrome for the short term, should we exclude random max length strategy? What'll be the best option for your ongoing experiment(s)?
Another bad idea is to do an extra run with `-runs=0` on the full corpus to get the real `initial_edge_coverage`. We can do it only for the runs where random max length is used, but that will be quite expensive and unwise anyway :(
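For reference, the initial coverage libFuzzer reports after loading a corpus can be scraped from its log output. A minimal sketch (the helper name is mine, and the log line is a synthetic example of libFuzzer's usual `INITED cov:` format):

```python
import re

# After reading the initial corpus, libFuzzer prints a line like
# "#1024  INITED cov: 567 ft: 890 corp: 64/12Kb ..." where "cov:"
# is the edge coverage counter we want.
INITED_RE = re.compile(r"INITED\s+cov:\s+(\d+)")

def parse_initial_edge_coverage(log_text):
    """Return the edge coverage reported on the INITED line, or None."""
    match = INITED_RE.search(log_text)
    return int(match.group(1)) if match else None

# Synthetic log fragment for illustration:
log = "#2      INITED cov: 123 ft: 140 corp: 10/1024b exec/s: 0 rss: 30Mb"
print(parse_initial_edge_coverage(log))  # -> 123
```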
> @mbarbella-chromium @mukundv-chrome for the short term, should we exclude random max length strategy? What'll be the best option for your ongoing experiment(s)?
Discussed offline. The experimental data should be good for now, so I'll upload a CL to record `new_edges = 0` when the random max length strategy is used.
@inferno-chromium suggested a nice solution on the libFuzzer side (to print stats after reading the first corpus dir). I've uploaded a CL as https://reviews.llvm.org/D66020, but we probably need to choose some other keyword. `INITED` is supposed to indicate that ALL input files are read, so in our case it's kinda `PARTIALLY INITED` :)
One more relevant problem: we don't use `-use_value_profile=1` during merge, which means we discard some features that could've been discovered during fuzzing. That explains why the `new_features` stat so often has non-zero values.

I'll fix it as well when making the other merge changes.
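The fix amounts to passing the same value-profile flag during merge that was used during fuzzing. A simplified sketch of how the merge command line could be assembled (the helper name and parameters are mine, not ClusterFuzz's actual API):

```python
def build_merge_args(target_path, merged_dir, extra_dirs, use_value_profile=True):
    """Build a libFuzzer merge command line (simplified sketch).

    Passing -use_value_profile=1 here keeps value-profile features that
    were discovered during fuzzing from being discarded by the merge step.
    """
    args = [target_path, "-merge=1"]
    if use_value_profile:
        args.append("-use_value_profile=1")
    # Note: -max_len is intentionally NOT passed, to avoid truncating
    # inputs and losing coverage during the merge.
    return args + [merged_dir] + list(extra_dirs)

print(build_merge_args("./target", "corpus_merged", ["corpus_new"]))
# -> ['./target', '-merge=1', '-use_value_profile=1', 'corpus_merged', 'corpus_new']
```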
One more thing that came up during discussion with @jonathanmetzman today. With this new merge thing, we might want to do `gsutil cp` and `gsutil rm` after every fuzzing run, to essentially "prune" the corpus. That way, the recurring corpus pruning task would have much less of a load and would be more like a clean-up / stats collection task, rather than a real heavy corpus pruning job.
FTR, the most recent Clang roll in Chromium should include my `-merge=1` change, so that is not a blocker anymore, but there are still some other things that can delay switching to the new merge logic.
I've checked BigQuery stats for the past 15 days for both Chromium ClusterFuzz and OSS-Fuzz.
The change has definitely had an effect -- I can share stats with Googlers.
One thing that is noticeable and expected is that the average `new_edges` value has dropped.
Some other movements are also evident, though hard to comment on, as the previous stats were incomplete: we skipped recording them for certain strategies.
As per Abhishek's suggestion I've checked whether this new implementation affected the number of post-fuzzing merge timeouts. I've checked the logs for the past 7+ days and it looks like the overall number of timeouts is not affected, which is expected, as the overall complexity of performing a merge hasn't changed much.
```
# Chromium ClusterFuzz, number of "Merging new testcases timed out" errors a day.
2019-12-10 : 1423
2019-12-11 : 1246
2019-12-12 : 1435  # two-step merge was deployed at the end of this day
2019-12-13 : 1497
2019-12-14 : 1330
2019-12-15 : 1363
2019-12-16 : 1446
2019-12-17 : 1128  # not a full day
```
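To make the "not affected" reading concrete, the daily averages before and after the deployment can be compared directly (using only full days, with 2019-12-12 counted as pre-deployment since the rollout happened at the end of that day):

```python
# Daily "Merging new testcases timed out" counts from the table above.
before = [1423, 1246, 1435]        # 2019-12-10 .. 2019-12-12 (pre-deployment)
after = [1497, 1330, 1363, 1446]   # 2019-12-13 .. 2019-12-16 (full days only)

avg_before = sum(before) / len(before)
avg_after = sum(after) / len(after)
print(round(avg_before), round(avg_after))  # -> 1368 1409
```

The ~3% difference is well within day-to-day noise, consistent with the observation that merge complexity hasn't changed much.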
Re-evaluating some strategies / metrics after the change being deployed for ~2 weeks (access to the link is restricted for non-Googlers):
1) DFT on OSS-Fuzz: https://docs.google.com/spreadsheets/d/14hAEuRsfofSiInUTR7KA48SbGY4QPuxHOzObjVPInQk/edit#gid=1395928105
On average, the DFT strategy is still losing to the runs where it's not used.
2) ML RNN on ClusterFuzz: https://docs.google.com/spreadsheets/d/13te_2ZcRQhbm25B_h2Xg8-TFV0TBn4TSSBmbTNLy2_E/edit#gid=0
ML RNN looks pretty good, it seems to be useful, as 66.7% of fuzz targets perform better with ML RNN, while only 32% of targets perform better without it (all on average).
3) MAB probabilities: https://datastudio.google.com/c/u/0/reporting/1crV7jDmoFh25j2ZMATEMa3X0HfsKqGOQ/page/QAox
The graphs seem to be broken, possibly due to https://github.com/google/clusterfuzz/issues/1304
I ran some queries myself in https://docs.google.com/spreadsheets/d/1dNgK9xg9zkdds6ckFalNWniVI2FtVZylkfcFKR52kX0/edit#gid=0
It might be a little hard to read, but an awesome observation here is that the `fork` strategy has been showing up in about half of the top strategies since Dec 16th. Before that, the `fork` strategy was never in the top 10, simply because we did not have good stats for it. The ML RNN probability has also jumped by ~10-20%, but that might be a temporary fluctuation.
Once the Data Studio charts are fixed, we should be able to observe even more changes.
> One more relevant problem: we don't use `-use_value_profile=1` during merge, which means we discard some features that could've been discovered during fuzzing. That explains why the `new_features` stat so often has non-zero values.
> I'll fix it as well when making the other merge changes.
Has this been fixed?
> > One more relevant problem: we don't use `-use_value_profile=1` during merge, which means we discard some features that could've been discovered during fuzzing. That explains why the `new_features` stat so often has non-zero values. I'll fix it as well when making the other merge changes.
>
> Has this been fixed?
I think so: https://github.com/google/clusterfuzz/pull/1915
1) If we fuzz with some `-max_len` value, the `initial_edge_coverage` stat will not represent the actual edge coverage of the full corpus, because libFuzzer will ignore the contents past the first `max_len` bytes in every input.

2) During the merge, we reasonably remove the `-max_len` argument, in order to avoid truncating the corpus and losing the coverage we have.

3) The resulting `edge_coverage` is extracted from the merge log and might be greater than the `initial_edge_coverage` even if we didn't generate any inputs, just because `-max_len` caused the initial coverage to be incomplete.

4) The resulting `new_edges` metric is calculated as the difference between these two (excluding the corpus subset strategy, as in that case we were able to realize the very same problem and fixed it some time ago): https://github.com/google/clusterfuzz/blob/649d43cbd729acfd8a6342ad078b23fd191b9869/src/python/bot/fuzzers/libFuzzer/stats.py#L291
I don't have a good solution in mind. We can skip the `new_edges` metric for the random max length strategy, but that would mean we lose a signal for the strategy.

My best idea so far is to make libFuzzer's merge give us all the numbers (i.e. initial coverage + new coverage), but that'll complicate libFuzzer's interface, so I don't really like this either.
UPD: we don't use value profiling during merge, which means we might be missing some features.
UPD 2: One more thing that came up during discussion with @jonathanmetzman today. With this new merge thing, we might want to do `gsutil cp` and `gsutil rm` after every fuzzing run, to essentially "prune" the corpus. That way, the recurring corpus pruning task would have much less of a load and would be more like a clean-up / stats collection task, rather than a real heavy corpus pruning job.

**Action Items**
- (`merge_edge_coverage` column), sha1 check for corpus filenames.
- `_is_multistep_merge_supported` using a less hacky approach (check help message).