google / clusterfuzz

Scalable fuzzing infrastructure.
https://google.github.io/clusterfuzz
Apache License 2.0
5.27k stars 551 forks source link

Re-design / refactor strategy selection logic #1943

Open Dor1s opened 4 years ago

Dor1s commented 4 years ago

Historically we've been adding new strategies (since the moment we added the first one, which was value profiling I believe), but now it's a pretty big feature and looks complicated.

Each strategy has up to three important components: probability, incompatible strategies, other preconditions (e.g. platform or presence of an auxiliary build). With the existing implementation (a sequence of if / else statements with nested conditions):

1) actual probabilities are not predictable: e.g. https://github.com/google/clusterfuzz/pull/1942 makes entropic strategy to be used in approximately (1 - dataflow_probability) * entropic_probability portion of cases instead of just entropic_probability). If we factor in additional conditions for dataflow strategy to be used, the formula becomes even less predictable

2) the strategy exclusion isn't generic as each if statement has (may have) extra checks for other strategies; it's also not immediately obvious which strategies are not compatible

3) when we use MAB selected probabilities instead of the default one, their effect is also unclear due to 1)

Additional issue is that we don't verify that some strategies were actually used. For example, for dataflow strategy we should be checking that INFO: AUTOFOCUS: ... line is present in the logs. Ideally, I'd like to make libFuzzer (and other fuzzing engines) to explicitly output that a certain mode was enabled. Otherwise, given the variety of CF deployments, we might be having incorrect stats recorded. That lead to an incorrect input provided to the MAB selection algorithm, which makes point 3) even worse.

oliverchang commented 4 years ago

Do you have suggestions on how to do this? We can certainly design some way to specify exclusions in a much better way, but the predictability of probabilities will still be an issue.

Dor1s commented 4 years ago

Only half-baked thoughts. I was also hoping there is a known algorithm for choosing actions with certain probabilities and mutual exclusion, but I personally don't know any.

Perhaps we can have an API where we first initialize a strategy pool by adding there all supported strategies with their probabilities and incompatibility rules.

* ??? not sure what it should be of the top of my head. Maybe we should set explicit probability for mutual exclusive strategies, e.g. we can say "in 60% of runs we want either entropic or dataflow, with dataflow being used as twice as entropic". But that would mean we'll need to have some tree-looking config for the strategies.