google / fuzzer-test-suite

Set of tests for fuzzing engines
Apache License 2.0

A/B testing for fuzzing engines #22

Closed: kcc closed this issue 2 years ago

kcc commented 7 years ago

Let's keep the work-in-progress design of A/B testing tool at docs/ab-testing.md and use this bug to track the progress.

dnoursi commented 7 years ago

It may be helpful to keep track of what will reside in $WORK in the dispatcher. As of now it will contain

I'm not sure whether this information would be helpful in the README, or just out of place and unnecessary for users.

morehouse commented 7 years ago

This information is mostly irrelevant for the user, but would fit well in the design doc.

Also, updating the design doc with PRs here before implementing things may prove beneficial for hashing out ideas early and accelerating the review process.

dnoursi commented 7 years ago

That sounds useful to me. Regarding Docker functionality (the next PR, which is close to complete): there is currently a premade image on Google Container Registry (GCR) with clang and all dependencies already built (essentially from tutorial/install-deps.sh).

From there, this repo will contain a Dockerfile which pulls from GCR, sets some environment variables/directories (compatible with dispatcher.sh), and runs a single script.

As I work on the runners, it may turn out to be better for them to have their own Dockerfile; the GCR image should still work for both.

dnoursi commented 7 years ago

I've switched our project to fuzzer-test-suite due to an issue with the project ID of fuzz-comparison.

dnoursi commented 7 years ago

For reference, engine-comparison is currently hard-wired with certain settings that our group will always use, but that a general user will want to specify. The list of all such variables is:

These all live in common-harness.sh; IMAGE_FAMILY in particular is in the gcloud_create function.

dnoursi commented 7 years ago

I've been working on report generation, and I'm approaching a solution that mostly involves CSV generation in Golang. This task could potentially be done without Golang, with just much more JavaScript sophistication, but there is a lot of array computation to be done in arranging multiple raw datasets (raw meaning just two columns: time and a single data column) into a complete graph.

To begin with, the raw CSVs are located in a nest of directories, in general benchmark-X/fengine-Y/trial-Z/data.csv. At each level, the CSVs from the level below will be aggregated into a single CSV, and that CSV will then be made suitable for display in a static HTML page which simply reads a CSV from its own directory (and which could be identical across all reports).

Accordingly, the first step is putting the CSVs for trial-A through trial-Z into a single CSV for the benchmark/fengine directory. This is just an aggregation of the smaller datasets into a larger one.
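To make the first step concrete, here is a minimal Go sketch of that per-fengine aggregation. It is not the actual tool; the two-column (time, value) raw format, the header row, and the trial-*/data.csv layout are assumptions based on the description above, and the long-format output (time, trial, value) is just one reasonable choice.

```go
package main

import (
	"encoding/csv"
	"log"
	"os"
	"path/filepath"
)

func main() {
	if len(os.Args) < 2 {
		log.Fatal("usage: aggregate-trials <benchmark-X/fengine-Y>")
	}
	fengineDir := os.Args[1]

	// Find every trial's raw CSV under this fengine directory.
	paths, err := filepath.Glob(filepath.Join(fengineDir, "trial-*", "data.csv"))
	if err != nil {
		log.Fatal(err)
	}

	out := csv.NewWriter(os.Stdout)
	defer out.Flush()
	out.Write([]string{"time", "trial", "value"})

	for _, p := range paths {
		trial := filepath.Base(filepath.Dir(p)) // e.g. "trial-3"

		f, err := os.Open(p)
		if err != nil {
			log.Fatal(err)
		}
		rows, err := csv.NewReader(f).ReadAll()
		f.Close()
		if err != nil {
			log.Fatal(err)
		}

		// Re-emit each (time, value) pair tagged with the trial it came from,
		// skipping a header row if one is present.
		for i, row := range rows {
			if len(row) < 2 || (i == 0 && row[0] == "time") {
				continue
			}
			out.Write([]string{row[0], trial, row[1]})
		}
	}
}
```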

The next step is then putting the CSVs for each fengine into a single CSV for the benchmark/ folder. The data taken from each fengine could be all of its data, just the best (maximum) trial from within each fengine CSV, or similarly the median trial.
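To illustrate the max-vs-median choice, here is a small hedged sketch of the selection logic only. The trialResult type, the idea of summarizing each trial by its final value, and the example numbers are all hypothetical, not taken from the repo.

```go
package main

import (
	"fmt"
	"sort"
)

type trialResult struct {
	Name  string  // e.g. "trial-3"
	Final float64 // last value in that trial's data column
}

// bestTrial returns the trial with the highest final value.
func bestTrial(trials []trialResult) trialResult {
	best := trials[0]
	for _, t := range trials[1:] {
		if t.Final > best.Final {
			best = t
		}
	}
	return best
}

// medianTrial returns the trial whose final value is the median,
// which is less sensitive to a single lucky or unlucky run.
func medianTrial(trials []trialResult) trialResult {
	sorted := append([]trialResult(nil), trials...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Final < sorted[j].Final })
	return sorted[len(sorted)/2]
}

func main() {
	trials := []trialResult{
		{"trial-1", 1250}, {"trial-2", 1410}, {"trial-3", 1330},
	}
	fmt.Println("best:", bestTrial(trials).Name)     // trial-2
	fmt.Println("median:", medianTrial(trials).Name) // trial-3
}
```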

The CSVs in each benchmark/ directory would directly compare fengines, which is the most important comparison. The CSVs for each benchmark/fengine could be used to inspect variance among trials, etc.