Closed: addisoncrump closed this pull request 3 months ago.
Forgot to format...
It doesn't seem that the saving works as expected. I'm going to keep trying with this, but it's quite difficult to debug.
Okay, this should work now. I got confused as to the direction of the copy originally.
Thanks @addisoncrump! The code looks great to me. But before merging this, let's run an experiment on this PR to triple-check that this also works in the cloud instances : ) Could you please make a trivial modification to service/gcbrun_experiment.py? This will allow me to launch experiments in this PR for final validation. Here is an example to add a dummy comment. We can revert this after the experiment. Thanks!
> let's run an experiment on this PR to triple-check that this also works in the cloud instances
Sure, and also to collect the corresponding coverage data for the "standard" fuzzers. I'll make that change shortly.
Also, a local experiment shows that we get warning info in the JSON (!):
warning: 6 functions have mismatched data
{"data":[{"files":[{"branches":[[102,22,102,36,0,0,0,0,4],[103,9,103,41,0,0,0,0,4],...]}]}]}
Should we remove this?
> Should we remove this?
Do you happen to know the cause of this?
To be honest, I've looked around a bit now and do not see the root cause.
It seems to be using `new_process.execute`, but this redirects stdout only. I presume, then, that llvm-cov is actually producing warnings in stdout(!). I'll see if I can find the appropriate command line switch to remove this.
It seems to be a known issue, btw; `get_coverage_infomation` (typo: information) already handles this.
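For reference, the known handling amounts to parsing only the final line of the summary file, since llvm-cov emits the JSON document as a single line after any warnings. A minimal sketch of that approach (not necessarily the exact fuzzbench implementation):

```python
import json

def read_summary_json(coverage_summary_file):
    """Parse the llvm-cov JSON export, skipping any leading warning lines."""
    with open(coverage_summary_file, encoding='utf-8') as summary:
        # Lines like "warning: 6 functions have mismatched data" precede the
        # JSON, which llvm-cov prints as one final line, so the last line is
        # the whole document.
        return json.loads(summary.readlines()[-1])
```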
That seems to have done it. The `get_coverage_infomation` function can remain as-is without loss of functionality.
Running a quick local test and then will stage the cloud test.
Okay, so I spent quite a while debugging a weird change that was occurring when presubmit was applied; namely, `make presubmit` was modifying the file `analysis/test_data/pairwise_unique_coverage_heatmap-failed-diff.png`. This was a result of the seaborn version being incompatible with the version of matplotlib. I fixed this by updating the dependency in `requirements.txt`. Nonetheless, this still had metadata changes which caused the diff to be modified on disk. Since this is the result of a test, I added it to the gitignore.
This also implies to me that the test should be failing, but isn't. I think this is a minor difference in how seaborn now emits heatmaps (seems to be some offset change).
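As a side note on why the test still passes: image regression checks of this kind typically compare against a baseline with an RMS tolerance, so a small rendering offset can stay under the threshold while the diff file on disk is still rewritten. A sketch using matplotlib's comparison helper, which also produces the `*-failed-diff.png` artifact named above when a comparison fails; whether fuzzbench's test uses exactly this helper is an assumption, and the file names are placeholders:

```python
from matplotlib.testing.compare import compare_images

# compare_images returns None when the two PNGs match within the RMS
# tolerance `tol`; on failure it describes the difference and writes an
# "<actual>-failed-diff.png" next to the actual image.
failure = compare_images('expected/heatmap.png', 'actual/heatmap.png', tol=0.1)
if failure is not None:
    print(failure)
```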
Also, experimenting with compression, because the coverage dumps are quite large and easily compressible.
llvm-cov export: Unknown command line argument '-no-warn'. Try: 'llvm-cov export --help'
Well, the version of llvm-cov used is too old. I'll revert this now.
Compression reduces the dump from 15 MB to 1 MB, so it seems worth it. This is now in a stable state and ready for a test run!
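For reference, the compression being discussed is plain gzip of the JSON dump before it is archived; a minimal sketch, with placeholder paths rather than fuzzbench's actual layout:

```python
import gzip
import shutil

def compress_coverage_json(src_path, dst_path=None):
    """Gzip a coverage JSON dump; llvm-cov exports compress very well."""
    dst_path = dst_path or src_path + '.gz'
    with open(src_path, 'rb') as src, gzip.open(dst_path, 'wb') as dst:
        shutil.copyfileobj(src, dst)
    return dst_path
```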
Nice! Let's start with a simple one.
> collect the corresponding coverage data for the "standard" fuzzers.
Then we collect these.
/gcbrun run_experiment.py -a --experiment-config /opt/fuzzbench/service/experiment-config.yaml --experiment-name 2024-08-09-dg-2020 --fuzzers libfuzzer --benchmarks libxml2_xml
Should I also include new code for analysis to use this? I can include a test that points at the data available in that bucket.
Experiment 2024-08-09-dg-2020 data and results will be available later at:
- The experiment data.
- The experiment report.
> Should I also include new code for analysis to use this? I can include a test that points at the data available in that bucket.
Yep sure, go for it. Thanks!
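A rough idea of what the analysis-side loading could look like, assuming the snapshots land in the experiment data bucket as gzip-compressed llvm-cov JSON exports; the helper name and any paths are hypothetical:

```python
import gzip
import json

def load_coverage_snapshot(local_path):
    """Load a gzip-compressed llvm-cov JSON export that was downloaded from
    the experiment data bucket (hypothetical layout)."""
    with gzip.open(local_path, 'rt', encoding='utf-8') as snapshot:
        return json.load(snapshot)
```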
> collect the corresponding coverage data for the "standard" fuzzers.
To minimize our waiting time, I will start this experiment, given `2024-08-09-dg-2020` looks good so far.
I will only use the popular ones for now due to the measurement bottleneck: aflplusplus centipede honggfuzz libafl libfuzzer
/gcbrun run_experiment.py -a --experiment-config /opt/fuzzbench/service/experiment-config.yaml --experiment-name 2024-08-10-base --fuzzers aflplusplus centipede honggfuzz libafl libfuzzer
> due to the measurement bottleneck
Out of curiosity, what is the measurement bottleneck? I did notice that, despite having lots of corpus archives available, the snapshots don't seem to have been measured yet.
Experiment 2024-08-10-base data and results will be available later at:
- The experiment data.
- The experiment report.
- The experiment report (experimental).
> Out of curiosity, what is the measurement bottleneck? I did notice that, despite having lots of corpus archives available, the snapshots don't seem to have been measured yet.
Currently we measure coverage of all results in one VM, which becomes insanely slow when there are too many fuzzers (e.g., > 8) in one experiment. We are working on fixing this.
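For illustration only (this is not the fix being worked on upstream): the slowdown comes from a single measurer VM working through every snapshot serially, so the natural direction is per-snapshot parallelism along these lines, with `measure_snapshot` standing in for whatever does the actual measurement work:

```python
from multiprocessing import Pool

def measure_all(snapshots, measure_snapshot, workers=8):
    """Measure coverage of snapshots in parallel instead of one by one."""
    with Pool(processes=workers) as pool:
        return pool.map(measure_snapshot, snapshots)
```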
Ah, okay, I understand. With the other experiment still running, it is effectively entirely overloaded, then.
Wait, something is going wrong with the `2024-08-10-base`.
The data directory was generated as expected, but the report was not.
This is weird; it seems all errors are related to `libafl`.
example 1
example 2
Let me test re-running the experiment without it.
@tokatoka Do you happen to know why `libafl` failed to build with many benchmarks?
Sorry for the fuss (No pun intended :P).
/gcbrun run_experiment.py -a --experiment-config /opt/fuzzbench/service/experiment-config.yaml --experiment-name 2024-08-10-test --fuzzers aflplusplus centipede honggfuzz libfuzzer
> The data directory was generated as expected, but the report was not.
If none of the measurements have happened yet, it won't have created a report, no?
I guess we need to update libafl.
@addisoncrump Can you change the commit we are using for libafl? And also use `fuzzers/fuzzbench/fuzzbench` instead of `fuzzers/fuzzbench`.
@DonggeLiu Any complaints if I make the libafl change in this PR as well?
> @DonggeLiu Any complaints if I make the libafl change in this PR as well?
Ah we would really appreciate it if you could do it in a different PR, given it is a stand-alone change. Hope that won't cause too much trouble : )
Thanks!
Thanks for the info, @tokatoka.
> Can you change the commit we are using for libafl?
What is the preferred commit to use?
I'd say we can just use the latest.
> Wait, something is going wrong with the `2024-08-10-base`.
@DonggeLiu, was the root cause ever discovered?
> @DonggeLiu, was the root cause ever discovered?
I think this is the reason: https://github.com/google/fuzzbench/pull/2023.
There are other warnings/errors, but I reckon this is the reason.
Also seeing a lot of this, but I presume that's unrelated to your PR?
Traceback (most recent call last):
  File "/work/src/experiment/measurer/coverage_utils.py", line 74, in generate_coverage_report
    coverage_reporter.generate_coverage_summary_json()
  File "/work/src/experiment/measurer/coverage_utils.py", line 141, in generate_coverage_summary_json
    result = generate_json_summary(coverage_binary,
  File "/work/src/experiment/measurer/coverage_utils.py", line 269, in generate_json_summary
    with open(output_file, 'w', encoding='utf-8') as dst_file:
FileNotFoundError: [Errno 2] No such file or directory: '/work/measurement-folders/lcms_cms_transform_fuzzer-centipede/merged.json'
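As an aside on the traceback: `open(output_file, 'w')` raises exactly this `FileNotFoundError` when the trial's measurement folder was never created, e.g. because nothing ever produced the parent directory of `merged.json`. A purely hypothetical guard (not the actual root cause or the fix adopted here) would surface a clearer message:

```python
import os

def write_summary(output_file, json_text):
    parent = os.path.dirname(output_file)
    if not os.path.isdir(parent):
        # Hypothetical guard: report a clearer error than a bare
        # FileNotFoundError when the measurement folder is missing.
        raise RuntimeError(f'Measurement folder missing: {parent}; '
                           'did profile merging run for this trial?')
    with open(output_file, 'w', encoding='utf-8') as dst_file:
        dst_file.write(json_text)
```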
I don't think so -- the modifications which were applied were done by the formatter. I can just revert that whole file if needed.
> I can just revert that whole file if needed.
No need, I've addressed this in https://github.com/google/fuzzbench/pull/2023. Later we can merge that into here.
Oh, thanks for doing this. I don't think that is caused by your modification, but since you have reverted, let's run an experiment for it.
/gcbrun run_experiment.py -a --experiment-config /opt/fuzzbench/service/experiment-config.yaml --experiment-name 2024-08-12-2020 --fuzzers aflplusplus centipede honggfuzz libfuzzer
:+1: I figure since I didn't make any meaningful changes to that file anyway, better to leave it untouched. If the experiment magically starts working, I have no idea what that means, but I'll be happy about it lol
Experiment 2024-08-12-2020 data and results will be available later at:
- The experiment data.
- The experiment report.
- The experiment report (experimental).
Yeah, looks like it's not working. This run should probably be cancelled, if only to save some CPU time.
Yep, I suspect this is due to a benchmark compatibility issue. Let me verify this.
Also, seeing a lot of instances in this experiment being preempted:
Superseded by #2028.
Currently, only corpora are saved in the archive, and coverage summaries are provided at the end of the experiment. This change adds saving of the coverage data snapshots next to the trial corpus snapshots.
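For readers of the thread, a minimal sketch of the idea; the helper and path names here are hypothetical, not the PR's actual code. Each trial cycle's llvm-cov JSON export is gzip-compressed and stored under the same trial directory as the corpus snapshot so it can be pulled later for analysis.

```python
import gzip
import os
import shutil

def archive_coverage_snapshot(coverage_json_path, trial_archive_dir, cycle):
    """Store a compressed coverage snapshot next to the corpus snapshot for
    this trial cycle (hypothetical layout)."""
    os.makedirs(trial_archive_dir, exist_ok=True)
    dst = os.path.join(trial_archive_dir, f'coverage-{cycle:04d}.json.gz')
    with open(coverage_json_path, 'rb') as src, gzip.open(dst, 'wb') as out:
        shutil.copyfileobj(src, out)
    return dst
```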