Closed: addisoncrump closed this pull request 3 months ago.
Forgot to format...
It doesn't seem that the saving works as expected. I'm going to keep trying with this, but it's quite difficult to debug.
Okay, this should work now. I got confused as to the direction of the copy originally.
Thanks @addisoncrump! The code looks great to me. But before merging this, let's run an experiment on this PR to triple-check that this also works in the cloud instances : ) Could you please make a trivial modification to service/gcbrun_experiment.py? This will allow me to launch experiments in this PR for final validation. Here is an example to add a dummy comment. We can revert this after the experiment. Thanks!
> let's run an experiment on this PR to triple-check that this also works in the cloud instances
Sure, and also to collect the corresponding coverage data for the "standard" fuzzers. I'll make that change shortly.
Also, a local experiment shows that we get warning info in the JSON (!):
warning: 6 functions have mismatched data
{"data":[{"files":[{"branches":[[102,22,102,36,0,0,0,0,4],[103,9,103,41,0,0,0,0,4],...]}]}]}
Should we remove this?
> Should we remove this?
Do you happen to know the cause of this?
To be honest, I've looked around a bit now and do not see the root cause.
It seems to be using `new_process.execute`, but this redirects stdout only. I presume, then, that llvm-cov is actually producing warnings in stdout(!). I'll see if I can find the appropriate command line switch to remove this.
It seems to be a known issue, btw; `get_coverage_infomation` (typo: information) already handles this.
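For reference, the known handling amounts to parsing only the final line of the summary file, since llvm-cov emits the JSON document as a single line after any warnings. A minimal sketch of that approach (not necessarily the exact fuzzbench implementation):

```python
import json

def read_summary_json(coverage_summary_file):
    """Parse the llvm-cov JSON export, skipping any leading warning lines."""
    with open(coverage_summary_file, encoding='utf-8') as summary:
        # Lines like "warning: 6 functions have mismatched data" precede the
        # JSON, which llvm-cov prints as one final line, so the last line is
        # the whole document.
        return json.loads(summary.readlines()[-1])
```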
That seems to have done it. The `get_coverage_infomation` function can remain as-is without loss of functionality.
Running a quick local test and then will stage the cloud test.
Okay, so I spent quite a while debugging a weird change that was occurring when presubmit was applied; namely, `make presubmit` was modifying the file `analysis/test_data/pairwise_unique_coverage_heatmap-failed-diff.png`. This was a result of the seaborn version being incompatible with the version of matplotlib. I fixed this by updating the dependency in `requirements.txt`. Nonetheless, this still had metadata changes which caused the diff to be modified on disk. Since this is the result of a test, I added it to the gitignore.
This also implies to me that the test should be failing, but isn't. I think this is a minor difference in how seaborn now emits heatmaps (seems to be some offset change).
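As a side note on why the test still passes: image regression checks of this kind typically compare against a baseline with an RMS tolerance, so a small rendering offset can stay under the threshold while the diff file on disk is still rewritten. A sketch using matplotlib's comparison helper, which also produces the `*-failed-diff.png` artifact named above when a comparison fails; whether fuzzbench's test uses exactly this helper is an assumption, and the file names are placeholders:

```python
from matplotlib.testing.compare import compare_images

# compare_images returns None when the two PNGs match within the RMS
# tolerance `tol`; on failure it describes the difference and writes an
# "<actual>-failed-diff.png" next to the actual image.
failure = compare_images('expected/heatmap.png', 'actual/heatmap.png', tol=0.1)
if failure is not None:
    print(failure)
```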
Also, experimenting with compression, because the coverage dumps are quite large and easily compressible.
llvm-cov export: Unknown command line argument '-no-warn'. Try: 'llvm-cov export --help'
Well, the version of llvm-cov used is too old. I'll revert this now.
Compression reduces the dump from 15 MB to 1 MB, so it seems worth it. This is now in a stable state and ready for a test run!
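For reference, the compression being discussed is plain gzip of the JSON dump before it is archived; a minimal sketch, with placeholder paths rather than fuzzbench's actual layout:

```python
import gzip
import shutil

def compress_coverage_json(src_path, dst_path=None):
    """Gzip a coverage JSON dump; llvm-cov exports compress very well."""
    dst_path = dst_path or src_path + '.gz'
    with open(src_path, 'rb') as src, gzip.open(dst_path, 'wb') as dst:
        shutil.copyfileobj(src, dst)
    return dst_path
```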
Nice! Let's start with a simple one.
> collect the corresponding coverage data for the "standard" fuzzers.
Then we collect these.
/gcbrun run_experiment.py -a --experiment-config /opt/fuzzbench/service/experiment-config.yaml --experiment-name 2024-08-09-dg-2020 --fuzzers libfuzzer --benchmarks libxml2_xml
Should I also include new code for analysis to use this? I can include a test that points at the data available in that bucket.
Experiment 2024-08-09-dg-2020 data and results will be available later at:
- The experiment data.
- The experiment report.
> Should I also include new code for analysis to use this? I can include a test that points at the data available in that bucket.
Yep sure, go for it. Thanks!
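A rough idea of what the analysis-side loading could look like, assuming the snapshots land in the experiment data bucket as gzip-compressed llvm-cov JSON exports; the helper name and any paths are hypothetical:

```python
import gzip
import json

def load_coverage_snapshot(local_path):
    """Load a gzip-compressed llvm-cov JSON export that was downloaded from
    the experiment data bucket (hypothetical layout)."""
    with gzip.open(local_path, 'rt', encoding='utf-8') as snapshot:
        return json.load(snapshot)
```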
> collect the corresponding coverage data for the "standard" fuzzers.
To minimize our waiting time, I will start this experiment, given `2024-08-09-dg-2020` looks good so far.
I will only use the popular ones for now due to the measurement bottleneck: aflplusplus centipede honggfuzz libafl libfuzzer
/gcbrun run_experiment.py -a --experiment-config /opt/fuzzbench/service/experiment-config.yaml --experiment-name 2024-08-10-base --fuzzers aflplusplus centipede honggfuzz libafl libfuzzer
> due to the measurement bottleneck
Out of curiosity, what is the measurement bottleneck? I did notice that, despite having lots of corpus archives available, the snapshots don't seem to have been measured yet.
Experiment 2024-08-10-base data and results will be available later at:
- The experiment data.
- The experiment report.
- The experiment report (experimental).
> Out of curiosity, what is the measurement bottleneck? I did notice that, despite having lots of corpus archives available, the snapshots don't seem to have been measured yet.
Currently we measure coverage of all results in one VM, which becomes insanely slow when there are too many fuzzers (e.g., > 8) in one experiment. We are working on fixing this.
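For illustration only (this is not the fix being worked on upstream): the slowdown comes from a single measurer VM working through every snapshot serially, so the natural direction is per-snapshot parallelism along these lines, with `measure_snapshot` standing in for whatever does the actual measurement work:

```python
from multiprocessing import Pool

def measure_all(snapshots, measure_snapshot, workers=8):
    """Measure coverage of snapshots in parallel instead of one by one."""
    with Pool(processes=workers) as pool:
        return pool.map(measure_snapshot, snapshots)
```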
Ah, okay, I understand. With the other experiment still running, it is effectively entirely overloaded, then.
Wait, something is going wrong with the `2024-08-10-base`.
The data directory was generated as expected, but the report was not.
This is weird; it seems all errors are related to `libafl`.
example 1
example 2
Let me test re-running the experiment without it.
@tokatoka Do you happen to know why `libafl` failed to build with many benchmarks?
Sorry for the fuss (No pun intended :P).
/gcbrun run_experiment.py -a --experiment-config /opt/fuzzbench/service/experiment-config.yaml --experiment-name 2024-08-10-test --fuzzers aflplusplus centipede honggfuzz libfuzzer
> The data directory was generated as expected, but the report was not.
If none of the measurements have happened yet, it won't have created a report, no?
I guess we need to update libafl.
@addisoncrump Can you change the commit we are using for libafl? And also use `fuzzers/fuzzbench/fuzzbench` instead of `fuzzers/fuzzbench`.
@DonggeLiu Any complaints if I make the libafl change in this PR as well?
> @DonggeLiu Any complaints if I make the libafl change in this PR as well?
Ah we would really appreciate it if you could do it in a different PR, given it is a stand-alone change. Hope that won't cause too much trouble : )
Thanks!
Thanks for the info, @tokatoka.
> Can you change the commit we are using for libafl?
What is the preferred commit to use?
I'd say we can just use the latest.
> Wait, something is going wrong with the `2024-08-10-base`.
@DonggeLiu, was the root cause ever discovered?
> @DonggeLiu, was the root cause ever discovered?
I think this is the reason: https://github.com/google/fuzzbench/pull/2023.
There are other warnings/errors, but I reckon this is the reason.
Also seeing a lot of this, but I presume that's unrelated to your PR?
Traceback (most recent call last):
  File "/work/src/experiment/measurer/coverage_utils.py", line 74, in generate_coverage_report
    coverage_reporter.generate_coverage_summary_json()
  File "/work/src/experiment/measurer/coverage_utils.py", line 141, in generate_coverage_summary_json
    result = generate_json_summary(coverage_binary,
  File "/work/src/experiment/measurer/coverage_utils.py", line 269, in generate_json_summary
    with open(output_file, 'w', encoding='utf-8') as dst_file:
FileNotFoundError: [Errno 2] No such file or directory: '/work/measurement-folders/lcms_cms_transform_fuzzer-centipede/merged.json'
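As an aside on the traceback: `open(output_file, 'w')` raises exactly this `FileNotFoundError` when the trial's measurement folder was never created, e.g. because nothing ever produced the parent directory of `merged.json`. A purely hypothetical guard (not the actual root cause or the fix adopted here) would surface a clearer message:

```python
import os

def write_summary(output_file, json_text):
    parent = os.path.dirname(output_file)
    if not os.path.isdir(parent):
        # Hypothetical guard: report a clearer error than a bare
        # FileNotFoundError when the measurement folder is missing.
        raise RuntimeError(f'Measurement folder missing: {parent}; '
                           'did profile merging run for this trial?')
    with open(output_file, 'w', encoding='utf-8') as dst_file:
        dst_file.write(json_text)
```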
I don't think so -- the modifications which were applied were done by the formatter. I can just revert that whole file if needed.
> I can just revert that whole file if needed.
No need, I've addressed this in https://github.com/google/fuzzbench/pull/2023. Later we can merge that into here.
Oh, thanks for doing this. I don't think that is caused by your modification, but since you have reverted, let's run an experiment for it.
/gcbrun run_experiment.py -a --experiment-config /opt/fuzzbench/service/experiment-config.yaml --experiment-name 2024-08-12-2020 --fuzzers aflplusplus centipede honggfuzz libfuzzer
:+1: I figure since I didn't make any meaningful changes to that file anyway, better to leave it untouched. If the experiment magically starts working, I have no idea what that means, but I'll be happy about it lol
Experiment 2024-08-12-2020 data and results will be available later at:
- The experiment data.
- The experiment report.
- The experiment report (experimental).
Yeah, looks like it's not working. This run should probably be cancelled, if only to save some CPU time.
Yep, I suspect this is due to a benchmark compatibility issue. Let me verify this.
Also, seeing a lot of instances in this experiment being preempted:
Superseded by #2028.
Currently, only corpora are saved in the archive, and coverage summaries are provided at the end of the experiment. This change adds saving of the coverage data snapshots next to the trial corpus snapshots.
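For readers of the thread, a minimal sketch of the idea; the helper and path names here are hypothetical, not the PR's actual code. Each trial cycle's llvm-cov JSON export is gzip-compressed and stored under the same trial directory as the corpus snapshot so it can be pulled later for analysis.

```python
import gzip
import os
import shutil

def archive_coverage_snapshot(coverage_json_path, trial_archive_dir, cycle):
    """Store a compressed coverage snapshot next to the corpus snapshot for
    this trial cycle (hypothetical layout)."""
    os.makedirs(trial_archive_dir, exist_ok=True)
    dst = os.path.join(trial_archive_dir, f'coverage-{cycle:04d}.json.gz')
    with open(coverage_json_path, 'rb') as src, gzip.open(dst, 'wb') as out:
        shutil.copyfileobj(src, out)
    return dst
```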