phi-go opened 1 year ago
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).
View this failed invocation of the CLA check for more information.
For the most up to date status, view the checks section at the bottom of the pull request.
Hey @Alan32Liu, the current commit should hopefully pass the CI checks, though we are still encountering some errors in the integration tests with our setup; I believe this is just caused by differences in the environment. We have merged the fixes for https://github.com/google/fuzzbench/pull/1937, thank you for your help there. The current commit is still a work in progress regarding report generation, and currently only 10 mutant binaries are built to keep testing times low.
We added the more granular timestamps as well, see changes to the extract_corpus() and archive_corpus() functions.
If the CI checks pass, could you start a gcbrun? I think the following command should be correct:
/gcbrun run_experiment.py -a --experiment-config /opt/fuzzbench/service/experiment-config.yaml --experiment-name 2023-12-23-mua-measurer --fuzzers aflplusplus aflplusplus_407 --mutation-analysis
We also encountered two errors that we are quite confused by; maybe you have an idea?
One is part of the integration tests:
Here: experiment/test_runner.py::TestIntegrationRunner::test_integration_runner fails for the following assert:
assert len(os.listdir(output_corpus_dir)) > 5
The output_corpus_dir instead contains a corpus and crashes dir, where the corpus dir does contain more than 5 files. The fuzzer-log.txt contains the following warnings, could these explain the issue?:
[run_fuzzer] Running command: /home/pgoerz/fuzzbench/experiment/test_data/test_runner/MultipleConstraintsOnSmallInputTest -print_final_stats=1 -close_fd_mask=3 -fork=1 -ignore_ooms=1 -ignore_timeouts=1 -ignore_crashes=1 -entropic=1 -keep_seed=1 -cross_over_uniform_dist=1 -entropi>
WARNING: unrecognized flag '-fork=1'; use -help=1 to list all flags
WARNING: unrecognized flag '-ignore_ooms=1'; use -help=1 to list all flags
WARNING: unrecognized flag '-ignore_timeouts=1'; use -help=1 to list all flags
WARNING: unrecognized flag '-ignore_crashes=1'; use -help=1 to list all flags
WARNING: unrecognized flag '-entropic=1'; use -help=1 to list all flags
WARNING: unrecognized flag '-keep_seed=1'; use -help=1 to list all flags
WARNING: unrecognized flag '-cross_over_uniform_dist=1'; use -help=1 to list all flags
WARNING: unrecognized flag '-entropic_scale_per_exec_time=1'; use -help=1 to list all flags
INFO: Seed: 3421389961 INFO: Loaded 1 modules (1645 inline 8-bit counters): 1645 [0x48ef70, 0x48f5dd),
INFO: Loaded 1 PC tables (1645 PCs): 1645 [0x472f58,0x479628),
INFO: 0 files found in /tmp/pytest-of-pgoerz/pytest-51/test_integration_runner0/corpus/corpus
INFO: 0 files found in /tmp/pytest-of-pgoerz/pytest-51/test_integration_runner0/seeds
INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes
INFO: A corpus is not provided, starting from an empty corpus #2
INITED cov: 4 ft: 5 corp: 1/1b lim: 4 exec/s: 0 rss: 23Mb
NEW_FUNC[1/1]: ==458257==WARNING: invalid path to external symbolizer! ==458257==
WARNING: Failed to use and restart external symbolizer! 0x450a10 (/home/pgoerz/fuzzbench/experiment/test_data/test_runner/MultipleConstraintsOnSmallInputTest+0x450a10) #2062
NEW cov: 7 ft: 8 corp: 2/21b lim: 21 exec/s: 0 rss: 26Mb L: 20/20 MS: 5 InsertRepeatedBytes-CMP-InsertByte-InsertRepeatedBytes-InsertByte- DE: "\x01\x00\x00\x00\x00\x00\x00\x14"-
NEW_FUNC[1/4]: 0x450aa0 (/home/pgoerz/fuzzbench/experiment/test_data/test_runner/MultipleConstraintsOnSmallInputTest+0x450aa0)
NEW_FUNC[2/4]: 0x450b30 (/home/pgoerz/fuzzbench/experiment/test_data/test_runner/MultipleConstraintsOnSmallInputTest+0x450b30) #2240
NEW cov: 17 ft: 18 corp: 3/41b lim: 21 exec/s: 0 rss: 26Mb L: 20/20 MS: 3 CopyPart-CMP-CopyPart- DE: "\x01\x00"-
The other happens during report generation, which we tried to debug, but we are not even sure which of our changes could have caused it:
INFO:root:experiment_df:
git_hash experiment_filestore ... fuzzer_stats crash_key
0 243d1022080f665c6755bc0aff12f8b4d43d2098 /tmp/experiment-data ... None None
1 243d1022080f665c6755bc0aff12f8b4d43d2098 /tmp/experiment-data ... None None
2 243d1022080f665c6755bc0aff12f8b4d43d2098 /tmp/experiment-data ... None None
[3 rows x 12 columns]
WARNING:root:Filtered out invalid benchmarks: set().
git_hash experiment_filestore ... fuzzer_stats crash_key
0 243d1022080f665c6755bc0aff12f8b4d43d2098 /tmp/experiment-data ... None None
1 243d1022080f665c6755bc0aff12f8b4d43d2098 /tmp/experiment-data ... None None
2 243d1022080f665c6755bc0aff12f8b4d43d2098 /tmp/experiment-data ... None None
[3 rows x 12 columns]
Int64Index([0, 1, 2], dtype='int64')
git_hash experiment_filestore ... crash_key firsts
0 243d1022080f665c6755bc0aff12f8b4d43d2098 /tmp/experiment-data ... None True
1 243d1022080f665c6755bc0aff12f8b4d43d2098 /tmp/experiment-data ... None True
2 243d1022080f665c6755bc0aff12f8b4d43d2098 /tmp/experiment-data ... None True
[3 rows x 13 columns]
Int64Index([0, 1, 2], dtype='int64')
ERROR:root:Error generating HTML report. Extras:
traceback: Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/pandas/core/frame.py", line 11003, in _reindex_for_setitem
reindexed_value = value.reindex(index)._values
File "/usr/local/lib/python3.10/site-packages/pandas/util/_decorators.py", line 324, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/pandas/core/frame.py", line 4807, in reindex
return super().reindex(**kwargs)
File "/usr/local/lib/python3.10/site-packages/pandas/core/generic.py", line 4966, in reindex
return self._reindex_axes(
File "/usr/local/lib/python3.10/site-packages/pandas/core/frame.py", line 4626, in _reindex_axes
frame = frame._reindex_index(
File "/usr/local/lib/python3.10/site-packages/pandas/core/frame.py", line 4642, in _reindex_index
new_index, indexer = self.index.reindex(
File "/usr/local/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 4237, in reindex
target = self._wrap_reindex_result(target, indexer, preserve_names)
File "/usr/local/lib/python3.10/site-packages/pandas/core/indexes/multi.py", line 2520, in _wrap_reindex_result
target = MultiIndex.from_tuples(target)
File "/usr/local/lib/python3.10/site-packages/pandas/core/indexes/multi.py", line 204, in new_meth
return meth(self_or_cls, *args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/pandas/core/indexes/multi.py", line 559, in from_tuples
arrays = list(lib.tuples_to_object_array(tuples).T)
File "pandas/_libs/lib.pyx", line 2930, in pandas._libs.lib.tuples_to_object_array
ValueError: Buffer dtype mismatch, expected 'Python object' but got 'long'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/work/src/experiment/reporter.py", line 82, in output_report
generate_report.generate_report(
File "/work/src/analysis/generate_report.py", line 269, in generate_report
experiment_df = data_utils.add_bugs_covered_column(experiment_df)
File "/work/src/analysis/data_utils.py", line 170, in add_bugs_covered_column
df['firsts'] = (
File "/usr/local/lib/python3.10/site-packages/pandas/core/frame.py", line 3645, in __setitem__
self._set_item_frame_value(key, value)
File "/usr/local/lib/python3.10/site-packages/pandas/core/frame.py", line 3787, in _set_item_frame_value
arraylike = _reindex_for_setitem(value, self.index)
File "/usr/local/lib/python3.10/site-packages/pandas/core/frame.py", line 11010, in _reindex_for_setitem
raise TypeError(
TypeError: incompatible index of inserted column with frame index
/gcbrun run_experiment.py -a --experiment-config /opt/fuzzbench/service/experiment-config.yaml --experiment-name 2023-12-23-mua-measurer --fuzzers aflplusplus aflplusplus_407 --mutation-analysis
We also encountered two errors that we are quite confused by; maybe you have an idea?
I will list my initial thoughts below, but I did not have a chance to read your code thoroughly and may make mistakes. @jonathanmetzman please correct me if I am wrong.
One is part of the integration tests: Here: experiment/test_runner.py::TestIntegrationRunner::test_integration_runner fails for the following assert: assert len(os.listdir(output_corpus_dir)) > 5
The output_corpus_dir instead contains a corpus and crashes dir, where the corpus dir does contain more than 5 files. The fuzzer-log.txt contains the following warnings, could these explain the issue?:
It seems the corpus/ was not correctly defined by your fuzzer (BTW, would it be better to change the copyright info?).
Here is an example from libfuzzer: https://github.com/google/fuzzbench/blob/2bc06d4aca2c8dae20eeca47b95433ef543f99a8/fuzzers/libfuzzer/fuzzer.py#L49-L52
The other happens during report generation, which we tried to debug, but we are not even sure which of our changes could have caused it:
Traceback (most recent call last):
File "/work/src/experiment/reporter.py", line 82, in output_report
generate_report.generate_report(
File "/work/src/analysis/generate_report.py", line 269, in generate_report
experiment_df = data_utils.add_bugs_covered_column(experiment_df)
File "/work/src/analysis/data_utils.py", line 170, in add_bugs_covered_column
df['firsts'] = (
File "/usr/local/lib/python3.10/site-packages/pandas/core/frame.py", line 3645, in __setitem__
self._set_item_frame_value(key, value)
File "/usr/local/lib/python3.10/site-packages/pandas/core/frame.py", line 3787, in _set_item_frame_value
arraylike = _reindex_for_setitem(value, self.index)
File "/usr/local/lib/python3.10/site-packages/pandas/core/frame.py", line 11010, in _reindex_for_setitem
raise TypeError(
TypeError: incompatible index of inserted column with frame index
The corresponding lines find the first bug covered by each fuzzer on each benchmark and add that as a new column to the original data frame: https://github.com/google/fuzzbench/blob/2bc06d4aca2c8dae20eeca47b95433ef543f99a8/analysis/data_utils.py#L162-L164
The error complains that the index of the new column is incompatible with the dataframe's index. I don't remember the exact details of these lines, but to further debug this, a simple way is to split this complex statement into smaller steps and print out the intermediate result of each step.
E.g., is this a column mismatch in the result you pasted above?
INFO:root:experiment_df: ... [3 rows x 12 columns] WARNING:root:Filtered out invalid benchmarks: set(). ... [3 rows x 12 columns] Int64Index([0, 1, 2], dtype='int64') ... [3 rows x 13 columns]
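To illustrate the kind of step-by-step check I mean (a toy sketch; the column names are made up and this is not the exact data_utils code), assigning a grouped result whose index no longer matches the frame's index produces exactly this TypeError, at least with the pandas 1.5.x shown in the traceback:
import pandas as pd

df = pd.DataFrame({
    'fuzzer': ['afl', 'afl', 'libfuzzer'],
    'crash_key': ['bug1', 'bug2', 'bug1'],
    'time': [100, 200, 150],
})

# Split the chained statement and inspect each intermediate result and index.
firsts = df.groupby(['fuzzer', 'crash_key'])['time'].min()
print(firsts.index)  # MultiIndex([('afl', 'bug1'), ...]) - not the frame's index
print(df.index)      # RangeIndex(start=0, stop=3, step=1)

# The indexes cannot be aligned, so the assignment fails with
# "TypeError: incompatible index of inserted column with frame index".
df['firsts'] = firsts
If the intermediate indexes diverge like this in your run, aligning them first (for example with .transform() or .reset_index()) would be the thing to look at.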
Also, I would fix the 1st error and re-run the exp to check if the 2nd one still exists, just in case it is caused by the 1st.
/gcbrun run_experiment.py -a --experiment-config /opt/fuzzbench/service/experiment-config.yaml --experiment-name 2023-12-23-mua-measurer --fuzzers aflplusplus aflplusplus_407 --mutation-analysis
This failed because there is no fuzzer named aflplusplus_407. Maybe it was not included in this PR?
So the presubmit error seems to be this:
"gsutil cp /tmp/tmplzfk1uuo gs://experiment-data/test-experiment/build-logs/benchmark-freetype2_ftfuzzer-mutation_analysis.txt" returned: 1. Extras:
output: ServiceException: 401 Anonymous caller does not have storage.objects.list access to the Google Cloud Storage bucket. Permission 'storage.objects.list' denied on resource (or it may not exist).
gs://experiment-data should be the same bucket as in main, so I'm not sure what caused this ...
Regarding openh264, I have merged the main branch, but I see now that the fix is not in there; let me add the fix from the pull request.
This failed because there is no fuzzer named aflplusplus_407. Maybe it was not included in this PR?
I just copied the command assuming those fuzzers existed in this branch; it doesn't really matter which fuzzer the experiment is run with, we can also just use one.
Thank you for the comments on the errors, we won't have time this weekend but will look into this starting Monday.
/gcbrun run_experiment.py -a --experiment-config /opt/fuzzbench/service/experiment-config.yaml --experiment-name 2023-12-23-mua-measurer --fuzzers aflplusplus --mutation-analysis
So the presubmit error seems to be this:
"gsutil cp /tmp/tmplzfk1uuo gs://experiment-data/test-experiment/build-logs/benchmark-freetype2_ftfuzzer-mutation_analysis.txt" returned: 1. Extras: output: ServiceException: 401 Anonymous caller does not have storage.objects.list access to the Google Cloud Storage bucket. Permission 'storage.objects.list' denied on resource (or it may not exist).
gs://experiment-data should be the same bucket as in main, so I'm not sure what caused this ...
This is because your statement requires gsutil actions, but CI does not authenticate and hence cannot perform that action.
Could that statement be replaced by mocks?
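For what it's worth, a minimal sketch of that idea (within the fuzzbench repo; the patch target common.filestore_utils.cp, and the assumption that the mua logging goes through it, are guesses rather than the actual code):
from unittest import mock

from experiment.build import build_utils  # module that adds the gsutil upload


# Patch the filestore helper so the test never shells out to gsutil and
# therefore needs no GCS credentials in CI.
@mock.patch('common.filestore_utils.cp')
def test_stores_mua_build_log_without_gsutil(mocked_cp):
    # Arguments follow the (output, benchmark, fuzzer, cycle) shape used
    # elsewhere in this PR.
    build_utils.store_mua_build_log('build output', 'freetype2_ftfuzzer',
                                    'aflplusplus', 0)
    # Only assert that an upload was requested, not that gsutil actually ran.
    mocked_cp.assert_called_once()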
Experiment 2023-12-23-mua-measurer data and results will be available later at:
The experiment data.
The experiment report.
This is because your statement requires gsutil actions, but CI does not authenticate and hence cannot perform that action.
Could that statement be replaced by mocks?
How does this test work for the coverage measurer? As far as I can see there is no fixture setting up the data for the coverage measurer. We just added the mua commands (which require gsutil) there so that this test prepares the environment for the mua measurer correctly, but it seems there is no similar setup needed for the coverage measurer. We can of course also try to use mocking instead.
How does this test work for the coverage measurer?
Don't know, I would have to read the code to learn : )
Oh, I tried but I'm missing some context that is not apparent to me, so I thought I would ask.
I have two more questions:
- (BTW, would it be better to change the copyright info?) Do you mean that we should put ourselves as the copyright holders?
- Is there a way to see the logs for the gcbrun?
Could you also please start another run:
/gcbrun run_experiment.py -a --experiment-config /opt/fuzzbench/service/experiment-config.yaml --experiment-name 2023-12-25-mua-measurer --fuzzers aflplusplus --mutation-analysis
And an update on the changes:
It seems the corpus/ was not correctly defined by your fuzzer (BTW, would it be better to change the copyright info?). Here is an example from libfuzzer:
This test seems to pass the CI, so it must just be a local environment problem. (We copied the coverage "fuzzer"; as we are adding a measurer, I would expect there to be no need to set a corpus dir, since it would not be used anyway.)
This is because your statement requires gsutil actions, but CI does not authenticate and hence cannot perform that action. Could that statement be replaced by mocks?
I have now changed the test to only test the coverage part. Mocking seems a bit pointless; if we want to do more than just check whether our two functions are called, then I expect the mocks to be quite brittle.
The fix for openh264 has now been cherry-picked into this branch.
The results of the mua evaluation are now also written to the experiment filestore.
do you mean that we should put ourselves as the copyright holders?
Yep, 1) because you wrote them, 2) It's 2023, not 2020 : )
- Is there a way to see the logs for the gcbrun?
There are two kinds of logs:
1. The logs written by your 'fuzzer'.
2. The logs written by gcloud.
For 1, they are normally in the experiment data link (e.g., an example from a different exp). If you could write to this log, that would be the most convenient solution. For this experiment, they are missing because of fuzzer build failures.
For 2, we don't have a good way to publish them yet, because the gcloud logs contain information from private experiments. If you are looking for something, I can search, copy, and paste it for you (not very scalable, unfortunately).
Could you also please start another run:
/gcbrun run_experiment.py -a --experiment-config /opt/fuzzbench/service/experiment-config.yaml --experiment-name 2023-12-25-mua-measurer --fuzzers aflplusplus --mutation-analysis
Sure, will do below.
The results of the mua evaluation are now also written to the experiment filestore.
Ah, nice. Thanks!
/gcbrun run_experiment.py -a --experiment-config /opt/fuzzbench/service/experiment-config.yaml --experiment-name 2023-12-25-mua-measurer --fuzzers aflplusplus --mutation-analysis
presubmit failed (run make format).
--- /home/runner/work/fuzzbench/fuzzbench/experiment/measurer/run_mua.py (original)
+++ /home/runner/work/fuzzbench/fuzzbench/experiment/measurer/run_mua.py (reformatted)
@@ -20,7 +20,6 @@
from common import experiment_utils
from common import new_process
from experiment.build import build_utils
-
logger = logs.Logger()
@@ -86,5 +85,5 @@
logger.info(f'mua_build_ids command: {docker_exec_command}')
mua_build_res = new_process.execute(docker_exec_command)
logger.info(f'mua_build_ids result: {mua_build_res}')
- build_utils.store_mua_build_log(mua_build_res.output, benchmark,
- fuzzer, cycle)
+ build_utils.store_mua_build_log(mua_build_res.output, benchmark, fuzzer,
+ cycle)
Experiment 2023-12-25-mua-measurer data and results will be available later at:
The experiment data.
The experiment report.
do you mean that we should put ourselves as the copyright holders?
Yep, 1) because you wrote them, 2) It's 2023, not 2020 : )
Will update!
- Is there a way to see the logs for the gcbrun?
There are two kinds of logs:
1. The logs written by your 'fuzzer'.
2. The logs written by gcloud.
For the dataframe/report issue the log should be written by gcloud; I added handling to collect the error and write it to experiment-data as well, to make it easy to find.
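Roughly the idea, as a sketch (not the exact commit; using filestore_utils.cp as the upload helper is an assumption about the fuzzbench API):
import os
import tempfile
import traceback

from common import filestore_utils  # assumed helper wrapping gsutil cp


def generate_report_and_store_errors(generate_fn, experiment_filestore_path):
    """Run |generate_fn|; on failure, copy the traceback next to the data."""
    try:
        generate_fn()
    except Exception:  # pylint: disable=broad-except
        # Write the traceback to a temp file and upload it so it can be read
        # from the experiment filestore without gcloud log access.
        with tempfile.NamedTemporaryFile('w', suffix='.txt',
                                         delete=False) as tmp_file:
            tmp_file.write(traceback.format_exc())
        filestore_utils.cp(
            tmp_file.name,
            os.path.join(experiment_filestore_path, 'report-error.txt'))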
Could you also please start another run:
/gcbrun run_experiment.py -a --experiment-config /opt/fuzzbench/service/experiment-config.yaml --experiment-name 2023-12-25-mua-measurer --fuzzers aflplusplus --mutation-analysis
Sure, will do below.
Seems there was an issue:
Step #1: Already have image: gcr.io/fuzzbench/experiment-runner
Step #1: INFO:root:Command: ['-a', '--experiment-config', '/opt/fuzzbench/service/experiment-config.yaml', '--experiment-name', '2023-12-25-mua-measurer', '--fuzzers', 'aflplusplus', '--mutation-analysis'].
Step #1: fatal: not a git repository (or any of the parent directories): .git
Step #1: INFO:root:Failed to create instance. Command: ['gcloud', 'compute', 'instances', 'create', 'd-2023-12-25-mua-measurer', '--image-family=cos-stable', '--image-project=cos-cloud', '--zone=us-central1-c', '--scopes=cloud-platform', '--machine-type=n1-highmem-96', '--boot-disk-size=4TB', '--boot-disk-type=pd-ssd', '--metadata-from-file', 'startup-script=/workspace/tmpsis6no23'] failed. Output: ERROR: (gcloud.compute.instances.create) Could not fetch resource:
Step #1: - The resource 'projects/fuzzbench/zones/us-central1-c/instances/d-2023-12-25-mua-measurer' already exists
I updated the experiment name, so, I'm sorry to say, I'm again not quite sure what this error message means.
Also, sorry about the formatting issue, I thought I had checked it. Should be fine now.
Seems there was an issue: I updated the experiment name, so, I'm sorry to say, I'm again not quite sure what this error message means.
Short summary: the runner image failed to build with aflplusplus, so I will double-check with libFuzzer.
Long explanation:
Step #7 - "aflplusplus-bloaty_fuzz_target-intermediate-runner": executor failed running [/bin/sh -c apt install -y unzip git gdb joe]: exit code: 100
Finished Step #7 - "aflplusplus-bloaty_fuzz_target-intermediate-runner"
ERROR
ERROR: build step 7 "gcr.io/cloud-builders/docker" failed: step exited with non-zero status: 1
However, this seems to be a problem with afl++ instead of your code, because benchmark-curl_curl_fuzzer_http-mutation_analysis builds fine.
I will use libfuzzer to double-check.
/gcbrun run_experiment.py -a --experiment-config /opt/fuzzbench/service/experiment-config.yaml --experiment-name 2023-12-27-mua-measurer --fuzzers libfuzzer --mutation-analysis
Experiment 2023-12-27-mua-measurer data and results will be available later at:
The experiment data.
The experiment report.
Thank you for looking into this. Some benchmarks still fail to compile for mutation analysis; we'll work on fixing those issues. We know that, for example, libxml2_xml works; is it possible to have a test run with only one benchmark?
I guess it would be this command:
/gcbrun run_experiment.py -a --experiment-config /opt/fuzzbench/service/experiment-config.yaml --experiment-name 2023-12-28-mua-measurer-only-libxml2_xml --fuzzers libfuzzer --benchmarks libxml2_xml --mutation-analysis
Also would it be possible to have a run without mutation analysis? If that one succeeds we can have the participants rebase their PRs.
/gcbrun run_experiment.py -a --experiment-config /opt/fuzzbench/service/experiment-config.yaml --experiment-name 2023-12-28-no-mua-measurer --fuzzers libfuzzer
/gcbrun run_experiment.py -a --experiment-config /opt/fuzzbench/service/experiment-config.yaml --experiment-name 2023-12-28-mua-libxml2-xml --fuzzers libfuzzer --benchmarks libxml2_xml --mutation-analysis
/gcbrun run_experiment.py -a --experiment-config /opt/fuzzbench/service/experiment-config.yaml --experiment-name 2023-12-28-no-mua-measurer --fuzzers libfuzzer
Experiment 2023-12-28-mua-libxml2-xml data and results will be available later at:
The experiment data.
The experiment report.
Experiment 2023-12-28-no-mua-measurer data and results will be available later at:
The experiment data.
The experiment report.
I modified the experiment name a bit because experiment names must be within 30 chars and can only match `^[a-z0-9-]`.
Thank you! Ah, yes, I forgot there was the 30-char limit; thanks for fixing it.
Could it be that the 2023-12-28-mua-libxml2_xml experiment was not correctly started? It seems there is no data at all.
For the no-mua version, we still need to fix things so that we do not build the mua containers if the flag is not given.
/gcbrun run_experiment.py -a --experiment-config /opt/fuzzbench/service/experiment-config.yaml --experiment-name 2023-12-28-mua-libxml2-xml --fuzzers libfuzzer --benchmarks libxml2_xml --mutation-analysis
Hmm, strange. I will launch another one now and check for reasons later.
Could it be that the 2023-12-28-mua-libxml2_xml experiment was not correctly started? It seems there is no data at all.
Ah, it's actually the naming problem. Fixed above.
I noticed many errors printed by this line.
And the docker container was not found:
{
"insertId": "19sqza2g152wo3y",
"jsonPayload": {
"experiment": "2023-12-28-mua-libxml2-xml",
"instance_name": "d-2023-12-28-mua-libxml2-xml",
"subcomponent": "measurer",
"component": "dispatcher",
"message": "Executed command: \"docker start mutation_analysis_libxml2_xml_container\" returned: 1.",
"output": "Error response from daemon: No such container: mutation_analysis_libxml2_xml_container\nError: failed to start containers: mutation_analysis_libxml2_xml_container\n"
},
"resource": {
"type": "gce_instance",
"labels": {
"instance_id": "8347721258806393535",
"project_id": "fuzzbench",
"zone": "projects/1097086166031/zones/us-central1-c"
}
},
"timestamp": "2023-12-27T23:24:25.149369689Z",
"severity": "ERROR",
"logName": "projects/fuzzbench/logs/fuzzbench",
"receiveTimestamp": "2023-12-27T23:24:25.149369689Z"
}
For the no-mua version, we still need to fix things so that we do not build the mua containers if the flag is not given.
Ah I see.
Alright, we finally got things working using our cloud account. I think we got through most problems; currently we are running into some memory limitations with the VMs we chose. So I think we can try both experiments again.
/gcbrun run_experiment.py -a --experiment-config /opt/fuzzbench/service/experiment-config.yaml --experiment-name 2023-12-28-mua-only-libxml2-2 --fuzzers libfuzzer --benchmarks libxml2_xml --mutation-analysis
Also would it be possible to have a run without mutation analysis? If that one succeeds we can have the participants rebase their PRs.
/gcbrun run_experiment.py -a --experiment-config /opt/fuzzbench/service/experiment-config.yaml --experiment-name 2023-12-28-no-mua-measurer-2 --fuzzers libfuzzer
Also, the dispatcher startup template does not escape the postgres password, which is quite confusing when the password has special characters. I fixed this in a commit; it may be interesting for you as well.
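The fix boils down to quoting the value before it is substituted into the shell startup script; a minimal sketch of the idea (variable and function names are made up, not the actual template code):
import shlex


def postgres_password_export_line(password: str) -> str:
    """Return a shell-safe line for the dispatcher startup-script template."""
    # shlex.quote single-quotes the value and escapes embedded quotes, so
    # characters like $, !, ; or spaces reach postgres unchanged.
    return f'export POSTGRES_PASSWORD={shlex.quote(password)}'


print(postgres_password_export_line('p@ss!w$rd with spaces'))
# export POSTGRES_PASSWORD='p@ss!w$rd with spaces'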
/gcbrun run_experiment.py -a --experiment-config /opt/fuzzbench/service/experiment-config.yaml --experiment-name 2023-12-29-mua-only-libxml2 --fuzzers libfuzzer --benchmarks libxml2_xml --mutation-analysis
/gcbrun run_experiment.py -a --experiment-config /opt/fuzzbench/service/experiment-config.yaml --experiment-name 2023-12-29-no-mua-measurer --fuzzers libfuzzer
Experiment 2023-12-29-mua-only-libxml2 data and results will be available later at:
The experiment data.
The experiment report.
Experiment 2023-12-29-no-mua-measurer data and results will be available later at:
The experiment data.
The experiment report.
Thank you for starting the runs again. The libxml2 run did not work for mua because of permission errors in the mapped directory; this is now fixed. I can't tell from the experiment data what went wrong with the no-mua-measurer run, so I'll start another run on my side.
One big remaining question: I tried to test whether the dispatcher vm is preemptible by stopping and restarting the vm. From what I read, this should simulate the preemption process. However, the dispatcher does not correctly resume; it fails with the error that the experiment already exists in the database. Did I test it incorrectly, or does the dispatcher vm not need to be preemptible?
Also, I think that the reporting does not produce a report for single-fuzzer runs, as the pandas dataframe bug (the second bug mentioned here) seems to be caused by evaluating only one fuzzer.
The current status is that things should now generally work with mutation testing, at least they do with our cloud config. However, only 8/23 default benchmarks work with mutation testing; this will require some more work. Also, the mua results are not yet integrated into the report. If we can confirm that things now work, we can look towards doing some real evaluations.
Could you start another run? I'm fairly confident that this one should finally work.
/gcbrun run_experiment.py -a --experiment-config /opt/fuzzbench/service/experiment-config.yaml --experiment-name 2023-12-29-mua-libxml2-two-f --fuzzers afl libfuzzer --benchmarks libxml2_xml --mutation-analysis
/gcbrun run_experiment.py -a --experiment-config /opt/fuzzbench/service/experiment-config.yaml --experiment-name 2023-12-30-mua-libxml2-two-f --fuzzers afl libfuzzer --benchmarks libxml2_xml --mutation-analysis
Experiment 2023-12-30-mua-libxml2-two-f data and results will be available later at:
The experiment data.
The experiment report.
One big remaining question: I tried to test whether the dispatcher vm is preemptible by stopping and restarting the vm. From what I read, this should simulate the preemption process. However, the dispatcher does not correctly resume; it fails with the error that the experiment already exists in the database. Did I test it incorrectly, or does the dispatcher vm not need to be preemptible?
The dispatcher is non-preemptible by default:
It is created with create_instance(), which sets preemptible: bool = False:
https://github.com/google/fuzzbench/blob/2bc06d4aca2c8dae20eeca47b95433ef543f99a8/common/gcloud.py#L47-L52
The variable preemptible was not specified when creating the dispatcher. (If you'd like to change that, here is the spot):
https://github.com/google/fuzzbench/blob/2bc06d4aca2c8dae20eeca47b95433ef543f99a8/experiment/run_experiment.py#L558-L561
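In other words, a toy illustration of the default-argument behaviour (not the real create_instance, which lives in common/gcloud.py):
def create_instance(instance_name: str, preemptible: bool = False) -> bool:
    """Toy stand-in: callers that omit |preemptible| get False."""
    print(f'{instance_name}: preemptible={preemptible}')
    return True


create_instance('d-2023-12-30-mua-libxml2-two-f')      # dispatcher: False
create_instance('r-2023-12-30-mua-libxml2-two-f-2708836',
                preemptible=True)                      # e.g. a preemptible runner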
I see, so as preemptible is not provided in the call, I guess you are running the dispatcher as non-preemptible? For now I would be fine with supporting whatever you are currently doing, if that is good for you.
Also, thank you for starting 2023-12-30-mua-libxml2-two-f; for some reason the mua part does not work. The build logs do not indicate an error. Could you do a search for error logs in that run?
I see, so as preemptible is not provided in the call, I guess you are running the dispatcher as non-preemptible?
Yep, that's a feature of Python. In the function definition, preemptible: bool = False means the default is False when not specified.
Also, thank you for starting 2023-12-30-mua-libxml2-two-f; for some reason the mua part does not work. The build logs do not indicate an error. Could you do a search for error logs in that run?
I saw many warning messages like the following. It's my first time seeing this message; did something consume all the space?
{
"insertId": "1de7f84fio04oh",
"jsonPayload": {
"message": "Failed to run \"google_optimize_local_ssd\" script: exec: \"google_optimize_local_ssd\": executable file not found in $PATH",
"localTimestamp": "2023-12-29T21:59:18.6696Z",
"omitempty": null
},
"resource": {
"type": "gce_instance",
"labels": {
"project_id": "fuzzbench",
"zone": "us-central1-c",
"instance_id": "1597173073505878006"
}
},
"timestamp": "2023-12-29T21:59:18.669691061Z",
"severity": "WARNING",
"labels": {
"instance_name": "d-2023-12-30-mua-libxml2-two-f"
},
"logName": "projects/fuzzbench/logs/GCEGuestAgent",
"sourceLocation": {
"file": "instance_setup.go",
"line": "117",
"function": "main.agentInit"
},
"receiveTimestamp": "2023-12-29T21:59:20.180046940Z"
}
No error was found:
BTW, there are a lot of these strange warnings, not sure if they are related/helpful to your case:
It's strange because all messages are clearly from a different exp (e.g., r-2023-12-05-aflpp-2684587 must be an AFL++ experiment from 2023-12-05).
I see, so as preemptible is not provided in the call, I guess you are running the dispatcher as non-preemptible?
Yep, that's a feature of Python. In the function definition, preemptible: bool = False means the default is False when not specified.
Thank you for the answers and your patience. Sorry for being unclear; I just wanted to make sure that we do not need to handle preemption, as our code is also part of the dispatcher vm.
Also, thank you for starting 2023-12-30-mua-libxml2-two-f; for some reason the mua part does not work. The build logs do not indicate an error. Could you do a search for error logs in that run?
I saw many warning messages like the following. It's my first time seeing this message; did something consume all the space?
"message": "Failed to run \"google_optimize_local_ssd\" script: exec: \"google_optimize_local_ssd\": executable file not found in $PATH",
I see that one as well, but it happens as part of the startup script, before anything "real" happens on the vm. Does it happen more often for you? Memory could be an issue, but I would have hoped for an error log at least.
No error was found:
Well, that makes things complicated ... Can you search for "mua container", "mua_build_ids", and "mua_run_mutants" as well? Those are the interactions I would most likely expect an error with.
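In case it is useful, a sketch of how that search could be done programmatically (the filter fields mirror the log entries pasted above; access to the fuzzbench gcloud project and the exact field names are assumptions):
from google.cloud import logging as gcloud_logging

client = gcloud_logging.Client(project='fuzzbench')
# Match the experiment's log entries whose message mentions the mua steps.
log_filter = (
    'logName="projects/fuzzbench/logs/fuzzbench" '
    'AND jsonPayload.experiment="2023-12-30-mua-libxml2-two-f" '
    'AND (jsonPayload.message:"mua container" '
    'OR jsonPayload.message:"mua_build_ids" '
    'OR jsonPayload.message:"mua_run_mutants")'
)
for entry in client.list_entries(filter_=log_filter, max_results=50):
    print(entry.severity, entry.payload)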
It's strange because all messages are clearly from a different exp (e.g., r-2023-12-05-aflpp-2684587 must be an AFL++ experiment from 2023-12-05).
Yeah, definitely strange. I have no idea how we could have influenced this, and these messages did not happen during our test runs. Does this only happen for experiments from this PR branch, or for main as well?
I have pushed another fix that should not be related, but maybe we will get a more insightful log if we run another experiment:
/gcbrun run_experiment.py -a --experiment-config /opt/fuzzbench/service/experiment-config.yaml --experiment-name 2023-12-30-mua-xml2-2f-bug --fuzzers afl libfuzzer --benchmarks libxml2_xml --mutation-analysis
I have an idea what the problem could be. I noticed that when something in runner.py crashes (not the fuzzer), that run behaves similarly to what we can see in our problem run. Can you take a look at the runner logs?
/gcbrun run_experiment.py -a --experiment-config /opt/fuzzbench/service/experiment-config.yaml --experiment-name 2023-01-02-mua-xml2-2f-bug --fuzzers afl libfuzzer --benchmarks libxml2_xml --mutation-analysis
Experiment 2023-01-02-mua-xml2-2f-bug data and results will be available later at:
The experiment data.
The experiment report.
I have an idea what the problem could be. I noticed that when something in runner.py crashes (not the fuzzer), that run behaves similarly to what we can see in our problem run. Can you take a look at the runner logs?
Sure, I saw many repeated errors like the following.
1. Preempted but is not running
Different from the strange ones above, this is r-2023-12-30-mua-libxml2-two-f-2708836, a runner from the experiment we were running (2023-12-30). There are similar messages for other runner instances, too.
However, somehow the instance name is d-2023-12-29-no-mua-measurer. How can this report anything about the experiment on 2023-12-30?
{
"insertId": "xaw0alfsugckp",
"jsonPayload": {
"instance_name": "d-2023-12-29-no-mua-measurer",
"message": "Instance: r-2023-12-30-mua-libxml2-two-f-2708839 is preempted but is not running.",
"component": "dispatcher",
"experiment": "2023-12-29-no-mua-measurer"
},
"resource": {
"type": "gce_instance",
"labels": {
"project_id": "fuzzbench",
"zone": "projects/1097086166031/zones/us-central1-c",
"instance_id": "8644718517907825376"
}
},
"timestamp": "2023-12-30T13:23:05.886594464Z",
"severity": "WARNING",
"logName": "projects/fuzzbench/logs/fuzzbench",
"receiveTimestamp": "2023-12-30T13:23:05.886594464Z"
}
2. Failed to run google_optimize_local_ssd, google_set_multiqueue
Like the ones in earlier comments. I put their log entries together because they are similar.
{
"insertId": "14simwaf6cn7mw",
"jsonPayload": {
"localTimestamp": "2023-12-30T10:04:24.6398Z",
"omitempty": null,
"message": "Failed to run \"google_optimize_local_ssd\" script: exec: \"google_optimize_local_ssd\": executable file not found in $PATH"
},
"resource": {
"type": "gce_instance",
"labels": {
"instance_id": "7700821883207439863",
"zone": "us-central1-c",
"project_id": "fuzzbench"
}
},
"timestamp": "2023-12-30T10:04:24.639981903Z",
"severity": "WARNING",
"labels": {
"instance_name": "r-2023-12-30-mua-libxml2-two-f-2709035"
},
"logName": "projects/fuzzbench/logs/GCEGuestAgent",
"sourceLocation": {
"file": "instance_setup.go",
"line": "117",
"function": "main.agentInit"
},
"receiveTimestamp": "2023-12-30T10:04:28.090087564Z"
}
{
"insertId": "14simwaf6cn7mx",
"jsonPayload": {
"localTimestamp": "2023-12-30T10:04:24.6400Z",
"omitempty": null,
"message": "Failed to run \"google_set_multiqueue\" script: exec: \"google_set_multiqueue\": executable file not found in $PATH"
},
"resource": {
"type": "gce_instance",
"labels": {
"zone": "us-central1-c",
"project_id": "fuzzbench",
"instance_id": "7700821883207439863"
}
},
"timestamp": "2023-12-30T10:04:24.640440231Z",
"severity": "WARNING",
"labels": {
"instance_name": "r-2023-12-30-mua-libxml2-two-f-2709035"
},
"logName": "projects/fuzzbench/logs/GCEGuestAgent",
"sourceLocation": {
"file": "instance_setup.go",
"line": "117",
"function": "main.agentInit"
},
"receiveTimestamp": "2023-12-30T10:04:28.090087564Z"
}
3. No snapshot data
A lot of these, not sure if they are expected though.
{
"insertId": "289kvffbspttu",
"jsonPayload": {
"experiment": "2023-12-30-mua-libxml2-two-f",
"component": "dispatcher",
"message": "No snapshot data.",
"instance_name": "d-2023-12-30-mua-libxml2-two-f"
},
"resource": {
"type": "gce_instance",
"labels": {
"zone": "projects/1097086166031/zones/us-central1-c",
"project_id": "fuzzbench",
"instance_id": "1597173073505878006"
}
},
"timestamp": "2023-12-30T09:32:24.138849121Z",
"severity": "WARNING",
"logName": "projects/fuzzbench/logs/fuzzbench",
"receiveTimestamp": "2023-12-30T09:32:24.138849121Z"
}
There are also many logging errors. I don't think I have seen them before either, but I feel they are probably from gcloud, not your code. Pasting a sample here for completeness.
{
"insertId": "qx14yheiqysz",
"jsonPayload": {
"component": "dispatcher",
"message": "Retrying on common.logs.log failed with 400 Log entry with size 2.28M exceeds maximum size of 256.0K. Raise.",
"subcomponent": "measurer",
"traceback": "Traceback (most recent call last):\n File \"/usr/local/lib/python3.10/site-packages/google/api_core/grpc_helpers.py\", line 65, in error_remapped_callable\n return callable_(*args, **kwargs)\n File \"/usr/local/lib/python3.10/site-packages/grpc/_channel.py\", line 1030, in __call__\n return _end_unary_response_blocking(state, call, False, None)\n File \"/usr/local/lib/python3.10/site-packages/grpc/_channel.py\", line 910, in _end_unary_response_blocking\n raise _InactiveRpcError(state) # pytype: disable=not-instantiable\ngrpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:\n\tstatus = StatusCode.INVALID_ARGUMENT\n\tdetails = \"Log entry with size 2.28M exceeds maximum size of 256.0K\"\n\tdebug_error_string = \"UNKNOWN:Error received from peer ipv4:216.239.38.174:443 {created_time:\"2024-01-01T19:15:59.905002192+00:00\", grpc_status:3, grpc_message:\"Log entry with size 2.28M exceeds maximum size of 256.0K\"}\"\n>\n\nThe above exception was the direct cause of the following exception:\n\nTraceback (most recent call last):\n File \"/work/src/common/retry.py\", line 81, in _wrapper\n result = func(*args, **kwargs)\n File \"/work/src/common/logs.py\", line 187, in log\n logger.log_struct(struct_message, severity=severity)\n File \"/usr/local/lib/python3.10/site-packages/google/cloud/logging_v2/logger.py\", line 209, in log_struct\n self._do_log(client, StructEntry, info, **kw)\n File \"/usr/local/lib/python3.10/site-packages/google/cloud/logging_v2/logger.py\", line 158, in _do_log\n client.logging_api.write_entries([api_repr])\n File \"/usr/local/lib/python3.10/site-packages/google/cloud/logging_v2/_gapic.py\", line 160, in write_entries\n self._gapic_api.write_log_entries(request=request)\n File \"/usr/local/lib/python3.10/site-packages/google/cloud/logging_v2/services/logging_service_v2/client.py\", line 706, in write_log_entries\n response = rpc(\n File \"/usr/local/lib/python3.10/site-packages/google/api_core/gapic_v1/method.py\", line 113, in __call__\n return wrapped_func(*args, **kwargs)\n File \"/usr/local/lib/python3.10/site-packages/google/api_core/retry.py\", line 349, in retry_wrapped_func\n return retry_target(\n File \"/usr/local/lib/python3.10/site-packages/google/api_core/retry.py\", line 191, in retry_target\n return target()\n File \"/usr/local/lib/python3.10/site-packages/google/api_core/grpc_helpers.py\", line 67, in error_remapped_callable\n raise exceptions.from_grpc_error(exc) from exc\ngoogle.api_core.exceptions.InvalidArgument: 400 Log entry with size 2.28M exceeds maximum size of 256.0K\n",
"experiment": "2023-12-30-mua-libxml2-two-f",
"instance_name": "d-2023-12-30-mua-libxml2-two-f"
},
"resource": {
"type": "gce_instance",
"labels": {
"instance_id": "1597173073505878006",
"zone": "projects/1097086166031/zones/us-central1-c",
"project_id": "fuzzbench"
}
},
"timestamp": "2024-01-01T19:16:00.047192290Z",
"severity": "ERROR",
"logName": "projects/fuzzbench/logs/fuzzbench",
"receiveTimestamp": "2024-01-01T19:16:00.047192290Z"
}
We can use the result/log of 2023-01-02-mua-xml2-2f-bug to check if some of the errors above are flaky / from gcloud.
BTW, if you like, we can schedule a time to look at the logs together.
We recently published a paper describing an approach to use mutation analysis for fuzzer benchmarking. This pull request aims to support this approach (Phase I only) for FuzzBench.
There are a few mismatches between FuzzBench and our framework that will need some solutions; while I have some ideas, I would be interested in your input as well!
My understanding of measurers for FuzzBench is as follows: At the start of a run all docker images are built. Specifically, for the coverage measurer the generated.mk contains targets of the form build-coverage-{benchmark} which build the images containing the executables used for coverage collection. These make commands are started in the builder.py, and after the image is built the coverage executables are extracted and stored on the host. The coverage executables are then used by the code in the measure_manager.py to measure unmeasured_snapshots, which I would expect to be newly created fuzzer queue entries. Coverage measurement happens on the host in a temporary directory as far as I can tell, with a timeout of 900 seconds for a snapshot.
Open points: