Local experiments timeouts

andreafioraldi commented 3 years ago

I'm running some local bug experiments and I noticed that fuzzbench reports some bugs while AFL++ didn't find any crashes (looking at results/fuzzer-log.txt).

What is the explanation for that? Aren't you looking at the output folder of AFL++ for crashes?

inferno-chromium commented 3 years ago

It does a libfuzzer merge run on entire corpus dir (with crashes subdir) from cloud storage (from that trial run)- https://github.com/google/fuzzbench/blob/78947df34aff867e97a64c89879f3fb37724e506/experiment/measurer/run_coverage.py#L52 and then it captures crashes from that. then it processes them one-by-one for signature/crash params - https://github.com/google/fuzzbench/blob/master/experiment/measurer/measure_manager.py#L512

andreafioraldi commented 3 years ago

Ok so it is possible that a fuzzer doesn't detect a bug but it is reported in the report? You should check crashes only in the crashes/ folder IMHO, but good to know this behavior because a was observing an empty crashes folder for a fuzzer without ASan but bugs reported in the report and I was confused.

andreafioraldi commented 3 years ago

I still don't get why in the report I have 0 bugs for an AFL++ based fuzzer but when I look at results/fuzzer-log.txt there are at least 30 crashes. Now it is possible that all these 30 crashes are duplicated, but then FuzzBench should report at least one bug for that fuzzer.

andreafioraldi commented 3 years ago

AFL++ stores crashes in OUTPUT/default/crashes, is it possible that you are not executing the testcases in this dir when measuring crashes, but only from the queue?

andreafioraldi commented 3 years ago

From what I'm understanding (correct me please) you extract the corpus tar.gz, the for AFL++ contains a directory struct of type corpus/default/{queue, crashes, hangs}, here, then you run libfuzzer merge here. This is ok as you said it executes the crashes subdir too, but I'm perplex about the mem limit and the timeout.

I'm hitting this problem on aspell, that seems a slow target (there are a lot of hanging testcases in AFL++) and so there is the possibility that libfuzzer is considering the crashing inputs found by AFL++ as timeouts. I'm right?

andreafioraldi commented 3 years ago

I'm going to increase the mem limit and the timeout and update you then, my suggestion is also to exclude the hangs subdir to speedup the merge process.

andreafioraldi commented 3 years ago

This seems a bug because it does not recurse the subdirs (I guess is should be os.walk instead of os.listdir)

andreafioraldi commented 3 years ago

I think that every snashot you are executing again the merge over the entire corpus and so the MAX_TOTAL_TIME timeout on merge stops libfuzzer https://github.com/google/fuzzbench/blob/d8d1a982463057b446c9dd9f1e6ecd8853e62f44/experiment/measurer/run_coverage.py#L63

I'm hitting this problem on my new AFL++ configuration because aspell is slow and I'm generating more testcases in the queue compared to normal AFL++. But this IMO this is a bug of Fuzzbench.

jonathanmetzman commented 3 years ago

This seems a bug because it does not recurse the subdirs (I guess is should be os.walk instead of os.listdir)

I don't think this is a bug. We extract the corpus zip into that directory here. If you look at the definition of extract_corpus to me it looks like we do extract all of the files, regardless of subdirectories.

So I'd guess the reason why Fuzzbench doesn't recognize crashes as such is because of the max_total_time.

andreafioraldi commented 3 years ago

but os.listdir(self.corpus_dir) will always return ["default"] with AFL++. The issue here (or at least seems an issue to me) is that the measured files blacklist is not working. Here you always get a blacklist ["default"]. This means that, at every snapshot, libfuzzer executes the entire corpus every time, instead of just the newly generated files. The issue is a concern only when the target is slow (and maybe not a concern for you because your CPUs are faster) but this what IMO causes the max_total_time expiration.

I really don't know how to debug this code so maybe I'm wrong as I'm not super into python (and docker).

andreafioraldi commented 3 years ago

In my fork I tried to fix it in this way.

    def update_measured_files(self):
        """Updates the measured-files.txt file for this trial with
        files measured in this snapshot."""
        #current_files = set(os.listdir(self.corpus_dir))
        current_files = set()
        for dir_path, _, files in os.walk(self.corpus_dir):
            for filename in map(lambda x: os.path.join(dir_path, x), files):
                current_files.add(filename)
        already_measured = self.get_measured_files()
        filesystem.write(self.measured_files_path,
                         '\n'.join(current_files.union(already_measured)))

But there are still timeouts (10 seconds is too low) and there is now the problem that if the measurer is killed due to the max_total_time expiration now testcases that are not processed by libfuzzer are anyway marked as already measured.

jonathanmetzman commented 3 years ago

but os.listdir(self.corpus_dir) will always return ["default"] with AFL++. The issue here (or at least seems an issue to me) is that the measured files blacklist is not working. Here you always get a blacklist ["default"]. This means that, at every snapshot, libfuzzer executes the entire corpus every time, instead of just the newly generated files. The issue is a concern only when the target is slow (and maybe not a concern for you because your CPUs are faster) but this what IMO causes the max_total_time expiration.

I really don't know how to debug this code so maybe I'm wrong as I'm not super into python (and docker).

Sorry I wasn't clear about how extract_corpus works. But I don't think the issue you are raising actually exists. I've prepared an example that I think shows this. Basically the way extract_corpus works for each file in the corpus archive:

extracts every file from the corpus
computes the shasum
Saves the file using the sha as the name to output_directory

So it doesn't matter if there are subdirectories in corpus archive, output_directory shouldn't have any.

Here's an example to convince yourself. Download corpus.tar.gz and then run this command from fuzzbench root:

source .venv/bin/activate; 
rm -rf /tmp/outputdir; 
PYTHONPATH=. python -c "from experiment.measurer import measure_manager; measure_manager.extract_corpus('../corpus.tar.gz', set(), '/tmp/outputdir')"; 
ls /tmp/outputdir

Even though corpus.tar.gz had files in subdirectories, you will see that /tmp/outputdir contains just those two files at the toplevel. No subdirectories. So using os.listdir should work on the output of extract_corpus.

andreafioraldi commented 3 years ago

Ah ok got it! So there's nothing that I can do apart from increasing the timeout and hoping that it is enough

inferno-chromium commented 3 years ago

Ah ok got it! So there's nothing that I can do apart from increasing the timeout and hoping that it is enough

that timeout is 15 min, i feel if we are hitting it, then it is all those crazy hangs and stuff. we can increase it, but how much do you need.

Should we not archive hangs dir (afl variants), timeout-* files as you were suggesting in corpus (that excludes from measuring, crash analysis) ? can you do a PR for that ? @jonathanmetzman thoughts, do we need coverage from hangs?

andreafioraldi commented 3 years ago

UNIT_TIMEOUT is also hitted, btw I'll play a bit with these parameters locally and then eventually write a PR for upstream if I'll find a general solution

inferno-chromium commented 3 years ago

UNIT_TIMEOUT is also hitted, btw I'll play a bit with these parameters locally and then eventually write a PR for upstream if I'll find a general solution

also try adding that hangs dir in this place too - https://github.com/google/fuzzbench/blob/8149d744a4965c216501a0ab612f711f850563e7/experiment/runner.py

we welcome your PRs, sorry a lot is happening on our side, so cycles thin atm.

google / fuzzbench

Local experiments timeouts #1108