google / clusterfuzz

Scalable fuzzing infrastructure.
https://google.github.io/clusterfuzz
Apache License 2.0

AFL/AFL++ issues: please post here #2306

Open vanhauser-thc opened 3 years ago

vanhauser-thc commented 3 years ago

As I am not a Google dev but try to make sure afl++ works, please post issues with afl/afl++ here, reference a dedicated issue here, or mention @vanhauser-thc in your issue.

@jonathanmetzman @inferno-chromium can we pin this issue for the next 6 weeks?

zounathan commented 3 years ago

Some details about issue 3 in https://github.com/google/clusterfuzz/issues/2302. It seems that all of AFL++'s crashes will be ignored, because runner.fuzzer_stderr is always None.

src\python\bot\fuzzers\afl\engine.py#L108

    if os.path.exists(testcase_file_path):
      crash = engine.Crash(testcase_file_path, runner.fuzzer_stderr, [],
                           fuzz_result.time_executed)
      crashes.append(crash)

class Crash(object):
  """Represents a crash found by the fuzzing engine."""

  def __init__(self, input_path, stacktrace, reproduce_args, crash_time):
    self.input_path = input_path
    self.stacktrace = stacktrace
    self.reproduce_args = reproduce_args
    self.crash_time = crash_time

  # Separate snippet: the AFL runner's fuzzer_stderr accessor (not part of Crash).
  def fuzzer_stderr(self):
    """Returns the stderr of the fuzzer. Reads it first if it wasn't already
    read. Because ClusterFuzz terminates this process after seeing a stacktrace
    printed, make sure that printing this property is the last code a program
    expects to execute.
    """
    if self._fuzzer_stderr is not None:
      return self._fuzzer_stderr

    try:
      with open(self.stderr_file_path, 'rb') as file_handle:
        stderr_data = utils.decode_to_unicode(
            utils.read_from_handle_truncated(file_handle, MAX_OUTPUT_LEN))

      self._fuzzer_stderr = get_first_stacktrace(stderr_data)
    except IOError:
      self._fuzzer_stderr = ''
    return self._fuzzer_stderr

In function do_engine_fuzzing, each engine crash is converted by from_engine_crash. Because runner.fuzzer_stderr is None, crash_state and crash_type are also empty.

  def from_engine_crash(cls, crash, fuzzing_strategies):
    """Create a Crash from a engine.Crash."""
    return Crash(
        file_path=crash.input_path,
        crash_time=crash.crash_time,
        return_code=1,
        resource_list=[],
        gestures=[],
        unsymbolized_crash_stacktrace=utils.decode_to_unicode(crash.stacktrace),
        arguments=' '.join(crash.reproduce_args),
        application_command_line='',  # TODO(ochang): Write actual command line.
        fuzzing_strategies=fuzzing_strategies)

class Crash(object):
  """Represents a crash (before creating a testcase)."""

  def __init__(self,
               file_path,
               crash_time,
               return_code,
               resource_list,
               gestures,
               unsymbolized_crash_stacktrace,
               arguments,
               application_command_line,
               http_flag=False,
               fuzzing_strategies=None):
    self.file_path = file_path
    self.crash_time = crash_time
    self.return_code = return_code
    self.resource_list = resource_list
    self.gestures = gestures
    self.arguments = arguments
    self.fuzzing_strategies = fuzzing_strategies

    self.security_flag = False
    self.should_be_ignored = False

    self.filename = os.path.basename(file_path)
    self.http_flag = http_flag
    self.application_command_line = application_command_line
    self.unsymbolized_crash_stacktrace = unsymbolized_crash_stacktrace
    state = stack_analyzer.get_crash_data(self.unsymbolized_crash_stacktrace)
    self.crash_type = state.crash_type
    self.crash_address = state.crash_address
    self.crash_state = state.crash_state
    self.crash_stacktrace = utils.get_crash_stacktrace_output(
        self.application_command_line, state.crash_stacktrace,
        self.unsymbolized_crash_stacktrace)
    self.security_flag = crash_analyzer.is_security_issue(
        self.unsymbolized_crash_stacktrace, self.crash_type, self.crash_address)
    self.key = '%s,%s,%s' % (self.crash_type, self.crash_state,
                             self.security_flag)
    self.should_be_ignored = crash_analyzer.ignore_stacktrace(
        state.crash_stacktrace)

    # self.crash_info gets populated in create_testcase; save what we need.
    self.crash_frames = state.frames
    self.crash_info = None

After do_engine_fuzzing completes, process_crashes is called. Because the crash state and type are empty, the crash is ignored.

process_crashes -> filter_crashes -> is_valid -> get_error

def get_error(self):
    """Return the reason why the crash is invalid."""
    filter_functional_bugs = environment.get_value('FILTER_FUNCTIONAL_BUGS')
    if filter_functional_bugs and not self.security_flag:
      return 'Functional crash is ignored: %s' % self.crash_state

    if self.should_be_ignored:
      return ('False crash: %s\n\n---%s\n\n---%s' %
              (self.crash_state, self.unsymbolized_crash_stacktrace,
               self.crash_stacktrace))

    if self.is_archived() and not self.fuzzed_key:
      return 'Unable to store testcase in blobstore: %s' % self.crash_state

    if not self.crash_state or not self.crash_type:
      return 'Empty crash state or type'

    return None

2021-04-12 13:21:37,983 - run_bot - INFO - Finished processing test cases.
2021-04-12 13:21:37,983 - run_bot - INFO - Raw crash count: 2
2021-04-12 13:21:37,989 - run_bot - INFO - Ignore crash (reason=Empty crash state or type, type=, state=).
2021-04-12 13:21:37,989 - run_bot - INFO - Ignore crash (reason=Empty crash state or type, type=, state=).
2021-04-12 13:21:37,989 - run_bot - INFO - Finished processing crashes.
2021-04-12 13:21:37,989 - run_bot - INFO - New crashes: 0, known crashes: 0, processed groups: []

vanhauser-thc commented 3 years ago

@zounathan thanks I had overlooked the 3rd issue when I read it on my small phone :)

@jonathanmetzman AFAIK this is nothing we changed, so I am clueless how this ever worked with vanilla afl. I am unsure how to fix this as I do not really understand the full code. To detect a crash it is enough to check the return code of afl-showmap: if it is larger than 1, it is a crash (0 = ok, 1 = timeout, 2 or 3 = crash). If you want to trigger or parse on ASAN output, then of course stderr must not be None.
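For illustration, the return-code rule described in this comment could be sketched as follows. This is only a sketch based on the values stated above (0 = ok, 1 = timeout, 2 or 3 = crash); the enum and function names are hypothetical and not part of ClusterFuzz or AFL++.

```python
from enum import Enum


class ShowmapResult(Enum):
    """Hypothetical result categories for an afl-showmap run."""
    OK = "ok"
    TIMEOUT = "timeout"
    CRASH = "crash"


def classify_showmap_return_code(return_code: int) -> ShowmapResult:
    """Map an afl-showmap exit status to a result, per the comment above."""
    if return_code == 0:
        return ShowmapResult.OK
    if return_code == 1:
        return ShowmapResult.TIMEOUT
    if return_code > 1:
        # The comment says any value larger than 1 (2 or 3) means a crash.
        return ShowmapResult.CRASH
    raise ValueError(f"unexpected return code: {return_code}")
```

A check like this tells you that a crash happened, but not what kind; it does not replace the stacktrace that the rest of this thread is about.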

zounathan commented 3 years ago

It detects a crash through the output of afl++. If there are crash files, it will copy them to self.testcase_file_path in function run_afl_fuzz.

      # Attempt to start the fuzzer.
      fuzz_result = self.run_and_wait(
          additional_args=fuzz_args,
          timeout=max_total_time,
          terminate_before_kill=True,
          terminate_wait_time=self.SIGTERM_WAIT_TIME)

      # Reduce max_total_time by the amount of time the last attempt took.
      max_total_time -= fuzz_result.time_executed

      # Break now only if everything went well. Note that if afl finds a crash
      # from fuzzing (and not in the input) it will exit with a zero return
      # code.
      if fuzz_result.return_code == 0:
        # If afl-fuzz found a crash, copy it to the testcase_file_path.
        self.afl_output.copy_crash_if_needed(self.testcase_file_path)
        break

      # Else the return_code was not 0 so something didn't work out. Try fixing
      # this if afl-fuzz threw an error because it saw a crash, hang or large
      # file in the starting corpus.

      # If there was a crash in the input/corpus, afl-fuzz won't run, so let
      # ClusterFuzz know about this and quit.
      crash_filename = check_error_and_log(self.CRASH_REGEX,
                                           self.CRASH_LOG_MESSAGE)

      if crash_filename:
        crash_path = os.path.join(self.afl_input.input_directory,
                                  crash_filename)

        # Copy this file over so afl can reproduce the crash.
        shutil.copyfile(crash_path, self.testcase_file_path)
        break

Function fuzz in file src\python\bot\fuzzers\afl\engine.py checks testcase_file_path and constructs the crash.

    if os.path.exists(testcase_file_path):
      crash = engine.Crash(testcase_file_path, runner.fuzzer_stderr, [],
                           fuzz_result.time_executed)
      crashes.append(crash)

vanhauser-thc commented 3 years ago

It detects a crash through the output of afl++. If there are crash files, it will copy them to self.testcase_file_path in function run_afl_fuzz.

ah! yeah that cannot work anymore. afl++ will still work if crashing items are in the corpus. it uses them for splicing mutations.

@jonathanmetzman I could add an env var to abort afl-fuzz if crashing inputs are detected. that would be the least hassle. WDYT?

EDIT - I added an env var to dev to exit on crashing + timeout inputs in the seed corpus. but if you want this you have to add all the tests for crashes back that I removed during the integration ...

jonathanmetzman commented 3 years ago

@jonathanmetzman @inferno-chromium can we pin this issue for the next 6 weeks?

Done.

@zounathan thanks for this report. But I think the idea that runner.fuzzer_stderr is always None is wrong.

First, we have crashes found by AFL https://bugs.chromium.org/p/oss-fuzz/issues/list?q=label%3AEngine-AFL&sort=-reported so it's hard for me to believe that crashes are never reported. I'll try out this example to be sure. Also, I noticed you were using backslashes to separate files. Just checking, you aren't running this on Windows, are you?

zounathan commented 3 years ago

@jonathanmetzman no, I run on ubuntu18.04.

But I think the idea that runner.fuzzer_stderr is always None is wrong.

Yep, I found that it gets stderr_file_path from the env var AFL_DRIVER_STDERR_DUPLICATE_FILENAME and reopens stderr there. The weird thing is that the crash information isn't written to the reopened stderr. If I add an fprintf, its content is written to the file.

When fuzzing with AFL master, the reopened stderr contains the error information. But AFL++ doesn't produce any output.

zounathan commented 3 years ago

vuln1

char *p = 0;
*p = 0x1;

vuln2

char m[10]={0};
m[11]=0x1;

In the case of vuln1, the target raises SIGSEGV. AFL++ catches the SIGSEGV and kills the child process, so ASAN's error report is never written. AFL instead catches SIGABRT, which ASAN raises after printing the error report (I guess). In the case of vuln2, AFL++ and AFL both catch SIGABRT, and both have the error output.
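To make the signal distinction concrete, here is a small POSIX-only sketch of how a parent process can tell which signal terminated a child. It is not AFL's or ClusterFuzz's code; the helper name and the two child snippets are hypothetical stand-ins (one for a raw segfault with no handler, one for a sanitizer that prints a report and then aborts).

```python
import signal
import subprocess
import sys


def terminating_signal(child_code):
    """Run child_code in a fresh interpreter; return the name of the signal
    that killed it, or None if it exited normally."""
    proc = subprocess.run(
        [sys.executable, "-c", child_code],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    # On POSIX, subprocess reports death by signal N as returncode == -N.
    if proc.returncode < 0:
        return signal.Signals(-proc.returncode).name
    return None


# Stand-in for a null dereference with no sanitizer handler installed:
# the process dies from SIGSEGV without writing any report.
SEGV_CHILD = "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"

# Stand-in for a sanitizer that writes its report first and then aborts.
ABRT_CHILD = (
    "import os, sys; sys.stderr.write('report'); sys.stderr.flush(); os.abort()"
)
```

In the first case the only evidence is the signal itself; in the second, a report has already been written to stderr by the time the parent sees SIGABRT.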

zounathan commented 3 years ago

I find that the ASAN_OPTIONS and MSAN_OPTIONS defaults cause this issue. AFL's env:

      setenv("ASAN_OPTIONS", "abort_on_error=1:"
                             "detect_leaks=0:"
                             "symbolize=0:"
                             "allocator_may_return_null=1", 0);

      setenv("MSAN_OPTIONS", "exit_code=" STRINGIFY(MSAN_ERROR) ":"
                             "symbolize=0:"
                             "msan_track_origins=0", 0);

AFL++'s env

    if (!getenv("ASAN_OPTIONS"))
      setenv("ASAN_OPTIONS",
             "abort_on_error=1:"
             "detect_leaks=0:"
             "malloc_context_size=0:"
             "symbolize=0:"
             "allocator_may_return_null=1:"
             "detect_odr_violation=0:"
             "handle_segv=0:"
             "handle_sigbus=0:"
             "handle_abort=0:"
             "handle_sigfpe=0:"
             "handle_sigill=0",
             1);

    /* Set sane defaults for UBSAN if nothing else specified. */

    if (!getenv("UBSAN_OPTIONS"))
      setenv("UBSAN_OPTIONS",
             "halt_on_error=1:"
             "abort_on_error=1:"
             "malloc_context_size=0:"
             "allocator_may_return_null=1:"
             "symbolize=0:"
             "handle_segv=0:"
             "handle_sigbus=0:"
             "handle_abort=0:"
             "handle_sigfpe=0:"
             "handle_sigill=0",
             1);

    /* Envs for QASan */
    setenv("QASAN_MAX_CALL_STACK", "0", 0);
    setenv("QASAN_SYMBOLIZE", "0", 0);

    /* MSAN is tricky, because it doesn't support abort_on_error=1 at this
       point. So, we do this in a very hacky way. */

    if (!getenv("MSAN_OPTIONS"))
      setenv("MSAN_OPTIONS",
           "exit_code=" STRINGIFY(MSAN_ERROR) ":"
           "symbolize=0:"
           "abort_on_error=1:"
           "malloc_context_size=0:"
           "allocator_may_return_null=1:"
           "msan_track_origins=0:"
           "handle_segv=0:"
           "handle_sigbus=0:"
           "handle_abort=0:"
           "handle_sigfpe=0:"
           "handle_sigill=0",
           1);

vanhauser-thc commented 3 years ago

you mean it is because of handle_segv=0? we do that in afl++ because we already detect the segfault and asan does not provide us with anything we need, besides being an overhead. if the original afl behaviour is wanted, however, that is trivial to get: just set ASAN_OPTIONS as needed. only symbolize must be 0 and abort_on_error 1.
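As a rough sketch of that suggestion: the AFL++ snippet quoted earlier only applies its defaults when ASAN_OPTIONS is unset, so a caller could export its own value before launching afl-fuzz. The option names below are taken from that quoted snippet; the helper functions are hypothetical, and whether ClusterFuzz needs every handle_* option set to 1 is an assumption.

```python
# Options that, per the comment above, must keep these values.
REQUIRED = {"symbolize": "0", "abort_on_error": "1"}

# Signal-handling options named in the quoted AFL++ defaults.
SIGNAL_HANDLERS = (
    "handle_segv",
    "handle_sigbus",
    "handle_abort",
    "handle_sigfpe",
    "handle_sigill",
)


def build_asan_options(handle_signals=True):
    """Build an ASAN_OPTIONS string (hypothetical helper)."""
    options = dict(REQUIRED)
    for name in SIGNAL_HANDLERS:
        options[name] = "1" if handle_signals else "0"
    return ":".join(f"{key}={value}" for key, value in options.items())


def apply_if_unset(env, value):
    """Set ASAN_OPTIONS in a mapping only if the caller has not set it."""
    env.setdefault("ASAN_OPTIONS", value)
    return env
```

For example, `apply_if_unset(dict(os.environ), build_asan_options())` (after `import os`) yields an environment that could be passed to the process that starts afl-fuzz.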

zounathan commented 3 years ago

Yes, with AFL's env, AFL++ can catch the SIGABRT, and the output is correct.

you mean it is because of handle_segv=0? we do that in afl++ because we already detect the segfault and asan does not provide us with anything we need, besides being an overhead. if the original afl behaviour is wanted, however, that is trivial to get: just set ASAN_OPTIONS as needed. only symbolize must be 0 and abort_on_error 1.

zounathan commented 3 years ago

afl-fuzz is run with the -S option, but there is no main node. How does AFL sync the fuzzing statistics?

The warning:

[!] WARNING: no -M main node found. It is recommended to run exactly one main instance.

vanhauser-thc commented 3 years ago

if there is no -M main, but there is 2+ -S then one of the secondaries will perform the syncing between all the nodes. a -M node has the advantage of doing deterministic fuzzing (not that effective but one instance in a fuzzing campaign should do that) plus it automatically sets to not trimming queue entries.
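As an illustration of the -M/-S layout described here, the following sketch builds afl-fuzz command lines for one main instance and several secondaries sharing one output directory. The instance names, paths, and helper function are hypothetical; only the -i, -o, -M, and -S flags come from afl-fuzz itself.

```python
def afl_fuzz_commands(target, input_dir, sync_dir, secondaries):
    """Return argv lists for one -M main instance plus N -S secondaries."""
    base = ["afl-fuzz", "-i", input_dir, "-o", sync_dir]
    # Exactly one main instance, as the warning above recommends.
    commands = [base + ["-M", "main", "--", target]]
    for index in range(secondaries):
        # Each secondary needs a unique name inside the shared sync_dir.
        commands.append(base + ["-S", f"secondary{index}", "--", target])
    return commands
```

Each returned list could be handed to `subprocess.Popen`; all instances must point at the same output directory for afl's internal syncing to find each other.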

zounathan commented 3 years ago

if there is no -M main, but there is 2+ -S then one of the secondaries will perform the syncing between all the nodes. a -M node has the advantage of doing deterministic fuzzing (not that effective but one instance in a fuzzing campaign should do that) plus it automatically sets to not trimming queue entries.

If I run 2 bots on 2 different VMs, can afl perform the syncing?

vanhauser-thc commented 3 years ago

afl has no networking support (and it would just slow it down, plus security). the easiest is to rsync one instance from each host to the other (the instance names must not conflict), e.g. every 4h or so. if the VMs are on the same host you could put the afl output directory on a shared drive, then you can use afl's internal syncing.

jonathanmetzman commented 3 years ago

Sorry for the delayed reply here, but I don't really see what the issue is. I haven't run this on CF yet. But locally AFL++'s behavior with ASAN appears fine to me. Here are some commands I ran that show AFL++ is conforming to the behavior I think ClusterFuzz expects:

rm output; AFL_BENCH_UNTIL_CRASH=1 AFL_DRIVER_STDERR_DUPLICATE_FILENAME=output AFL_SKIP_CPUFREQ=1 ASAN_OPTIONS=symbolize=0:abort_on_error=1  /src/aflplusplus/afl-fuzz -i /tmp/i  -o /tmp/o -mnone ./handshake-fuzzer 10000 && cat output
...AFL OUTPUT REDACTED...
=================================================================
==5609==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x629000009748 at pc 0x000000498137 bp 0x7ffc698ac7b0 sp 0x7ffc698abf78
READ of size 65533 at 0x629000009748 thread T0
    #0 0x498136  (/src/heartbleed/handshake-fuzzer+0x498136)
...

Again with setting ASAN_OPTIONS:

rm output; ASAN_OPTIONS=symbolize=0:abort_on_error=1 AFL_BENCH_UNTIL_CRASH=1 AFL_DRIVER_STDERR_DUPLICATE_FILENAME=output AFL_SKIP_CPUFREQ=1 ASAN_OPTIONS=symbolize=0:abort_on_error=1  /src/aflplusplus/afl-fuzz -i /tmp/i  -o /tmp/o -mnone ./handshake-fuzzer 10000 && cat output
...AFL OUTPUT REDACTED...
=================================================================
==5630==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x629000009748 at pc 0x000000498137 bp 0x7ffcc699a1b0 sp 0x7ffcc6999978
READ of size 64003 at 0x629000009748 thread T0
    #0 0x498136  (/src/heartbleed/handshake-fuzzer+0x498136)

vanhauser-thc commented 3 years ago

@jonathanmetzman in both examples you set ASAN_OPTIONS :) you would need NOT to set them in one example :)

vanhauser-thc commented 3 years ago

@zounathan afl-fuzz works with both handle_{segv,abort,..}=0 and handle_{...}=1. If anything in clusterfuzz requires setting these to 1, that would be easy to do. But can you please create an easy cut+paste example (source + compile + afl-fuzz command) where an asan crash is not picked up by afl-fuzz?

zounathan commented 3 years ago

Here is my demo, demo.cc:

#include <stddef.h>
#include <stdint.h>
int vuln(const uint8_t *data){
        if(data[0]=='a'){
                if(data[1]=='f'){
                        char *p=0;
                        *p=0x12;
                }
                if(data[1]=='l'){
                        int x=3;
                        x=x/0;
                }
                if(data[1]=='c'){
                        char d[10];
                        d[20]=0x34;
                }
        }
        return 0;
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    vuln(data);
    return 0;
}

I compile with the following command.

clang++ -fsanitize-coverage=trace-pc-guard -fsanitize=address -o demo demo.cc ../../libAFLDriver.a ../../afl-compiler-rt-64.o

If I run the demo with input cu to trigger the buffer overflow, afl++ indeed can output the correct ASAN output.

AFL_DRIVER_STDERR_DUPLICATE_FILENAME=/home/nathan/study/AFLplusplus-stable/test/demo/error_out ../../afl-fuzz -i input/ -o output ./demo

nathan@nathan-VirtualBox:~/study/AFLplusplus-stable/test/demo$ cat error_out
=================================================================
==17014==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffc5b684d14 at pc 0x000000517571 bp 0x7ffc5b684cd0 sp 0x7ffc5b684cc8

But if I run the demo with input cf or cl to trigger the segmentation fault or floating point exception, afl++ only catches the fault. ASAN produces no output.

AFL_DRIVER_STDERR_DUPLICATE_FILENAME=/home/nathan/study/AFLplusplus-stable/test/demo/error_out ../../afl-fuzz -i input/ -o output ./demo

nathan@nathan-VirtualBox:~/study/AFLplusplus-stable/test/demo$ ls -al error_out
-rw-rw-r-- 1 nathan nathan 0 May   1 10:11 error_out
nathan@nathan-VirtualBox:~/study/AFLplusplus-stable/test/demo$ cat error_out
nathan@nathan-VirtualBox:~/study/AFLplusplus-stable/test/demo$

If I change the handle_{segv,abort,..} options to 1, afl++ behaves correctly with ASAN. Fuzz with input cf:

nathan@nathan-VirtualBox:~/study/AFLplusplus-stable/test/demo$ cat error_out
AddressSanitizer:DEADLYSIGNAL
=================================================================
==17740==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x0000005173f8 bp 0x7ffecff3fa20 sp 0x7ffecff3f8e0 T0)
==17740==The signal is caused by a WRITE memory access.
==17740==Hint: address points to the zero page.

Fuzz with input cl:

nathan@nathan-VirtualBox:~/study/AFLplusplus-stable/test/demo$ cat error_out
AddressSanitizer:DEADLYSIGNAL
=================================================================
==17719==ERROR: AddressSanitizer: FPE on unknown address 0x000000517495 (pc 0x000000517495 bp 0x7ffeba7f8150 sp 0x7ffeba7f8020 T0)

vanhauser-thc commented 3 years ago

@zounathan I cannot reproduce your issue. Note that the afl way of compiling the target would be: AFL_USE_ASAN=1 afl-clang-fast++ -o t -fsanitize=fuzzer demo.cc - though because of your code you will have to add -O0 as default is -O3 which will optimize out your bugs.

when I run afl-fuzz with that produced binary it detects all 3 crash types.

but also with your compile command afl-fuzz finds all 3 crash types.

zounathan commented 3 years ago

afl-fuzz can definitely detect all 3 crash types, but the ASAN output is not right. ClusterFuzz will ignore crashes if the ASAN output is missing. As I replied before, when the input is cu, the buffer overflow is detected by afl-fuzz and ASAN produces the right output. But when fuzzing with input cf or cl, the segmentation fault or floating point exception is also detected by AFL, yet the output is empty. ClusterFuzz ignores these two crashes because the ASAN output is empty in this situation.
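A toy model of why an empty ASAN report leads to an ignored crash is sketched below. It mirrors only the final "Empty crash state or type" check from the get_error code quoted earlier; the regex-based parser is a hypothetical stand-in for stack_analyzer.get_crash_data, which is far more involved.

```python
import re

# Hypothetical, simplified pattern for the first line of an ASAN report.
_ASAN_ERROR = re.compile(r"ERROR: AddressSanitizer: (\S+)")


def toy_crash_type(stacktrace):
    """Extract a crash type from stderr text, or '' if there is no report."""
    match = _ASAN_ERROR.search(stacktrace or "")
    return match.group(1) if match else ""


def get_error(stacktrace):
    """Return the reason the crash would be ignored, or None if it is valid."""
    if not toy_crash_type(stacktrace):
        return "Empty crash state or type"
    return None
```

With the stack-buffer-overflow report shown earlier, `get_error` returns None; with the empty error_out file it returns the same reason string that appears in the run_bot log.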

vanhauser-thc commented 3 years ago

@zounathan yes what you see is actually the behaviour I want to have in the fuzzer. if clusterfuzz would depend on that collected asan stderr output while afl-fuzz is running - this would be a wrong approach how to do it.

@jonathanmetzman how does clusterfuzz detect and process crashes? I would assume that out/default/crashes/id:* files would be pulled and assessed.

zounathan commented 3 years ago

ClusterFuzz can also detect the 3 crashes. But because the ASAN output is empty, the segmentation fault and FPE are ignored and only recorded in the log. Only the buffer overflow is handled correctly.

@zounathan yes what you see is actually the behaviour I want to have in the fuzzer. if clusterfuzz would depend on that collected asan stderr output while afl-fuzz is running - this would be a wrong approach how to do it.

@jonathanmetzman how does clusterfuzz detect and process crashes? I would assume that out/default/crashes/id:* files would be pulled and assessed.

jonathanmetzman commented 3 years ago

@jonathanmetzman in both examples you set ASAN_OPTIONS :) you would need NOT to set them in one example :)

@zounathan yes what you see is actually the behaviour I want to have in the fuzzer. if clusterfuzz would depend on that collected asan stderr output while afl-fuzz is running - this would be a wrong approach how to do it.

@jonathanmetzman how does clusterfuzz detect and process crashes? I would assume that out/default/crashes/id:* files would be pulled and assessed.

Yes ClusterFuzz does depend on the ASAN stacktrace. Without it, ClusterFuzz may ignore crashes. Sorry for the misunderstanding @zounathan but this is an extremely important issue you brought to our attention.

jonathanmetzman commented 3 years ago

Reviewing this further, I realize I don't have any idea what's going on here. Because I am seeing AFL++ producing ASAN traces on CF. I'm going to investigate this thoroughly on Friday. Can't do it now, too busy.

inferno-chromium commented 3 years ago

In the ClusterFuzz case, all handle_{segv,abort,..} are set to 1, so both ASAN_OPTIONS and UBSAN_OPTIONS are already set, and we shouldn't be running on that code path.

CarpeDiem-CarpeNoctem commented 3 years ago

Is there any reason why we can't use afl-tmin for corpus pruning as well in clusterfuzz? According to the docs it's not supported (last paragraph in: https://google.github.io/clusterfuzz/setting-up-fuzzing/libfuzzer-and-afl/)

amuthuirulappa commented 2 years ago

The handle_*=0 workaround didn't work for me. I tried a hacky patch of engine.py that appeared to work fine and found crashes that were previously missed. clusterfuzz\src\clusterfuzz_internal\bot\fuzzers\afl\engine.py:118 -

    if os.path.exists(testcase_file_path):
      # Re-run the target on the testcase to obtain a stacktrace.
      res = self.reproduce(target_path, testcase_file_path, None, None)
      crash = engine.Crash(testcase_file_path, res.output, [],
                           fuzz_result.time_executed)