kcc commented 5 years ago

(Consider this a work-in-progress design doc; it will be updated periodically.) EDIT 2019-06-20

Data Flow Trace

The Data Flow Trace (DFT) tells the fuzzing engine which bytes of a given input affect which comparison instructions. In the following example, if an input reaches CMP1, DFT will tell us that CMP1 is affected by data[55], data[66] and data[77].

int LLVMFuzzerTestOneInput(const unsigned char *data, size_t size) {
  int x = SomeFunctionOf(data[55], data[66]);
  ...
  if(x == data[77]) // CMP1
    ...
}

DataFlowSanitizer (DFSan) allows us to collect byte-precise DFT, typically at the cost of several executions of a given input.
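
To make the idea concrete, here is a toy Python model of byte-level taint tracking for the example above. It is only a sketch of the concept and not how DFSan is implemented: `Tainted`, `load`, `trace_cmp` and `run_target` are hypothetical names, and `some_function_of` stands in for `SomeFunctionOf`, assumed here to be a plain sum.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Tainted:
    """A value plus the set of input byte offsets it depends on."""
    value: int
    labels: frozenset = field(default_factory=frozenset)

    def __add__(self, other: "Tainted") -> "Tainted":
        # Combining two values unions their label sets.
        return Tainted(self.value + other.value, self.labels | other.labels)


def load(data: bytes, i: int) -> Tainted:
    # Reading data[i] yields a value labeled with offset i.
    return Tainted(data[i], frozenset({i}))


def some_function_of(a: Tainted, b: Tainted) -> Tainted:
    # Assumption: stand-in for SomeFunctionOf; any computation would do.
    return a + b


def trace_cmp(name: str, lhs: Tainted, rhs: Tainted, trace: dict) -> bool:
    # Record which input bytes reach this comparison, then evaluate it.
    trace.setdefault(name, set()).update(lhs.labels | rhs.labels)
    return lhs.value == rhs.value


def run_target(data: bytes) -> dict:
    # Mirrors the C example: x depends on data[55] and data[66],
    # and CMP1 compares x against data[77].
    trace: dict = {}
    x = some_function_of(load(data, 55), load(data, 66))
    trace_cmp("CMP1", x, load(data, 77), trace)
    return trace
```

Calling `run_target` on any input of at least 78 bytes yields a trace that maps `CMP1` to offsets 55, 66 and 77, which is the kind of information the DFT gives to the fuzzing engine.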

Collecting the DFT

In order to collect the DFT the target needs to be compiled with DFSan+SanitizerCoverage and linked with a special driver. The exact details are here.

Then the DFT needs to be collected for the entire seed corpus (see the example below). This will create a new directory with the DFT, which then needs to be compressed and stored on the network disk.

Using the DFT

The libFuzzer runners will use the DFT with some probability. If DFT is chosen for a particular run, the DFT directory is downloaded from the network disk and uncompressed on the local runner. Note that the DFT from the previous fuzzing iteration remains mostly usable, so we do not need to synchronize DFT collection with its use.

libFuzzer will need to be run with two extra flags (other flags are as usual):

-data_flow_trace=<DFT_DIR>: this simply instructs libFuzzer to load the DFT from <DFT_DIR>.
-focus_function=auto: this instructs libFuzzer to choose a focus function based on the DFT.

Alternatively, DFT could be collected by libFuzzer on the fly with -collect_data_flow=./dft-binary -fork=1, see below.
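
A runner could assemble these command lines with a small helper. The sketch below is hypothetical (the function name and parameters are made up for illustration) and uses only the flags named above; it builds an argv list for either the pre-collected mode or the on-the-fly mode.

```python
from typing import List, Optional


def libfuzzer_dft_args(fuzzer: str, corpus_dirs: List[str],
                       dft_dir: Optional[str] = None,
                       dft_binary: Optional[str] = None,
                       fork: int = 1) -> List[str]:
    """Build an argv list for one of the two DFT modes described above."""
    if (dft_dir is None) == (dft_binary is None):
        raise ValueError("pass exactly one of dft_dir or dft_binary")
    args = [fuzzer]
    if dft_dir is not None:
        # Pre-collected traces: load them and let libFuzzer pick the focus.
        args += [f"-data_flow_trace={dft_dir}", "-focus_function=auto"]
    else:
        # On-the-fly collection in fork mode.
        args += [f"-collect_data_flow={dft_binary}", f"-fork={fork}"]
    return args + list(corpus_dirs)
```

The resulting list can be handed to `subprocess.run` as is; any other libFuzzer flags would be appended in the usual way.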

Example

This command sequence shows how to apply DFT-based fuzzing to the OnlySomeBytesTest.cpp puzzle.

#!/bin/bash
LLVM=$HOME/llvm-project
RT=$LLVM/compiler-rt
# Build the regular fuzzer binary.
clang -g -O0 -fsanitize=fuzzer $RT/test/fuzzer/OnlySomeBytesTest.cpp -o fuzzer-lf
# Build the DFT binary.
clang -c  -fsanitize=dataflow $RT/lib/fuzzer/dataflow/DataFlow.cpp
clang -c -fPIC $RT/lib/fuzzer/dataflow/DataFlowCallbacks.cpp
clang -g -fsanitize=dataflow -fsanitize-coverage=trace-pc-guard,pc-table,bb,trace-cmp  \
    $RT/test/fuzzer/OnlySomeBytesTest.cpp DataFlow*.o -o fuzzer-dft

# create the corpus
rm -rf CORPUS && mkdir CORPUS
(echo -n ABC; for((i=0;i<4093;i++)) ; do echo -n x; done) > CORPUS/seed
./fuzzer-lf CORPUS/ -use_value_profile=1 -runs=1000000 # Very unlikely to find the bug.

# create_dft()
rm -rf DFT && ./fuzzer-lf -collect_data_flow=./fuzzer-dft -data_flow_trace=DFT CORPUS

# Use DFT. This should find the bug almost instantly.
rm -rf C2; mkdir C2
./fuzzer-lf C2 CORPUS/ -use_value_profile=1 -data_flow_trace=DFT \
  -focus_function=auto -jobs=20 -artifact_prefix=C2/

# Or, much simpler with fork mode which will collect DFT itself:
./fuzzer-lf -use_value_profile=1 -collect_data_flow=./fuzzer-dft -fork=1

Dor1s commented 5 years ago

Do you have any plans to allow specifying more than 1 focus function?

kcc commented 5 years ago

No such plans yet; I want to polish the simplest workflow first. Besides, I am not sure that would make any sense: after all, if you have two things to focus on, you don't have a focus.

Dor1s commented 5 years ago

Inspired by AUTOGRAM, I've realized that we could try generating protobuf descriptions based on DFSan traces.

Dor1s commented 5 years ago

Temporarily assigning to myself to do a very quick evaluation.

Dor1s commented 5 years ago

If anyone wants to play locally:

1) Wait till https://reviews.llvm.org/rL208268 lands

2) Check out #2292 locally or wait until it lands too

3) Build stuff (if #2292 lands, you can do python infra/helper.py pull_images instead of re-building base images locally):

$ project=zlib  # or anything else, preferably small and written in C
$ python infra/helper.py build_image --no-pull base-clang \
    && python infra/helper.py build_image --no-pull base-builder \
    && python infra/helper.py build_image --no-pull $project \
    && python infra/helper.py build_fuzzers --engine dataflow --sanitizer dataflow $project

Dor1s commented 5 years ago

50 projects built successfully:

gs://clusterfuzz-builds-dataflow/aosp/aosp-dataflow-201904091507.zip
gs://clusterfuzz-builds-dataflow/brotli/brotli-dataflow-201904091507.zip
gs://clusterfuzz-builds-dataflow/bzip2/bzip2-dataflow-201904091507.zip
gs://clusterfuzz-builds-dataflow/c-ares/c-ares-dataflow-201904091507.zip
gs://clusterfuzz-builds-dataflow/capstone/capstone-dataflow-201904091507.zip
gs://clusterfuzz-builds-dataflow/cmark/cmark-dataflow-201904091508.zip
gs://clusterfuzz-builds-dataflow/fuzzing-puzzles/fuzzing-puzzles-dataflow-201904091508.zip
gs://clusterfuzz-builds-dataflow/giflib/giflib-dataflow-201904091508.zip
gs://clusterfuzz-builds-dataflow/harfbuzz/harfbuzz-dataflow-201904091508.zip
gs://clusterfuzz-builds-dataflow/hoextdown/hoextdown-dataflow-201904091508.zip
gs://clusterfuzz-builds-dataflow/json-c/json-c-dataflow-201904091509.zip
gs://clusterfuzz-builds-dataflow/lcms/lcms-dataflow-201904091509.zip
gs://clusterfuzz-builds-dataflow/libchewing/libchewing-dataflow-201904091509.zip
gs://clusterfuzz-builds-dataflow/libexif/libexif-dataflow-201904091509.zip
gs://clusterfuzz-builds-dataflow/libgit2/libgit2-dataflow-201904091509.zip
gs://clusterfuzz-builds-dataflow/libidn2/libidn2-dataflow-201904091509.zip
gs://clusterfuzz-builds-dataflow/libldac/libldac-dataflow-201904091510.zip
gs://clusterfuzz-builds-dataflow/libpcap/libpcap-dataflow-201904091510.zip
gs://clusterfuzz-builds-dataflow/libplist/libplist-dataflow-201904091510.zip
gs://clusterfuzz-builds-dataflow/libteken/libteken-dataflow-201904091510.zip
gs://clusterfuzz-builds-dataflow/libtsm/libtsm-dataflow-201904091510.zip
gs://clusterfuzz-builds-dataflow/libwebp/libwebp-dataflow-201904091510.zip
gs://clusterfuzz-builds-dataflow/libyaml/libyaml-dataflow-201904091510.zip
gs://clusterfuzz-builds-dataflow/lzo/lzo-dataflow-201904091511.zip
gs://clusterfuzz-builds-dataflow/mbedtls/mbedtls-dataflow-201904091511.zip
gs://clusterfuzz-builds-dataflow/minizip/minizip-dataflow-201904091511.zip
gs://clusterfuzz-builds-dataflow/mupdf/mupdf-dataflow-201904091511.zip
gs://clusterfuzz-builds-dataflow/nestegg/nestegg-dataflow-201904091511.zip
gs://clusterfuzz-builds-dataflow/nghttp2/nghttp2-dataflow-201904091511.zip
gs://clusterfuzz-builds-dataflow/openjpeg/openjpeg-dataflow-201904091511.zip
gs://clusterfuzz-builds-dataflow/openthread/openthread-dataflow-201904091511.zip
gs://clusterfuzz-builds-dataflow/openvswitch/openvswitch-dataflow-201904091511.zip
gs://clusterfuzz-builds-dataflow/opus/opus-dataflow-201904091511.zip
gs://clusterfuzz-builds-dataflow/pcre2/pcre2-dataflow-201904091511.zip
gs://clusterfuzz-builds-dataflow/pffft/pffft-dataflow-201904091511.zip
gs://clusterfuzz-builds-dataflow/qcms/qcms-dataflow-201904091512.zip
gs://clusterfuzz-builds-dataflow/qpid-proton/qpid-proton-dataflow-201904091512.zip
gs://clusterfuzz-builds-dataflow/qubes-os/qubes-os-dataflow-201904091512.zip
gs://clusterfuzz-builds-dataflow/radare2/radare2-dataflow-201904091512.zip
gs://clusterfuzz-builds-dataflow/tpm2-tss/tpm2-tss-dataflow-201904091512.zip
gs://clusterfuzz-builds-dataflow/unicorn/unicorn-dataflow-201904091512.zip
gs://clusterfuzz-builds-dataflow/vorbis/vorbis-dataflow-201904091512.zip
gs://clusterfuzz-builds-dataflow/wolfssl/wolfssl-dataflow-201904091512.zip
gs://clusterfuzz-builds-dataflow/wuffs/wuffs-dataflow-201904091512.zip
gs://clusterfuzz-builds-dataflow/xz/xz-dataflow-201904091513.zip
gs://clusterfuzz-builds-dataflow/yajl-ruby/yajl-ruby-dataflow-201904091513.zip
gs://clusterfuzz-builds-dataflow/yara/yara-dataflow-201904091513.zip
gs://clusterfuzz-builds-dataflow/zlib-ng/zlib-ng-dataflow-201904091513.zip
gs://clusterfuzz-builds-dataflow/zlib/zlib-dataflow-201904091513.zip
gs://clusterfuzz-builds-dataflow/zstd/zstd-dataflow-201904091513.zip

Dor1s commented 5 years ago

Hey @kcc, could you please check the log attached. The issue I'm seeing with the second target I'm testing is that the script quickly runs into ==59901==FATAL: DataFlowSanitizer: out of labels error (in this case after running first 16 inputs) and then it keeps trying the same input again and again with no luck. Am I doing anything wrong?

block_decompress.log

Dor1s commented 5 years ago

If you need to reproduce:

1) Download gs://clusterfuzz-builds-dataflow/zstd/zstd-dataflow-201904091513.zip

2) Download gs://zstd-backup.clusterfuzz-external.appspot.com/corpus/libFuzzer/zstd_block_decompress/latest.zip

3) Unpack both, run block_decompress target

Dor1s commented 5 years ago

Ah, I guess the real root cause is that some inputs are too long. What would be a good threshold to trim / ignore long ones?

kcc commented 5 years ago

DFSan supports ~ 2^16 labels, but I would put a much lower threshold, e.g. 2^14 bytes for now. We can extend later at the cost of some (small) extra complexity. (I'll double-check what exactly is going on a bit later)

Dor1s commented 5 years ago

I'm gonna try skipping such inputs in the script instead of retrying. That should make life much easier and all changes will live in LLVM repo (i.e. no hacky corpus trimming on user end).
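
For illustration only, skipping oversized inputs could look roughly like the Python sketch below. It is not the actual change to the collection script: `split_by_size` and its parameters are hypothetical, the limit is the 2^14 bytes suggested above, and treating the limit as inclusive is an assumption.

```python
import os
from typing import Callable, List, Tuple

MAX_DFT_INPUT_SIZE = 1 << 14  # 2^14 bytes, the threshold suggested above


def split_by_size(paths: List[str],
                  limit: int = MAX_DFT_INPUT_SIZE,
                  size_of: Callable[[str], int] = os.path.getsize
                  ) -> Tuple[List[str], List[str]]:
    """Return (inputs to trace, inputs to skip) based on input size."""
    keep: List[str] = []
    skip: List[str] = []
    for path in paths:
        # Assumption: an input of exactly `limit` bytes is still traced.
        (keep if size_of(path) <= limit else skip).append(path)
    return keep, skip
```

The `size_of` parameter exists only so the function can be exercised without touching the file system; by default it uses `os.path.getsize`.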

Dor1s commented 5 years ago

Yeah, https://reviews.llvm.org/D60538 seems to be a reasonable workaround for now.

Dor1s commented 5 years ago

And now libFuzzer is crashing with the following stacktrace (looks like it tries to mutate an empty input, though there aren't empty inputs in the corpus):

asan_block_decompress: /src/libfuzzer/FuzzerMutate.cpp:510: size_t fuzzer::MutationDispatcher::MutateImpl(uint8_t *, size_t, size_t, Vector<fuzzer::MutationDispatcher::Mutator> &): Assertion `MaxSize > 0' failed.
==81393== ERROR: libFuzzer: deadly signal
    #0 0x4c0171 in __sanitizer_print_stack_trace /src/llvm/projects/compiler-rt/lib/asan/asan_stack.cc:86
    #1 0x69ecdd in fuzzer::PrintStackTrace() /src/libfuzzer/FuzzerUtil.cpp:205:5
    #2 0x652e5e in fuzzer::Fuzzer::CrashCallback() /src/libfuzzer/FuzzerLoop.cpp:234:3
    #3 0x7f2aee76a0bf  (/lib/x86_64-linux-gnu/libpthread.so.0+0x110bf)
    #4 0x7f2aeddc8fce in gsignal (/lib/x86_64-linux-gnu/libc.so.6+0x32fce)
    #5 0x7f2aeddca3f9 in abort (/lib/x86_64-linux-gnu/libc.so.6+0x343f9)
    #6 0x7f2aeddc1e36  (/lib/x86_64-linux-gnu/libc.so.6+0x2be36)
    #7 0x7f2aeddc1ee1 in __assert_fail (/lib/x86_64-linux-gnu/libc.so.6+0x2bee1)
    #8 0x68d567 in fuzzer::MutationDispatcher::MutateImpl(unsigned char*, unsigned long, unsigned long, std::__1::vector<fuzzer::MutationDispatcher::Mutator, fuzzer::fuzzer_allocator<fuzzer::MutationDispatcher::Mutator> >&) /src/libfuzzer/FuzzerMutate.cpp:510:3
    #9 0x68d92a in Mutate /src/libfuzzer/FuzzerMutate.cpp:498:10
    #10 0x68d92a in fuzzer::MutationDispatcher::MutateWithMask(unsigned char*, unsigned long, unsigned long, std::__1::vector<unsigned char, fuzzer::fuzzer_allocator<unsigned char> > const&) /src/libfuzzer/FuzzerMutate.cpp:546
    #11 0x658b33 in fuzzer::Fuzzer::MutateAndTestOne() /src/libfuzzer/FuzzerLoop.cpp:659:20
    #12 0x65bea8 in fuzzer::Fuzzer::Loop(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, fuzzer::fuzzer_allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, fuzzer::fuzzer_allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&) /src/libfuzzer/FuzzerLoop.cpp:814:5
    #13 0x6207b1 in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /src/libfuzzer/FuzzerDriver.cpp:776:6
    #14 0x6131a7 in main /src/libfuzzer/FuzzerMain.cpp:19:10
    #15 0x7f2aeddb62b0 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202b0)
    #16 0x41d8e8 in _start (/usr/local/google/home/mmoroz/Downloads/df/asan_block_decompress+0x41d8e8)

NOTE: libFuzzer has rudimentary signal handlers.
      Combine libFuzzer with AddressSanitizer or similar for better crash reports.
SUMMARY: libFuzzer: deadly signal
MS: 0 ; base unit: 2a4e35072f5775415da786443a900821629e6c60

kcc commented 5 years ago

Could be a trivial bug... Can you try this?

Index: FuzzerMutate.cpp
===================================================================
--- FuzzerMutate.cpp    (revision 358040)
+++ FuzzerMutate.cpp    (working copy)
@@ -542,6 +542,7 @@
     if (Mask[I])
       T[OneBits++] = Data[I];

+  if (!OneBits) return 0;
   assert(!T.empty());
   size_t NewSize = Mutate(T.data(), OneBits, OneBits);
   assert(NewSize <= OneBits);
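
As a rough model of what the guarded code does, here is a Python sketch of mask-restricted mutation. It is simplified and the names are hypothetical: it copies the bytes selected by the mask into a temporary buffer, mutates that buffer, and copies the bytes back. Clamping to the shorter of the input and the mask is an assumption of the sketch, not something taken from the patch.

```python
from typing import Callable, List


def mutate_with_mask(data: bytearray, mask: List[int],
                     mutate: Callable[[bytearray], None]) -> int:
    """Mutate only the bytes of `data` whose mask entry is non-zero.

    Returns 0 if the mask selects nothing, else len(data).
    """
    # Assumption: tolerate a mask shorter than the input by clamping.
    masked_size = min(len(data), len(mask))
    picked = [i for i in range(masked_size) if mask[i]]
    if not picked:
        return 0  # Mirrors the `if (!OneBits) return 0;` guard.
    tmp = bytearray(data[i] for i in picked)  # Copy the selected bytes out.
    mutate(tmp)                               # Mutate them in place.
    assert len(tmp) == len(picked), "sketch requires a size-preserving mutator"
    for j, i in enumerate(picked):            # Copy them back.
        data[i] = tmp[j]
    return len(data)
```

With an all-zero (or empty) mask the function returns 0 without calling the mutator, which is the case the one-line guard in the diff is meant to handle.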

Dor1s commented 5 years ago

Thanks, @kcc! It helped, together with one more change; I've uploaded both in https://reviews.llvm.org/D60567

However, now I'm getting another crash (looks like the Mask is shorter than the input somehow):

supernew_asan_block_decompress: /src/libfuzzer/FuzzerMutate.cpp:532: size_t fuzzer::MutationDispatcher::MutateWithMask(uint8_t *, size_t, size_t, const Vector<uint8_t> &): Assertion `Size <= Mask.size()' failed.
==3743== ERROR: libFuzzer: deadly signal
    #0 0x4c0171 in __sanitizer_print_stack_trace /src/llvm/projects/compiler-rt/lib/asan/asan_stack.cc:86
    #1 0x69eccd in fuzzer::PrintStackTrace() /src/libfuzzer/FuzzerUtil.cpp:205:5
    #2 0x652e5e in fuzzer::Fuzzer::CrashCallback() /src/libfuzzer/FuzzerLoop.cpp:234:3
    #3 0x7f29962790bf  (/lib/x86_64-linux-gnu/libpthread.so.0+0x110bf)
    #4 0x7f29958d7fce in gsignal (/lib/x86_64-linux-gnu/libc.so.6+0x32fce)
    #5 0x7f29958d93f9 in abort (/lib/x86_64-linux-gnu/libc.so.6+0x343f9)
    #6 0x7f29958d0e36  (/lib/x86_64-linux-gnu/libc.so.6+0x2be36)
    #7 0x7f29958d0ee1 in __assert_fail (/lib/x86_64-linux-gnu/libc.so.6+0x2bee1)
    #8 0x68dc5d in fuzzer::MutationDispatcher::MutateWithMask(unsigned char*, unsigned long, unsigned long, std::__1::vector<unsigned char, fuzzer::fuzzer_allocator<unsigned char> > const&) /src/libfuzzer/FuzzerMutate.cpp:532:3
    #9 0x658b32 in fuzzer::Fuzzer::MutateAndTestOne() /src/libfuzzer/FuzzerLoop.cpp:659:20
    #10 0x65beb8 in fuzzer::Fuzzer::Loop(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, fuzzer::fuzzer_allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, fuzzer::fuzzer_allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&) /src/libfuzzer/FuzzerLoop.cpp:816:5
    #11 0x6207b1 in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /src/libfuzzer/FuzzerDriver.cpp:776:6
    #12 0x6131a7 in main /src/libfuzzer/FuzzerMain.cpp:19:10
    #13 0x7f29958c52b0 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202b0)
    #14 0x41d8e8 in _start (/usr/local/google/home/mmoroz/projects/df/supernew_asan_block_decompress+0x41d8e8)

NOTE: libFuzzer has rudimentary signal handlers.
      Combine libFuzzer with AddressSanitizer or similar for better crash reports.
SUMMARY: libFuzzer: deadly signal
MS: 1 CMP- DE: "\x00\xc0\x00\x00\x00\x00\x00\x00"-; base unit: 57fb087984f90ba1677e27e934fb1ec989850df4

Corpus unit is 71 bytes and trace is 1867 bytes long:

$ ls -l block_decompress_corpus/57fb087984f90ba1677e27e934fb1ec989850df4 
-rw-r--r-- 1 mmoroz 71   Apr  9 19:14 block_decompress_corpus/57fb087984f90ba1677e27e934fb1ec989850df4
$ ls -l block_decompress_dft/57fb087984f90ba1677e27e934fb1ec989850df4 
-rw-r--r-- 1 mmoroz 1867 Apr 10 13:30 block_decompress_dft/57fb087984f90ba1677e27e934fb1ec989850df4

Dor1s commented 5 years ago

Thanks Kostya for explaining some of the things in more detail. With one more change (https://reviews.llvm.org/D60571) I've got that fuzz target running locally!

Dor1s commented 5 years ago

See below the difference in the disk space used by DataFlow traces vs. the corpus. Some targets are missing and some numbers might not be fully correct because I ran out of disk space, but IMO it's safe to conclude that there is a 2-10x difference in most cases:

   Ratio     Corpus size     DFT size      ./project/fuzz_target_name
----------------------------------------------------------------------------------------
    2.98x    from 158M   to 471M       for ./capstone/fuzz_disasmv4
    2.07x    from 202M   to 418M       for ./capstone/fuzz_disasmnext
    2.19x    from 43M    to 94M        for ./vorbis/decode_fuzzer
    3.25x    from 24M    to 78M        for ./wolfssl/pem_cert
    2.63x    from 19M    to 50M        for ./lcms/cmsIT8_load_fuzzer
    1.11x    from 117M   to 130M       for ./unicorn/fuzz_emu_x86_32
    1.37x    from 106M   to 145M       for ./unicorn/fuzz_emu_mips_32le
    2.30x    from 133M   to 306M       for ./unicorn/fuzz_emu_x86_64
    1.39x    from 114M   to 159M       for ./unicorn/fuzz_emu_mips_32be
    2.09x    from 43M    to 90M        for ./unicorn/fuzz_emu_arm_arm
    2.24x    from 33M    to 74M        for ./libexif/exif_loader_fuzzer
    9.39x    from 120M   to 1.1G       for ./aosp/sqlite
    2.00x    from 4.0K   to 8.0K       for ./hoextdown/hoedown_fuzzer
    5.39x    from 399M   to 2.1G       for ./mupdf/pdf_fuzzer
    5.06x    from 65M    to 329M       for ./qpid-proton/fuzz-message-decode
    1.07x    from 134M   to 143M       for ./openjpeg/opj_decompress_fuzzer
    3.50x    from 20M    to 70M        for ./mbedtls/fuzz_x509crl
    4.58x    from 36M    to 165M       for ./mbedtls/fuzz_dtlsserver
   14.57x    from 14M    to 204M       for ./mbedtls/fuzz_x509csr
    6.47x    from 19M    to 123M       for ./mbedtls/fuzz_pubkey
    2.27x    from 11M    to 25M        for ./mbedtls/fuzz_privkey
    3.00x    from 4.0K   to 12K        for ./fuzzing-puzzles/multiple_constraints_on_small_input_afl_fuzzer
   16.93x    from 127M   to 2.1G       for ./radare2/ia_fuzz
    1.71x    from 17M    to 29M        for ./opus/opus_decode_fuzzer_fixed
    1.68x    from 19M    to 32M        for ./opus/opus_decode_fuzzer_floating
    1.33x    from 12K    to 16K        for ./zlib/example_flush_fuzzer
    1.03x    from 33M    to 34M        for ./zlib/example_dict_fuzzer
    1.55x    from 53M    to 82M        for ./yara/rules_fuzzer
 2560.00x    from 8.0K   to 20M        for ./yara/macho_fuzzer
   22.08x    from 25M    to 552M       for ./openthread/radio-receive-done-fuzzer
    1.43x    from 21M    to 30M        for ./openthread/ip6-send-fuzzer
    4.69x    from 68M    to 319M       for ./openthread/cli-uart-received-fuzzer
    3.04x    from 4.6M   to 14M        for ./c-ares/ares_create_query_fuzzer
    2.03x    from 5.9M   to 12M        for ./libidn2/libidn2_to_unicode_8z8z_fuzzer
    1.22x    from 49M    to 60M        for ./libpcap/fuzz_both
    3.00x    from 4.0K   to 12K        for ./libchewing/chewing_default_fuzzer
    3.00x    from 4.0K   to 12K        for ./libchewing/chewing_random_init_fuzzer
    3.00x    from 4.0K   to 12K        for ./libchewing/chewing_dynamic_config_fuzzer
    1.26x    from 50M    to 63M        for ./libwebp/fuzz_simple_api
    2.80x    from 20M    to 56M        for ./libwebp/fuzz_webp_enc_dec
    1.78x    from 23M    to 41M        for ./libwebp/fuzz_webp_animencoder
    1.56x    from 390M   to 608M       for ./harfbuzz/hb-shape-fuzzer
    4.29x    from 152M   to 652M       for ./harfbuzz/hb-subset-fuzzer
    2.97x    from 36M    to 107M       for ./nghttp2/nghttp2_fuzzer
    1.63x    from 263M   to 430M       for ./cmark/cmark_fuzzer
62976.00x    from 4.0K   to 246M       for ./zlib-ng/compress_fuzzer
    1.39x    from 46M    to 64M        for ./zlib-ng/example_dict_fuzzer
    1.88x    from 32M    to 60M        for ./zstd/simple_decompress
    2.21x    from 169M   to 373M       for ./zstd/stream_round_trip
    5.90x    from 29M    to 171M       for ./zstd/stream_decompress
    1.29x    from 17M    to 22M        for ./zstd/block_decompress
    2.04x    from 78M    to 159M       for ./zstd/block_round_trip
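
The ratio column can be re-derived from the two size columns. A small Python sketch, assuming the sizes are `du -h`-style values with 1024-based `K`/`M`/`G` suffixes (the helper names are hypothetical); because the displayed sizes are rounded, recomputed ratios can differ slightly in the last digit.

```python
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text: str) -> float:
    """Convert a du-style size such as '4.0K' or '1.1G' to bytes."""
    text = text.strip()
    if text and text[-1] in UNITS:
        return float(text[:-1]) * UNITS[text[-1]]
    return float(text)


def growth_ratio(corpus: str, dft: str) -> float:
    """How many times larger the DFT directory is than the corpus."""
    return parse_size(dft) / parse_size(corpus)
```

For example, `growth_ratio("8.0K", "20M")` gives 2560.0, matching the yara/macho_fuzzer row. The extreme ratios all come from rows where the corpus is only a few kilobytes.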

kcc commented 5 years ago

Nice, thanks! I didn't even try to optimize the disk size yet, wanted to see if the logic works at all. I think the easiest way to optimize the disk space is to zlib-compress the data.
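
A minimal sketch of that idea in Python, using the standard `zlib` module. The helper names and the sample payload are made up; the sample is just a highly repetitive stand-in and is not meant to reproduce the real trace file format.

```python
import zlib


def compress_trace(raw: bytes, level: int = 6) -> bytes:
    """Compress the contents of one trace file."""
    return zlib.compress(raw, level)


def decompress_trace(blob: bytes) -> bytes:
    """Restore the original trace contents."""
    return zlib.decompress(blob)


# Hypothetical trace-like content: long runs of '0'/'1' characters.
sample = (b"F0 " + b"0" * 4000 + b"1" * 96 + b"\n") * 8
packed = compress_trace(sample)
```

How much this saves on real traces would need to be measured; repetitive data like the sample compresses very well, but that is a property of the sample.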

kcc commented 5 years ago

Just added -focus_function=auto which will make libFuzzer choose the focus function automatically based on the coverage data contained in the trace files.

So far tested only on a tiny test.

I will keep testing and tuning it, but the basic functionality is there.

kcc commented 5 years ago

I've reimplemented the python scripts in libFuzzer proper (LLVM r360712).

The current workflow:

#!/bin/bash
LLVM=$HOME/llvm
RT=$LLVM/projects/compiler-rt
# Build the regular fuzzer binary.
clang -g -O1 -fsanitize=fuzzer $RT/test/fuzzer/OnlySomeBytesTest.cpp -o fuzzer-lf
# Build the DFT binary.
clang -c  -fsanitize=dataflow $RT/lib/fuzzer/dataflow/DataFlow.cpp
clang -g -fsanitize=dataflow -fsanitize-coverage=trace-pc-guard,pc-table,bb,trace-cmp  \
    $RT/test/fuzzer/OnlySomeBytesTest.cpp DataFlow.o -o fuzzer-dft

# create the corpus
rm -rf CORPUS && mkdir CORPUS
(echo -n ABC; for((i=0;i<4093;i++)) ; do echo -n x; done) > CORPUS/seed
./fuzzer-lf CORPUS/ -use_value_profile=1 -runs=1000000 # Very unlikely to find the bug.

# create_dft()
rm -rf DFT && ./fuzzer-lf -collect_data_flow=./fuzzer-dft -data_flow_trace=DFT CORPUS

# Use DFT. This should find the bug almost instantly.
rm -rf C2; mkdir C2
./fuzzer-lf C2 CORPUS/ -use_value_profile=1 -data_flow_trace=DFT \
  -focus_function=auto -jobs=20 -artifact_prefix=C2/

I have not tested this on anything real yet, only on the above synthetic puzzle.

Dor1s commented 5 years ago

I've tried to build all the projects once again (in order to have a better sampling of the builds and choose only stable ones for the experiment), and this time only 4 project builds succeeded:

$ gsutil ls -r gs://clusterfuzz-builds-dataflow/ | egrep 20190517
gs://clusterfuzz-builds-dataflow/c-ares/c-ares-dataflow-201905171213.srcmap.json
gs://clusterfuzz-builds-dataflow/c-ares/c-ares-dataflow-201905171213.zip
gs://clusterfuzz-builds-dataflow/radare2/radare2-dataflow-201905171217.srcmap.json
gs://clusterfuzz-builds-dataflow/radare2/radare2-dataflow-201905171217.zip
gs://clusterfuzz-builds-dataflow/zlib-ng/zlib-ng-dataflow-201905171217.srcmap.json
gs://clusterfuzz-builds-dataflow/zlib-ng/zlib-ng-dataflow-201905171217.zip
gs://clusterfuzz-builds-dataflow/zlib/zlib-dataflow-201905171217.srcmap.json
gs://clusterfuzz-builds-dataflow/zlib/zlib-dataflow-201905171217.zip

Checking the logs... Maybe the recent migration broke the others.

Dor1s commented 5 years ago

Yeah, with a newer version of #2303 I'm able to build many projects again.

Dor1s commented 5 years ago

41 projects which we should try DFT-based fuzzing on:

aosp
brotli
bzip2
capstone
cmark
giflib
harfbuzz
hoextdown
lcms
libchewing
libexif
libgit2
libidn2
libldac
libpcap
libplist
libteken
libtsm
libwebp
libyaml
lzo
mbedtls
minizip
mupdf
nestegg
nghttp2
openjpeg
openthread
openvswitch
opus
pcre2
pffft
qcms
radare2
vorbis
wolfssl
wuffs
xz
yara
zlib
zstd

kcc commented 5 years ago

New workflow:

build the two binaries as above

run the libFuzzer binary with -fork=N and -collect_data_flow=

./fuzzer-lf -use_value_profile=1 -collect_data_flow=./fuzzer-dft -fork=20

(again, not tested yet outside of tiny examples)

With this workflow we may not need any kind of DFT management from ClusterFuzz: just let CF ship both binaries (libFuzzer and DFT) to a worker and invoke the fuzzer binary with -collect_data_flow=./DFT -fork=1

Dor1s commented 5 years ago

Ack. I've just got a null deref in libFuzzer locally, but I think it has something to do with the way things are getting built now (i.e. -fsanitize=fuzzer uses LLVM that is more than a week old and doesn't have your DFT changes):

$ asan/zlib_uncompress_fuzzer -use_value_profile=1 -collect_data_flow=dfsan/zlib_uncompress_fuzzer -print_final_stats=1 -max_total_time=3600 -timeout=25 corpus/df_new corpus/new/ corpus/cf/                                                                                      
INFO: Seed: 2102970602
AddressSanitizer:DEADLYSIGNAL
=================================================================                                       
==228104==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7f0ba6dbd646 bp 0x7ffff27e9370 sp 0x7ffff27e8b28 T0)
==228104==The signal is caused by a READ memory access.                                                 
==228104==Hint: address points to the zero page.
    #0 0x7f0ba6dbd645 in strlen (/lib/x86_64-linux-gnu/libc.so.6+0x80645)                               
    #1 0x4d2758 in __interceptor_strlen /src/llvm/projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc
    #2 0x46364b in length /work/llvm-stage2/projects/compiler-rt/lib/fuzzer/libcxx_fuzzer_x86_64/include/c++/v1/__string:217:53
    #3 0x46364b in basic_string /work/llvm-stage2/projects/compiler-rt/lib/fuzzer/libcxx_fuzzer_x86_64/include/c++/v1/string:821
    #4 0x46364b in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /src/llvm/projects/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:718                                              
    #5 0x48bad2 in main /src/llvm/projects/compiler-rt/lib/fuzzer/FuzzerMain.cpp:19:10                  
    #6 0x7f0ba6d5d2b0 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202b0)                    
    #7 0x41dad8 in _start (/usr/local/google/home/mmoroz/projects/dataflow/zlib/asan/zlib_uncompress_fuzzer+0x41dad8)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (/lib/x86_64-linux-gnu/libc.so.6+0x80645) in strlen                     
==228104==ABORTING

Need to think of a convenient way of using a ToT. @jonathanmetzman do you have any suggestions?

I obviously can bump and force LLVM revision locally, and avoid pulling the images from OSS-Fuzz, but that seems a bit too heavyweight.

jonathanmetzman commented 5 years ago

Maybe change $LIB_FUZZING_ENGINE to /path/to/libFuzzingEngine.a in dataflow sanitizer builds?

Dor1s commented 5 years ago

In dataflow builds LIB_FUZZING_ENGINE is pointing to DataFlow.o -- it doesn't use libFuzzer. I need to hack --engine libfuzzer build. Others may need to do it as well from time to time (e.g. you or an intern testing something new, Matt fixing something upstream, etc), so I'm thinking maybe we should add some extra flag or libfuzzer-tot engine option.

Dor1s commented 5 years ago

For now, bumped LLVM to r361579 locally. The crash reproduced anyway, probably because I didn't use -fork= mode:

  if (Flags.collect_data_flow && !Flags.fork && !Flags.merge) {
    if (RunIndividualFiles)
      return CollectDataFlow(Flags.collect_data_flow, Flags.data_flow_trace,
                        ReadCorpora({}, *Inputs));
    else
      return CollectDataFlow(Flags.collect_data_flow, Flags.data_flow_trace,  // :720, crash here
                        ReadCorpora(*Inputs, {}));
  }

The stack trace isn't super helpful. CC @kcc, maybe you can quickly see what's wrong:

    #0 0x7fc523c6a645 in strlen (/lib/x86_64-linux-gnu/libc.so.6+0x80645)
    #1 0x4d3548 in __interceptor_strlen /src/llvm/projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc
    #2 0x463f6d in length /work/llvm-stage2/projects/compiler-rt/lib/fuzzer/libcxx_fuzzer_x86_64/include/c++/v1/__string:217:53
    #3 0x463f6d in basic_string /work/llvm-stage2/projects/compiler-rt/lib/fuzzer/libcxx_fuzzer_x86_64/include/c++/v1/string:821
    #4 0x463f6d in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /src/llvm/projects/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:720
    #5 0x48c8c2 in main /src/llvm/projects/compiler-rt/lib/fuzzer/FuzzerMain.cpp:19:10
    #6 0x7fc523c0a2b0 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202b0)
    #7 0x41db28 in _start (/usr/local/google/home/mmoroz/projects/dataflow/zlib/asan/zlib_uncompress_fuzzer+0x41db28)

With fork mode enabled it seems to be running, so I'll report back on the progress.

Dor1s commented 5 years ago

zlib_uncompress_fuzzer, -fork=1, 1 hour:

$ asan/zlib_uncompress_fuzzer -use_value_profile=1 -collect_data_flow=dfsan/zlib_uncompress_fuzzer -print_final_stats=1 -max_total_time=3600 -fork=1 -timeout=25 corpus/df_new corpus/new/ corpus/cf/
INFO: Seed: 1493609206
INFO: Loaded 1 modules   (664 inline 8-bit counters): 664 [0x7e5af0, 0x7e5d88),                         
INFO: Loaded 1 PC tables (664 PCs): 664 [0x5a0878,0x5a31f8),                                            
INFO: -fork=1: fuzzing in separate process(s)
INFO: -fork=1: 1428 seed inputs, starting to fuzz in /tmp/libFuzzerTemp.59644.dir                       
INFO: fuzzed for 3658 seconds, wrapping up soon
INFO: exiting: 0 time: 3658s

The output directory is empty, and I don't have any logs to look at. I've been checking temp logs occasionally to make sure things were running.

I know that my setup works though, because when I tried running over a corpus subset, I've got new units written. The output in that case looks similar to the output of a regular -fork mode run, so I think we shouldn't miss any stats.

Dor1s commented 5 years ago

@kcc, another question for you: how do I see how much time is spent on collecting DFT? I'm just worried that if I enable it in current CF configuration, we'll be fuzzing up to ~44 minutes each run, and I don't want to spend too much time on collecting the traces.

Dor1s commented 5 years ago

And one more question for @kcc (let me summarize all of them in this comment):

1) (not a blocker) see null deref in https://github.com/google/oss-fuzz/issues/1632#issuecomment-496986494

2) How much time is spent on DFT collection? Should we log/track it?

3) If two binaries for the same fuzz target are built using different source versions, would nothing break? E.g. DFT would report functions that are not present in the fuzzing binary.

Dor1s commented 5 years ago

I've been running zlib_uncompress_fuzzer for ~2 hours with -fork=32, and it discovered one new feature:

$ asan/zlib_uncompress_fuzzer -use_value_profile=1 -collect_data_flow=dfsan/zlib_uncompress_fuzzer -print_final_stats=1 -max_total_time=7000 -fork=32 -timeout=25 corpus/new_fork_30/ corpus/new/ corpus/cf/
INFO: Seed: 1037078266                                                                                  
INFO: Loaded 1 modules   (664 inline 8-bit counters): 664 [0x7e5af0, 0x7e5d88),                         
INFO: Loaded 1 PC tables (664 PCs): 664 [0x5a0878,0x5a31f8),                                             
INFO: -fork=32: fuzzing in separate process(s)                                                          
INFO: -fork=32: 1428 seed inputs, starting to fuzz in /tmp/libFuzzerTemp.124663.dir  
#904235017: cov: 302 ft: 3538 corp: 1429 exec/s 3776 oom/timeout/crash: 0/0/0 time: 6684s                
INFO: fuzzed for 7003 seconds, wrapping up soon                                                         
INFO: exiting: 0 time: 7295s

Despite being a simple sanity check, this might even be impressive, given that we don't discover new features in this target too often: we have days with 0 new features discovered, as it's heavily saturated.

kcc commented 5 years ago

Just a heads up: I've used libfdk-aac as a guinea pig for DFT fuzzing and the scalability is very poor, mostly because DFSan runs out of tags too often and DFT collection is very slow. So, do not expect that the current implementation will scale well on every project (it may scale on some, e.g. on sqlite3 it seems to work fine).

I have some thoughts about improving dfsan for this use case, but it will take time.

Dor1s commented 5 years ago

I've checked the sizes of the DFSan binaries. It looks like they can be 2-3 times smaller than ASan binaries, but for some targets they come close to the ASan binary size. So, I'd just assume that DFSan binaries are not bigger than ASan ones.

I've noticed that the seed corpus takes up quite a bit of space. Removing the seed corpus from the DFSan build gives some good savings. Maybe we should do it for other non-ASan builds too.

Project  | ASan unpacked | DFSan unpacked | ASan packed | DFSan packed
---------|---------------|----------------|-------------|-------------
zstd     | 203 MB        | 36 MB          | 163 MB      | 36 MB
openjpeg | 34 MB         | 2.4 MB         | 28 MB       | 1 MB
mupdf    | 107 MB        | 61 MB          | 80 MB       | 36 MB

Dor1s commented 5 years ago

In other relevant news, clang was rolled yesterday, so I'm good to kick off some builds now.

kcc commented 5 years ago

I've fixed several scalability and functionality issues. I also had to split Dataflow.cpp into two files, so the build rules changed slightly. The top-level comment has been updated to reflect the changes.

I am playing with two benchmarks: sqlite3 and libfdk-aac. In both, the results give grounds for optimism but are still inconclusive.

Next step is to run a larger-scale A/B test. Not sure when I can get to it.

kcc commented 5 years ago

DFT collection remains pretty slow. It now requires InputSize/16 executions for every input (in the same process). It may be possible to speed up dfsan further, perhaps by 2x, but that won't change the situation: most likely the DFT will need to be preserved between the CF workers to avoid expensive re-computation.
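As a rough illustration of that cost (a Python sketch, not code from libFuzzer or DFSan; the function name and the rounding up of the last partial window are assumptions), the number of executions grows linearly with the input size:

```python
import math

# Assumption: each DFSan execution traces a window of at most 16 input bytes,
# so a partially filled last window still costs one full execution.
BYTES_PER_EXECUTION = 16


def dft_executions(input_size: int) -> int:
    """Estimate how many executions are needed to trace one input."""
    return math.ceil(input_size / BYTES_PER_EXECUTION)


for size in (16, 1024, 32 * 1024):
    print(size, dft_executions(size))
```

Under this assumption a 32 KB input already needs 2048 executions, which is why recomputing the traces on every worker gets expensive.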

Dor1s commented 5 years ago

Thanks for the heads up! The ClusterFuzz side changes are in review, I need to re-write some things and then will be good to go.

Regarding preserving and re-using DFT, I'm now thinking that we might try to generate those during the build, rather than implementing yet another job like corpus pruning, we'll see.

Dor1s commented 4 years ago

So far we have observed the following issues:
1) DFSan tag explosion
2) Lots of time spent on collecting the traces before fuzzing
3) Long and slow inputs are likely harmful

Solutions (not final, just the next iteration):
1) @kcc implemented new tag union logic in DFSan that is guaranteed not to explode, but limits us to tracing no more than 16 bytes at a time
2) Collect traces at build time and put them into the build archive
3) Set up thresholds for input size and time-to-process, and skip traces for slow/long inputs

Longer-term solutions (likely to be refined in the future):
1) Think more; there is room for improvement, but it's not worth doing unless we can scale and successfully use the current implementation
2) Collect traces in a separate task, similar to the corpus pruning task we have on CF
3) Thresholds might be dynamic, or we may not want to have long/slow inputs in our corpus at all

I'll write up some more detail in the design doc. This is just an update / plan for the next steps.

Dor1s commented 4 years ago

Attached is the log of 19 consecutive 1-hour runs of the libexif fuzzer with the traces generated using the corpus backup from Dec 20th:

$ cat all_logs.log | egrep -o "seed corpus: files: [[:digit:]]+|[[:digit:]]+ traces with focus function|Focus function is set to '.*'|(INITED|DONE)\s+cov: [[:digit:]]+ ft: [[:digit:]]+|INFO: [[:digit:]]+/[[:digit:]]+ inputs .*"
117 traces with focus function
Focus function is set to 'match_repeated_char'
seed corpus: files: 5185
INITED cov: 884 ft: 11397
INFO: 114/4781 inputs touch the focus function
INFO: 114/4781 inputs have the Data Flow Trace
DONE   cov: 884 ft: 11433
364 traces with focus function
Focus function is set to 'mnote_pentax_tag_get_name'
seed corpus: files: 5219
INITED cov: 884 ft: 11433
INFO: 347/4810 inputs touch the focus function
INFO: 346/4810 inputs have the Data Flow Trace
DONE   cov: 884 ft: 11447
149 traces with focus function
Focus function is set to 'exif_entry_realloc'
seed corpus: files: 5257
INITED cov: 884 ft: 11447
INFO: 135/4825 inputs touch the focus function
INFO: 130/4825 inputs have the Data Flow Trace
DONE   cov: 884 ft: 11449
253 traces with focus function
Focus function is set to 'mnote_canon_tag_get_name'
seed corpus: files: 5360
INITED cov: 884 ft: 11449
INFO: 242/4828 inputs touch the focus function
INFO: 242/4828 inputs have the Data Flow Trace
DONE   cov: 884 ft: 11454
149 traces with focus function
Focus function is set to 'exif_entry_realloc'
seed corpus: files: 5387
INITED cov: 884 ft: 11454
INFO: 132/4828 inputs touch the focus function
INFO: 126/4828 inputs have the Data Flow Trace
DONE   cov: 884 ft: 11457
253 traces with focus function
Focus function is set to 'mnote_canon_tag_get_name'
seed corpus: files: 5433
INITED cov: 884 ft: 11457
INFO: 246/4831 inputs touch the focus function
INFO: 246/4831 inputs have the Data Flow Trace
DONE   cov: 884 ft: 11459
149 traces with focus function
Focus function is set to 'exif_entry_realloc'
seed corpus: files: 5464
INITED cov: 884 ft: 11459
INFO: 134/4839 inputs touch the focus function
INFO: 129/4839 inputs have the Data Flow Trace
DONE   cov: 884 ft: 11461
117 traces with focus function
Focus function is set to 'match_repeated_char'
seed corpus: files: 5489
INITED cov: 884 ft: 11461
INFO: 113/4833 inputs touch the focus function
INFO: 113/4833 inputs have the Data Flow Trace
DONE   cov: 884 ft: 11462
303 traces with focus function
Focus function is set to 'exif_mnote_data_fuji_load'
seed corpus: files: 5525
INITED cov: 884 ft: 11462
INFO: 301/4849 inputs touch the focus function
INFO: 300/4849 inputs have the Data Flow Trace
DONE   cov: 884 ft: 11464
149 traces with focus function
Focus function is set to 'exif_entry_realloc'
seed corpus: files: 5593
INITED cov: 884 ft: 11464
INFO: 130/4845 inputs touch the focus function
INFO: 125/4845 inputs have the Data Flow Trace
DONE   cov: 884 ft: 11468
640 traces with focus function
Focus function is set to 'exif_format_get_name'
seed corpus: files: 5616
INITED cov: 884 ft: 11468
INFO: 703/4852 inputs touch the focus function
INFO: 595/4852 inputs have the Data Flow Trace
DONE   cov: 884 ft: 11471
205 traces with focus function
Focus function is set to 'mnote_fuji_tag_get_name'
seed corpus: files: 5643
INITED cov: 884 ft: 11471
INFO: 204/4853 inputs touch the focus function
INFO: 204/4853 inputs have the Data Flow Trace
DONE   cov: 884 ft: 11471
149 traces with focus function
Focus function is set to 'exif_entry_realloc'
seed corpus: files: 5667
INITED cov: 884 ft: 11471
INFO: 139/4858 inputs touch the focus function
INFO: 131/4858 inputs have the Data Flow Trace
DONE   cov: 884 ft: 11472
3 traces with focus function
Focus function is set to 'exif_set_slong'
seed corpus: files: 5682
INITED cov: 884 ft: 11472
INFO: 2333/4852 inputs touch the focus function
INFO: 3/4852 inputs have the Data Flow Trace
DONE   cov: 884 ft: 11472
197 traces with focus function
Focus function is set to 'mnote_olympus_tag_get_name'
seed corpus: files: 5698
INITED cov: 884 ft: 11472
INFO: 192/4862 inputs touch the focus function
INFO: 191/4862 inputs have the Data Flow Trace
DONE   cov: 884 ft: 11472
149 traces with focus function
Focus function is set to 'exif_entry_realloc'
seed corpus: files: 5705
INITED cov: 884 ft: 11472
INFO: 138/4859 inputs touch the focus function
INFO: 130/4859 inputs have the Data Flow Trace
DONE   cov: 884 ft: 11473
711 traces with focus function
Focus function is set to 'exif_mnote_data_pentax_identify'
seed corpus: files: 5710
INITED cov: 884 ft: 11473
INFO: 601/4846 inputs touch the focus function
INFO: 591/4846 inputs have the Data Flow Trace
DONE   cov: 884 ft: 11475
378 traces with focus function
Focus function is set to 'exif_mnote_data_olympus_load'
seed corpus: files: 5733
INITED cov: 884 ft: 11475
INFO: 375/4850 inputs touch the focus function
INFO: 370/4850 inputs have the Data Flow Trace
DONE   cov: 884 ft: 11475
410 traces with focus function
Focus function is set to 'exif_mnote_data_pentax_load'
seed corpus: files: 5748
INITED cov: 884 ft: 11475
INFO: 375/4860 inputs touch the focus function
INFO: 364/4860 inputs have the Data Flow Trace
DONE   cov: 884 ft: 11475

all_logs.log

Dor1s commented 4 years ago

What's also interesting here is that in 19 runs auto-focus used 12 different functions:

exif_entry_realloc - 6 times, coverage
match_repeated_char - 2 times, coverage
mnote_canon_tag_get_name - 2 times, coverage
exif_format_get_name - 1 time, coverage
exif_mnote_data_fuji_load - 1 time, coverage
exif_mnote_data_olympus_load - 1 time, coverage
exif_mnote_data_pentax_identify - 1 time, coverage
exif_mnote_data_pentax_load - 1 time, coverage
exif_set_slong - 1 time, coverage
mnote_fuji_tag_get_name - 1 time, coverage
mnote_olympus_tag_get_name - 1 time, coverage
mnote_pentax_tag_get_name - 1 time, coverage
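A tally like the one above can be reproduced from the log with a few lines of Python. This is only a sketch: the helper name is made up, and the sample lines are copied from the first three runs in the log excerpt rather than read from all_logs.log.

```python
import re
from collections import Counter

# Matches the libFuzzer line that reports the chosen focus function.
FOCUS_RE = re.compile(r"Focus function is set to '(.*)'")


def count_focus_functions(lines):
    """Count how many runs selected each focus function."""
    counts = Counter()
    for line in lines:
        match = FOCUS_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


sample = [
    "117 traces with focus function",
    "Focus function is set to 'match_repeated_char'",
    "Focus function is set to 'mnote_pentax_tag_get_name'",
    "Focus function is set to 'exif_entry_realloc'",
]
for name, runs in count_focus_functions(sample).most_common():
    print(name, runs)
```

Passing the lines of the full log file instead of `sample` would give the per-function run counts.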

From looking at the coverage reports as a human, I would disagree with some of the functions chosen, but I need to take a closer look into the coverage libFuzzer observed and reported. I'll post another comment on this later today.

Dor1s commented 4 years ago

> I'll post another comment on this later today.

Sorry for being a liar here and not reporting back yesterday.

First of all, the current autofocus logic can choose any function present in the binary, even a fully covered one, just with a lower probability. Therefore, there is no value in making any judgement based on just 19 different choices.
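To illustrate the point, here is a minimal Python sketch of weighted random selection. It is not libFuzzer's actual weighting: the weight formula, the floor value and the function name are hypothetical, chosen only so that a fully covered function keeps a small non-zero chance of being picked.

```python
import random


def pick_focus_function(coverage, rng):
    """Pick one function; coverage maps name -> (covered_blocks, total_blocks)."""
    names = list(coverage)
    # Hypothetical weight: the share of uncovered blocks, plus a small floor
    # so that a fully covered function can still be chosen occasionally.
    weights = [
        (total - covered) / total + 0.05
        for covered, total in (coverage[name] for name in names)
    ]
    return rng.choices(names, weights=weights, k=1)[0]


coverage = {
    "exif_entry_realloc": (3, 10),  # made-up block counts
    "exif_set_slong": (4, 4),       # fully covered
}
rng = random.Random(0)
picks = [pick_focus_function(coverage, rng) for _ in range(1000)]
print({name: picks.count(name) for name in coverage})
```

With any scheme of this kind, a handful of runs says little about the underlying weights, which is the reason not to read much into 19 choices.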

Secondly, I've been confused for quite some time by BB-tot 1 BB-cov 1 in the output for every function. I didn't realize that the proposed instrumentation had been changed to bb coverage instead of func (and has been that way for quite a while: https://github.com/llvm/llvm-project/commit/219b2b3a4a7805060673459cb5652d6db510108a#diff-2ac0bbad489ac9394262c333faace2e1).

Uploading a quick fix now.

Dor1s commented 4 years ago

For those following along, https://github.com/google/oss-fuzz/pull/3238 has a version of the DFT collector with some dynamic controls (to skip long and slow inputs). I've experimented with 4 different configurations on libexif:

# 5019 / 5185, 3m22s, FILE_SIZE_LIMIT = 16 * 1024, MIN_TIMEOUT = 0.5, TIMEOUT_RANGE = 2.0
total: 5185
traced: 5017
long: 152
slow: 16
failed: 0
# 5042 / 5185, 9m45s, FILE_SIZE_LIMIT = 64 * 1024, MIN_TIMEOUT = 0.5, TIMEOUT_RANGE = 4.5
total: 5185
traced: 5021
long: 21
slow: 143
failed: 0
# 5036 / 5185, 5m47s, FILE_SIZE_LIMIT = 32 * 1024, MIN_TIMEOUT = 0.5, TIMEOUT_RANGE = 3.5
total: 5185
traced: 5036
long: 96
slow: 53
failed: 0
# 5046 / 5185, 5m56s,  FILE_SIZE_LIMIT = 32 * 1024, MIN_TIMEOUT = 1, TIMEOUT_RANGE = 3
total: 5185
traced: 5046
long: 96
slow: 43
failed: 0

and decided to proceed with the last one, as it appears to be the most efficient of them all. However, these parameters can be easily adjusted via env variables (i.e. no need to re-build base images) and therefore might be tweaked more as we keep experimenting.
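For readers who want a feel for how such controls could fit together, here is a minimal Python sketch. It is not the code from the pull request: the environment variable names are hypothetical, and the linear timeout formula is an assumption about how MIN_TIMEOUT and TIMEOUT_RANGE combine. The defaults mirror the last configuration above.

```python
import os

# Hypothetical environment variable names; defaults follow the last
# configuration listed above.
FILE_SIZE_LIMIT = int(os.getenv("DFT_FILE_SIZE_LIMIT", 32 * 1024))
MIN_TIMEOUT = float(os.getenv("DFT_MIN_TIMEOUT", 1.0))
TIMEOUT_RANGE = float(os.getenv("DFT_TIMEOUT_RANGE", 3.0))


def timeout_for(size):
    """Assumed time budget in seconds, growing linearly with input size."""
    return MIN_TIMEOUT + TIMEOUT_RANGE * size / FILE_SIZE_LIMIT


def classify(size, elapsed_seconds):
    """Label an input the way the counters above do: long, slow or traced."""
    if size > FILE_SIZE_LIMIT:
        return "long"
    if elapsed_seconds > timeout_for(size):
        return "slow"
    return "traced"


print(classify(size=1024, elapsed_seconds=0.2))
print(classify(size=FILE_SIZE_LIMIT + 1, elapsed_seconds=0.2))
```

Reading the limits from the environment is what allows them to be tuned without rebuilding the base images.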

Dor1s commented 4 years ago

I'm seeing the following issue in the logs:

b"INFO: DataFlowTrace: reading from '/mnt/scratch0/clusterfuzz/bot/builds/clusterfuzz-builds-dataflow_zstd_b763e94320cfaa693e68f682c997338da0e05170/dataflow/stream_decompress_dft'"
b'INFO: AUTOFOCUS: 1442 ZSTDv01_decompressDCtx'
b'INFO: DataFlowTrace: 10047 trace files, 1790 functions, 380 traces with focus function'
b'INFO: 0/6585 inputs touch the focus function'
b'INFO: 0/6585 inputs have the Data Flow Trace'

Somehow inputs in the corpus do not touch the focus function, and reportedly do not have corresponding data flow traces.
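A small Python sketch for spotting this condition in a log follows. The helper name is made up, and it assumes the two INFO lines keep the format shown above; the sample lines are copied from that log with the b'...' wrappers removed.

```python
import re

TOUCH_RE = re.compile(r"INFO: (\d+)/(\d+) inputs touch the focus function")
TRACE_RE = re.compile(r"INFO: (\d+)/(\d+) inputs have the Data Flow Trace")


def focus_stats(lines):
    """Return (touching, with_trace, total) parsed from libFuzzer log lines."""
    touching = with_trace = total = None
    for line in lines:
        match = TOUCH_RE.search(line)
        if match:
            touching, total = int(match.group(1)), int(match.group(2))
        match = TRACE_RE.search(line)
        if match:
            with_trace, total = int(match.group(1)), int(match.group(2))
    return touching, with_trace, total


log = [
    "INFO: DataFlowTrace: 10047 trace files, 1790 functions, 380 traces with focus function",
    "INFO: 0/6585 inputs touch the focus function",
    "INFO: 0/6585 inputs have the Data Flow Trace",
]
touching, with_trace, total = focus_stats(log)
if touching == 0:
    print(f"suspicious: none of {total} inputs touch the focus function")
```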

Dor1s commented 4 years ago

I definitely don't understand what's going on. I've downloaded the exact same ASan and DFSan builds, copied the command from a log, and I'm getting non-zero values both for the inputs touching the focus function and for the ones having the Data Flow Trace...

Dor1s commented 4 years ago

Well, I occasionally manage to get 0 inputs locally too (it depends on the particular focus function and the max_len limitation), but I can't find any non-zero case in ClusterFuzz, at least for the zstd project.

Dor1s commented 4 years ago

I feel like something might be wrong with the symbolization, perhaps because of minijail?

But why would this line successfully execute then: https://github.com/llvm/llvm-project/blob/1c8e05110c01254fc26ca3db90e9d8518957d815/compiler-rt/lib/fuzzer/FuzzerTracePC.cpp#L254

@kcc can you think of any external condition that would let the binary execute TracePC::SetFocusFunction, but would never let it satisfy TracePC::ObservedFocusFunction?

Dor1s commented 4 years ago

Things like https://pantheon.corp.google.com/storage/browser/_details/zstd-logs.clusterfuzz-external.appspot.com/libFuzzer_zstd_stream_decompress/libfuzzer_asan_zstd/2020-01-29/00:47:17:993855.log are also suspicious, as the focus function FSE_buildDTable_rle seems to have 4 entries in the list of the functions present.

The same is true for the list of functions printed by the dataflow binary:

$ cat functions.txt | egrep FSE_buildDTable_rle
FSE_buildDTable_rle
FSE_buildDTable_rle
FSE_buildDTable_rle
FSE_buildDTable_rle
FSE_buildDTable_rle

Dor1s commented 4 years ago

But the functions issue is unlikely to be the root cause, as there are cases with a unique function being used and still zero inputs reported, e.g. https://pantheon.corp.google.com/storage/browser/_details/zstd-logs.clusterfuzz-external.appspot.com/libFuzzer_zstd_stream_decompress/libfuzzer_asan_zstd/2020-01-29/01:09:54:796333.log

Dor1s commented 4 years ago

I've been testing the exact same build on a bot and it works perfectly. That leaves minijail as the only suspect. Everything seems to work inside minijail too, but I cannot fully reproduce the command ClusterFuzz uses, as I'm having issues with the -P (pivot root) argument.

@oliverchang is there anything special about that dir? I literally replicated the command and it works if I remove -P and all -b arguments, while with them I always get a silent exit with return code 139.

google / oss-fuzz

Proposal: DFT-based fuzzing #1632
