google / oss-fuzz-gen

LLM powered fuzzing via OSS-Fuzz.
Apache License 2.0
930 stars 108 forks source link

No run result from certain benchmarks #132

Open DonggeLiu opened 8 months ago

DonggeLiu commented 8 months ago

https://llm-exp.oss-fuzz.com/Result-reports/scheduled/2024-02-29-weekly-all/sample/output-wxwidgets-_z15wxgetfreememoryv/01

cjx10 commented 8 months ago

Incorrect target name and target path. For wxwidgets, runner.cpp was not compiled, only zip.cpp

DonggeLiu commented 8 months ago

FI's daily report contains the info about fuzz target, but 1) the fuzz target path is not consistent, and 2) it does not store the fuz target binary name yet.

Inconsistency: Some projects' reports contain the full path (which is what we need): https://storage.googleapis.com/oss-fuzz-introspector/abseil-cpp/inspector-report/20240305/summary.json But others seem to use binary name: https://storage.googleapis.com/oss-fuzz-introspector/wxwidgets/inspector-report/20240305/summary.json

Ideally, we need two info from the FI report:

  1. Fuzz target path. In particular, some projects move the fuzz target during compilation. We need the fuzz target path before compilation, but FI usually provides the one after compilation.
  2. Binary name, which can be different from fuzz target basename.
DavidKorczynski commented 8 months ago

Ideally, we need two info from the FI report:

  1. Fuzz target path.

Just to clarify, this refers to the source code path?

Inconsistency: Some projects' reports contain the full path (which is what we need): https://storage.googleapis.com/oss-fuzz-introspector/abseil-cpp/inspector-report/20240305/summary.json But others seem to use binary name: https://storage.googleapis.com/oss-fuzz-introspector/wxwidgets/inspector-report/20240305/summary.json

Is the issue with wxwidgets or abseil-cpp, or both? wxwidgets has both the fuzzer binary name and the source code path: https://storage.googleapis.com/oss-fuzz-introspector/wxwidgets/inspector-report/20240318/fuzz_report.html -- abseil-cpp only has the source code path. To my understanding wxwidgets is good but abseil-cpp has issues, is this correctly understood?

DonggeLiu commented 8 months ago

Hi @DavidKorczynski!

  1. Fuzz target path. Just to clarify, this refers to the source code path?

Yes! The framework needs the source code path before compilation to locate the file and then replace its content with the LLM-generated fuzz target. I recall that you explained that some projects move the files around while building the fuzz target. Hence, the path before compilation is required.

Is the issue with wxwidgets or abseil-cpp, or both? wxwidgets has both the fuzzer binary name and the source code path: https://storage.googleapis.com/oss-fuzz-introspector/wxwidgets/inspector-report/20240318/fuzz_report.html -- abseil-cpp only has the source code path. To my understanding wxwidgets is good but abseil-cpp has issues, is this correctly understood?

Yes! You are correct. wxwidgets is good because it provides both fuzz target binary name and fuzz target source code path, which is needed by its benchmark YAML. In this particular case, our benchmark YAML is wrong because runner.cpp was never compiled. FI correctly provides zip and /src/wxwidgets/tests/fuzz/zip.cpp, precisely what we need.

Since wxwidgets's summary.json does not have this info, what's the best way to retrieve this information? (E.g., other files in bucket, APIs, or libs? I presume parsing the fuzz_report.html is not a scalable idea.)

DavidKorczynski commented 7 months ago

Can you confirm abseil-cpp is looking good now? This is what I get when I generate a benchmark:

"functions":
- "name": "_ZN4absl19str_format_internal13FormatArgImpl8DispatchIxEEbNS1_4DataENS0_24FormatConversionSpecImplEPv"
  "params":
  - "name": "arg"
    "type": "char *"
  - "name": "spec"
    "type": "size_t"
  - "name": "out"
    "type": "int"
  - "name": ""
    "type": "char *"
  "return_type": "bool"
  "signature": "bool absl::str_format_internal::FormatArgImpl::Dispatch<absl::str_format_internal::VoidPtr>(DW_TAG_union_typeData, FormatConversionSpecImpl, void *)"
- "name": "_ZN4absl19str_format_internal13FormatArgImpl8DispatchINS_6int128EEEbNS1_4DataENS0_24FormatConversionSpecImplEPv"
  "params":
  - "name": "arg"
    "type": "char *"
  - "name": "spec"
    "type": "size_t"
  - "name": "out"
    "type": "int"
  - "name": ""
    "type": "char *"
  "return_type": "bool"
  "signature": "bool absl::str_format_internal::FormatArgImpl::Dispatch<absl::Dispatch<absl::str_format_internal::VoidPtr>(DW_TAG_union_typeData, FormatConversionSpecImpl, void *)"
- "name": "_ZN4absl19str_format_internal17FormatConvertImplEwNS0_24FormatConversionSpecImplEPNS0_14FormatSinkImplE"
  "params":
  - "name": "v"
    "type": "int"
  - "name": "conv"
    "type": "size_t"
  - "name": "sink"
    "type": "int"
  - "name": ""
    "type": "absl::str_format_internal::FormatSinkImpl *"
  "return_type": "char"
  "signature": "CharConvertResult absl::str_format_internal::FormatConvertImpl(wchar_t, const FormatConversionSpecImpl, FormatSinkImpl *)"
"language": "c++"
"project": "abseil-cpp"
"target_name": "string_escape_fuzzer"
"target_path": "/src/string_escape_fuzzer.cc"
DonggeLiu commented 7 months ago

Thanks @DavidKorczynski! May I ask if this is generated via python -m data_prep.introspector abseil-cpp -m 5 or a new lib/API of FI?

I can confirm that data_prep.introspector generates the same result on my side, but I thought previously the binary target_name and source code target_path were the same, too? E.g., https://github.com/google/oss-fuzz-gen/pull/162/files

Previously, the target_name and target_path were generated via building local docker images, which takes a long time and is error-prone (e.g., using runner.cpp from wxwidgets, which was never built). Given FI has the correct target_name and target_path in some project reports, it would be great if a user could use them directly. What would be the best way to access that info?

DavidKorczynski commented 7 months ago

Thanks @DavidKorczynski! May I ask if this is generated via python -m data_prep.introspector abseil-cpp -m 5 or a new lib/API of FI?

Yes, it was

Previously, the target_name and target_path were generated via building local docker images, which takes a long time and is error-prone (e.g., using runner.cpp from wxwidgets, which was never built). Given FI has the correct target_name and target_path in some project reports, it would be great if a user could use them directly. What would be the best way to access that info?

Ah, I misunderstood @DonggeLiu -- I thought that the goal was to fix project_src.search_source if it was broken but the plan is to substitute it in full to avoid having that step?

DavidKorczynski commented 7 months ago

In https://github.com/google/oss-fuzz/pull/11718 I pushed out a fix that allows FI to see the binary for bazel projects. For example, abseil-cpp used to be:

Screenshot 2024-04-03 111941

And now it is:

Screenshot 2024-04-03 111843

The only missing piece there for FI to provide all the info is giving the source at the start of the Dockerfile creation, and not the bazel sandbox path

DavidKorczynski commented 7 months ago

Notice there are some projects with limitations. For example, picotls creates analysis for three harnesses. However, picotls in fact only has two harnesses in OSS-Fuzz, but, they build three harnesses and only copy two of them out -- i.e. one of the build fuzzers does not actually run on OSS-Fuzz so there is no binary to map to. They build 3 fuzzers but only copy 2 to oss-fuzz's project: https://github.com/google/oss-fuzz/blob/153df7202e2838a3239f2a631686ad49eff62347/projects/picotls/build.sh#L21-L22

Consequently, there will be one erroneous match here.

DonggeLiu commented 7 months ago

Ah, I misunderstood @DonggeLiu -- I thought that the goal was to fix project_src.search_source if it was broken but the plan is to substitute it in full to avoid having that step?

Yep, sorry for not being clearer earlier! That step is redundant and time-consuming, because FI already has the correct binary target_name and fuzz target source code target_path for most projects, and they are almost always constant over time. Replacing that step can avoid building docker images and retrieving them from containers whenever we update benchmarks.

They build 3 fuzzers but only copy 2 to oss-fuzz's project:

That's an interesting case, thanks for noticing it. Since we only need one pair of binary target_name and source code target_path, is there a general way to automatically select the one actually used in fuzzing? (e.g., by checking the past coverage or other cleverer ways)