Open DonggeLiu opened 8 months ago
Incorrect target name and target path.
For wxwidgets
, runner.cpp
was not compiled, only zip.cpp
FI's daily report contains the info about fuzz target, but 1) the fuzz target path is not consistent, and 2) it does not store the fuz target binary name yet.
Inconsistency: Some projects' reports contain the full path (which is what we need): https://storage.googleapis.com/oss-fuzz-introspector/abseil-cpp/inspector-report/20240305/summary.json But others seem to use binary name: https://storage.googleapis.com/oss-fuzz-introspector/wxwidgets/inspector-report/20240305/summary.json
Ideally, we need two info from the FI report:
Ideally, we need two info from the FI report:
- Fuzz target path.
Just to clarify, this refers to the source code path?
Inconsistency: Some projects' reports contain the full path (which is what we need): https://storage.googleapis.com/oss-fuzz-introspector/abseil-cpp/inspector-report/20240305/summary.json But others seem to use binary name: https://storage.googleapis.com/oss-fuzz-introspector/wxwidgets/inspector-report/20240305/summary.json
Is the issue with wxwidgets
or abseil-cpp
, or both? wxwidgets has both the fuzzer binary name and the source code path: https://storage.googleapis.com/oss-fuzz-introspector/wxwidgets/inspector-report/20240318/fuzz_report.html -- abseil-cpp only has the source code path. To my understanding wxwidgets
is good but abseil-cpp
has issues, is this correctly understood?
Hi @DavidKorczynski!
- Fuzz target path. Just to clarify, this refers to the source code path?
Yes! The framework needs the source code path before compilation to locate the file and then replace its content with the LLM-generated fuzz target. I recall that you explained that some projects move the files around while building the fuzz target. Hence, the path before compilation is required.
Is the issue with
wxwidgets
orabseil-cpp
, or both? wxwidgets has both the fuzzer binary name and the source code path: https://storage.googleapis.com/oss-fuzz-introspector/wxwidgets/inspector-report/20240318/fuzz_report.html -- abseil-cpp only has the source code path. To my understandingwxwidgets
is good butabseil-cpp
has issues, is this correctly understood?
Yes! You are correct.
wxwidgets
is good because it provides both fuzz target binary name and fuzz target source code path, which is needed by its benchmark YAML. In this particular case, our benchmark YAML is wrong because runner.cpp
was never compiled. FI correctly provides zip
and /src/wxwidgets/tests/fuzz/zip.cpp
, precisely what we need.
Since wxwidgets
's summary.json
does not have this info, what's the best way to retrieve this information? (E.g., other files in bucket, APIs, or libs? I presume parsing the fuzz_report.html
is not a scalable idea.)
Can you confirm abseil-cpp
is looking good now? This is what I get when I generate a benchmark:
"functions":
- "name": "_ZN4absl19str_format_internal13FormatArgImpl8DispatchIxEEbNS1_4DataENS0_24FormatConversionSpecImplEPv"
"params":
- "name": "arg"
"type": "char *"
- "name": "spec"
"type": "size_t"
- "name": "out"
"type": "int"
- "name": ""
"type": "char *"
"return_type": "bool"
"signature": "bool absl::str_format_internal::FormatArgImpl::Dispatch<absl::str_format_internal::VoidPtr>(DW_TAG_union_typeData, FormatConversionSpecImpl, void *)"
- "name": "_ZN4absl19str_format_internal13FormatArgImpl8DispatchINS_6int128EEEbNS1_4DataENS0_24FormatConversionSpecImplEPv"
"params":
- "name": "arg"
"type": "char *"
- "name": "spec"
"type": "size_t"
- "name": "out"
"type": "int"
- "name": ""
"type": "char *"
"return_type": "bool"
"signature": "bool absl::str_format_internal::FormatArgImpl::Dispatch<absl::Dispatch<absl::str_format_internal::VoidPtr>(DW_TAG_union_typeData, FormatConversionSpecImpl, void *)"
- "name": "_ZN4absl19str_format_internal17FormatConvertImplEwNS0_24FormatConversionSpecImplEPNS0_14FormatSinkImplE"
"params":
- "name": "v"
"type": "int"
- "name": "conv"
"type": "size_t"
- "name": "sink"
"type": "int"
- "name": ""
"type": "absl::str_format_internal::FormatSinkImpl *"
"return_type": "char"
"signature": "CharConvertResult absl::str_format_internal::FormatConvertImpl(wchar_t, const FormatConversionSpecImpl, FormatSinkImpl *)"
"language": "c++"
"project": "abseil-cpp"
"target_name": "string_escape_fuzzer"
"target_path": "/src/string_escape_fuzzer.cc"
Thanks @DavidKorczynski! May I ask if this is generated via python -m data_prep.introspector abseil-cpp -m 5
or a new lib/API of FI?
I can confirm that data_prep.introspector
generates the same result on my side, but I thought previously the binary target_name
and source code target_path
were the same, too? E.g., https://github.com/google/oss-fuzz-gen/pull/162/files
Previously, the target_name
and target_path
were generated via building local docker images, which takes a long time and is error-prone (e.g., using runner.cpp
from wxwidgets
, which was never built).
Given FI has the correct target_name
and target_path
in some project reports, it would be great if a user could use them directly. What would be the best way to access that info?
Thanks @DavidKorczynski! May I ask if this is generated via python -m data_prep.introspector abseil-cpp -m 5 or a new lib/API of FI?
Yes, it was
Previously, the
target_name
andtarget_path
were generated via building local docker images, which takes a long time and is error-prone (e.g., usingrunner.cpp
fromwxwidgets
, which was never built). Given FI has the correcttarget_name
andtarget_path
in some project reports, it would be great if a user could use them directly. What would be the best way to access that info?
Ah, I misunderstood @DonggeLiu -- I thought that the goal was to fix project_src.search_source
if it was broken but the plan is to substitute it in full to avoid having that step?
In https://github.com/google/oss-fuzz/pull/11718 I pushed out a fix that allows FI to see the binary for bazel projects. For example, abseil-cpp used to be:
And now it is:
The only missing piece there for FI to provide all the info is giving the source at the start of the Dockerfile creation, and not the bazel sandbox path
Notice there are some projects with limitations. For example, picotls
creates analysis for three harnesses. However, picotls in fact only has two harnesses in OSS-Fuzz, but, they build three harnesses and only copy two of them out -- i.e. one of the build fuzzers does not actually run on OSS-Fuzz so there is no binary to map to. They build 3 fuzzers but only copy 2 to oss-fuzz's project: https://github.com/google/oss-fuzz/blob/153df7202e2838a3239f2a631686ad49eff62347/projects/picotls/build.sh#L21-L22
Consequently, there will be one erroneous match here.
Ah, I misunderstood @DonggeLiu -- I thought that the goal was to fix
project_src.search_source
if it was broken but the plan is to substitute it in full to avoid having that step?
Yep, sorry for not being clearer earlier!
That step is redundant and time-consuming, because FI already has the correct binary target_name
and fuzz target source code target_path
for most projects, and they are almost always constant over time. Replacing that step can avoid building docker images and retrieving them from containers whenever we update benchmarks.
They build 3 fuzzers but only copy 2 to oss-fuzz's project:
That's an interesting case, thanks for noticing it.
Since we only need one pair of binary target_name
and source code target_path
, is there a general way to automatically select the one actually used in fuzzing? (e.g., by checking the past coverage or other cleverer ways)
https://llm-exp.oss-fuzz.com/Result-reports/scheduled/2024-02-29-weekly-all/sample/output-wxwidgets-_z15wxgetfreememoryv/01