google / oss-fuzz-gen

LLM powered fuzzing via OSS-Fuzz.
Apache License 2.0
801 stars 93 forks source link

Early results for vulnerability analysis and remediation for OSS-Fuzz bugs #458

Open inferno-chromium opened 1 month ago

inferno-chromium commented 1 month ago

Sharing some early results from trying out vulnerability analysis and remediation for OSS-Fuzz bugs (see script here, enclosed llm output for 5 vulns). This is using Gemini 1.5 Pro with the 1M context window. The high level idea is OSS-Fuzz vulnerabilities have a precise regression range on where the issue was introduced (and sometimes even exact changeset, but a range of few commits works fine as well) and a detailed sanitizer crash stacktrace with the source files (where the issue most probably exists). If we feed Gemini the crash sanitizer stacktrace, the regression changeset git diff and source code of the files for every sanitizer stack frame (at crash revision), we see:

This is early work, there can be a lot of improvements that can be done:

69527.libical.rawoutput.txt 69527.stacktrace.txt 65777.qpdf.rawoutput.txt 65777.stacktrace.txt 64337.pcre2.rawoutput.txt 64337.stacktrace.txt 69497.ArduinoJson.rawoutput.txt 69497.stacktrace.txt

inferno-chromium commented 1 month ago

Example: Bug: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=65777 Cmd: python -m experimental.manual.oss_fuzz_vuln_prompt \ --repo_url=https://github.com/qpdf/qpdf \ --regression_range=1d96af8b9f72f66d48b77c5232b3e4d38d6683ca:452e1f5c20ec6adf62cd296cb9dddacbc06e6ffa \ --crash_revision=7315AA8C9B924C9351838EC5C1351A8C75E66BF1 \ --crash_stacktrace=65777.stacktrace.txt > prompt.txt ; python -m experimental.manual.prompter -p prompt.txt -l vertex_ai_gemini-1-5

LLM output below:

## 1. Vulnerability Identification

**Type:** Use-after-free

**Impact:** Crash leading to denial-of-service. Potentially, arbitrary code execution depending on the heap exploitation technique used.

**Code Patterns:** 

* The crash occurs in `QPDF::read_xref`, line 715, when accessing a `QPDFXRefEntry` that has already been freed. 
* This is caused by the call to `removeObject` within the loop iterating over the `m->xref_table`. 
* The `removeObject` function, introduced in the changeset, erases entries from the `m->xref_table` while the loop continues to iterate, potentially accessing a dangling pointer.

**Approximate line number matches:**

* The stack trace points to line 715 in `QPDF::read_xref`, while the changeset introduces the `removeObject` call around line 714. This is a close match considering the changeset is applied to a slightly earlier revision.

## 2. Technical Explanation

1. The `QPDF::read_xref` function iterates through the `m->xref_table` which contains object generation information.
2. Within this loop, the `removeObject` function is called if it encounters an object with the same object number but a lower generation than the previous one.
3. `removeObject` erases the entry with the lower generation from `m->xref_table`.
4. However, the loop in `read_xref` continues iterating, potentially accessing the erased entry, which is now a dangling pointer.
5. Accessing this dangling pointer leads to a heap-use-after-free error, causing the program to crash.

**An attacker could exploit this vulnerability by:**

1. Crafting a malicious PDF file containing an xref table with duplicate object numbers but varying generations.
2. This would trigger the use-after-free in `QPDF::read_xref`, causing a denial of service.
3. Depending on the system and the specific exploitation technique, an attacker might be able to control the freed memory and achieve arbitrary code execution.

## 3. Remediation Patch

The patch below resolves the vulnerability by deferring the removal of lower-generation objects:

```diff
--- a/libqpdf/QPDF.cc
+++ b/libqpdf/QPDF.cc
@@ -707,13 +707,16 @@ QPDF::read_xref(qpdf_offset_t xref_offset)
     // never depend on its being set.
     m->deleted_objects.clear();

-    // Make sure we keep only the highest generation for any object.
+    // Make sure we keep only the highest generation for any object. We can't do this while
+    // iterating because removing an object may invalidate iterators later in the sequence.
+    std::set<QPDFObjGen> to_remove;
     QPDFObjGen last_og{-1, 0};
     for (auto const& og: m->xref_table) {
         if (og.first.getObj() == last_og.getObj())
-            removeObject(last_og);
+            to_remove.insert(last_og);
         last_og = og.first;
     }
+    for (auto const& og: to_remove) {
+        removeObject(og);
+    }
 }

 bool

This patch:

1. Creates a temporary `to_remove` set to store object generations slated for removal.
2. After iterating over the `m->xref_table`, it safely removes the lower-generation objects from the table using the `to_remove` set.
3. This approach avoids modifying the `m->xref_table` while iterating over it, preventing the use-after-free vulnerability.

This patch effectively prevents the crash and potential exploitation, while preserving the intended functionality of removing duplicate objects with lower generations. 
Jarvx commented 1 month ago

Hi, would you please share the meta data of the 5 cases or give any references so that I can dig out?

inferno-chromium commented 1 month ago

Hi, would you please share the meta data of the 5 cases or give any references so that I can dig out?

https://bugs.chromium.org/p/oss-fuzz/issues/detail?id= where bug_number is 69527, 65777, 64337, 69497

Jarvx commented 1 month ago

Hi,

when I tried to reproduce the vulnerability of qpdf. I got this issue. Is this due to base image change?

#12 exporting to image
#12 exporting layers done
#12 writing image sha256:2bb0f793adf91fae9530eb4995cc38cfa466d130ef627038d29b35de2a7e8911 done
#12 naming to gcr.io/oss-fuzz/qpdf done
#12 DONE 0.0s

 2 warnings found (use --debug to expand):
 - MaintainerDeprecated: Maintainer instruction is deprecated in favor of using label (line 18)
 - WorkdirRelativePath: Relative workdir "qpdf" can have unexpected results if the base image changes (line 23)
ERROR:root:Failed to detect repo:
python3: can't open file '/opt/cifuzz/detect_repo.py': [Errno 2] No such file or directory

and in another project libical.

 1 warning found (use --debug to expand):
 - WorkdirRelativePath: Relative workdir "libical" can have unexpected results if the base image changes (line 22)
oliverchang commented 3 weeks ago

@Jarvx Sorry for the slow response. Yes, these look like they're using an older base image that doesn't have the required script. Is this https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=65777 ? If so, this is a little surprising because it's quite recent. Can you check if you're building this image with an up to date base image from oss-fuzz?

Jarvx commented 2 weeks ago

@oliverchang Thanks for the information I have managed to reproduce this one after setting it up again.