yob / pdf-reader

The PDF::Reader library implements a PDF parser conforming as much as possible to the PDF specification from Adobe.
MIT License

PDF::Reader::ObjectHash.get_page_objects: undefined method `[]' for nil:NilClass #458

Closed bcoles closed 2 years ago

bcoles commented 2 years ago

This is by far the most frequent error encountered during fuzzing.

crashes/20220417055748989526710_crash_497.pdf.trace-1-undefined method `[]' for nil:NilClass
crashes/20220417055748989526710_crash_497.pdf.trace:2:/var/lib/gems/2.7.0/gems/pdf-reader-2.9.2/lib/pdf/reader/object_hash.rb:576:in `get_page_objects'

20220417055748989526710_crash_497.pdf

I have not investigated the root cause, and I have no idea how realistic this issue is outside of fuzzing. However, fixing it would significantly reduce the fuzzer output. That would speed up fuzzing and save disk space, because the crash file and its trace output would no longer be written to disk.

The surrounding code raises MalformedPDFError but these code paths are frequently missed.

https://github.com/yob/pdf-reader/blob/21d0d21051b6c5ae160cb281e19e4bb23c3f9e56/lib/pdf/reader/object_hash.rb#L570-L588
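For illustration only, here is a minimal sketch of the kind of nil guard that turns this crash into a handled error. It is not the library's actual `get_page_objects` code: `collect_page_ids`, the `objects` hash layout, and the local `MalformedPDFError` stand-in class are all assumptions made so the example runs without the gem.

```ruby
# Hypothetical stand-in for the library's error class, so the sketch runs on its own.
class MalformedPDFError < RuntimeError; end

# Hypothetical page-tree walker, NOT the real get_page_objects implementation.
# `objects` maps an object id to a hash; a :Pages node lists child ids under :Kids.
def collect_page_ids(objects, id, seen = {})
  # A fuzzed file can contain a cyclic page tree; refuse to visit a node twice.
  raise MalformedPDFError, "page tree revisits object #{id.inspect}" if seen[id]
  seen[id] = true

  node = objects[id]
  # Without this guard, node[:Type] below fails with
  # "undefined method `[]' for nil:NilClass" when the reference dangles.
  raise MalformedPDFError, "missing page tree object #{id.inspect}" if node.nil?
  raise MalformedPDFError, "page tree object #{id.inspect} is not a dictionary" unless node.is_a?(Hash)

  case node[:Type]
  when :Page
    [id]
  when :Pages
    kids = node[:Kids]
    raise MalformedPDFError, "Pages object #{id.inspect} has no Kids array" unless kids.is_a?(Array)
    kids.flat_map { |kid| collect_page_ids(objects, kid, seen) }
  else
    raise MalformedPDFError, "unexpected Type in page tree: #{node[:Type].inspect}"
  end
end
```

With a guard like this, a fuzzing harness that already treats MalformedPDFError as an expected outcome would stop recording these inputs as crashes.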

yob commented 2 years ago

woah, spooky. This was just converted to a MalformedPDFError by #457

yob commented 2 years ago

If you can confirm #457 fixes the crash for you in fuzzing, I'll close this issue.

bcoles commented 2 years ago

> If you can confirm #457 fixes the crash for you in fuzzing, I'll close this issue.

Re-running the fuzzer now. This issue appears to have been resolved.

user@ubuntu:~/Desktop/pdf-reader$ grep get_page_objects crashes/*.trace | wc -l
0

There would usually have been hundreds or thousands of these crashes by now.

user@ubuntu:~/Desktop/pdf-reader$ grep get_page_objects crashes.1/*.trace | wc -l
4140
user@ubuntu:~/Desktop/pdf-reader$ grep get_page_objects crashes.2/*.trace | wc -l
19736
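The same count could also be taken from Ruby, for example inside a fuzzing harness. This is only a sketch: `count_matching_lines` is a hypothetical helper name, and it assumes the `*.trace` file layout shown in the commands above.

```ruby
# Count the lines across all *.trace files in `dir` that mention `needle`,
# mirroring `grep needle dir/*.trace | wc -l`.
def count_matching_lines(dir, needle)
  Dir.glob(File.join(dir, "*.trace")).sum do |path|
    # Read as binary so stray non-UTF-8 bytes in a trace cannot raise encoding errors.
    File.binread(path).each_line.count { |line| line.include?(needle) }
  end
end
```

For example, `count_matching_lines("crashes", "get_page_objects")` would correspond to the first command above.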
