yob / pdf-reader

The PDF::Reader library implements a PDF parser conforming as much as possible to the PDF specification from Adobe.
MIT License
1.81k stars 271 forks source link

Enable type checking of the CMap class #428

Closed yob closed 2 years ago

yob commented 2 years ago

This was originally skipped in #361 because I was getting an unreachable code error. I found it was possible to avoid that by providing a type annotation for the mode variable of the state machine instead of hard coding it to :none.

Sadly, I can't do the type annotation with T.let() because sorbet isn't a runtime dependency of pdf-reader, but I can hack it in via a sig in the RBI file if I make the initial state part of the method params.

I think this might be a sorbet bug: https://github.com/sorbet/sorbet/issues/4174#issuecomment-1014923965.

This was prompted by @bcoles highlighting some crashes via a fuzzer in https://github.com/yob/pdf-reader/pull/421#issuecomment-1012591423 - I was curious to see if fully typing the CMap class would avoid the fuzzer crashes in that class.

For this particular class at least, it did. I modified the fuzzer in #248 like this and ran it for ~15 mins. No exceptions were raised from the CMap class.

diff --git a/tools/fuzz.rb b/tools/fuzz.rb
index ab81840..e87fbc6 100755
--- a/tools/fuzz.rb
+++ b/tools/fuzz.rb
@@ -182,7 +182,7 @@ def parse(reader)
   contents = ''
   reader.pages.each do |page|
     contents << page.fonts.to_s
-    contents << page.text.force_encoding('utf-8')
+    contents << page.text #.force_encoding('utf-8')
     contents << page.raw_content.force_encoding('utf-8')
   end
   # puts contents if VERBOSE
@@ -203,12 +203,14 @@ def summary
 # and save fuzz test case and backtrace to OUTPUT_DIR
 #
 def report_crash(e)
-  puts " - #{e.message}"
-  puts e.backtrace.first
-  fname = "#{DateTime.now.strftime('%Y%m%d%H%M%S%N')}_crash_#{rand(1000)}"
-  FileUtils.mv @fuzz_outfile, "#{OUTPUT_DIR}/#{fname}.pdf"
-  File.open("#{OUTPUT_DIR}/#{fname}.pdf.trace", 'w') do |file|
-    file.write "#{e.message}\n#{e.backtrace.join "\n"}"
+  if e.backtrace.any? { |l| l.include?("cmap") }
+    puts " - #{e.message}"
+    puts e.backtrace.first
+    fname = "#{DateTime.now.strftime('%Y%m%d%H%M%S%N')}_crash_#{rand(1000)}"
+    FileUtils.mv @fuzz_outfile, "#{OUTPUT_DIR}/#{fname}.pdf"
+    File.open("#{OUTPUT_DIR}/#{fname}.pdf.trace", 'w') do |file|
+      file.write "#{e.message}\n#{e.backtrace.join "\n"}"
+    end
   end
 end