yob / pdf-reader

The PDF::Reader library implements a PDF parser conforming as much as possible to the PDF specification from Adobe.
MIT License

Add tests with a variety of PDFs that cause unexpected exceptions #421

Closed yob closed 2 years ago

yob commented 2 years ago

I've taken the sample PDFs from a bunch of GitHub issues created by @bcoles and created specs to ensure they raise the expected errors.

I then hacked and slashed my way to getting the tests green in the first commit.

The second commit adds type-checking deref methods to ObjectHash and uses them where we can.

The bulk of the issues are incorrect types coming out of the PDF, and pdf-reader not noticing. For example, we expect an Array but get back a Number and then start passing it around until we eventually call .each on it and get a NoMethodError.

If the PDF is corrupt and has an unusable object type, I think we want to raise a MalformedPDFError as close to the source of the problem as possible. This commit proposes doing that by adding type-specific deref methods to ObjectHash. When using these new methods we are asserting the type we expect back, and for valid PDFs there's no change in behaviour. However, PDFs that have the wrong type for a particular object (and the type can't be cast/coerced into the required type) will raise a useful error.
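As a rough illustration of the idea (not the code in this PR), a type-asserting deref could look something like the sketch below. `SketchObjectHash`, `Reference`, and the error class defined here are hypothetical stand-ins so the snippet runs on its own; the method names `deref_array` and `deref_integer` are assumptions chosen for the example.

```ruby
# Hypothetical stand-ins so the sketch is self-contained; these are not
# pdf-reader's actual classes.
class MalformedPDFError < RuntimeError; end

Reference = Struct.new(:id, :gen)

class SketchObjectHash
  def initialize(objects)
    @objects = objects # maps Reference => parsed object
  end

  # Follow a reference if given one, otherwise return the value unchanged.
  def deref(key)
    key.is_a?(Reference) ? @objects[key] : key
  end

  # Return an Array or nil; raise if the object has any other type.
  def deref_array(key)
    obj = deref(key)
    return obj if obj.nil? || obj.is_a?(Array)

    raise MalformedPDFError, "expected Array, got #{obj.class}"
  end

  # Return an Integer or nil; whole-number Floats are coerced.
  def deref_integer(key)
    obj = deref(key)
    return obj if obj.nil? || obj.is_a?(Integer)
    return obj.to_i if obj.is_a?(Float) && obj.finite? && obj == obj.to_i

    raise MalformedPDFError, "expected Integer, got #{obj.class}"
  end
end

objects = SketchObjectHash.new({
  Reference.new(1, 0) => [0, 0, 612, 792],
  Reference.new(2, 0) => 42
})

objects.deref_array(Reference.new(1, 0)) # valid PDF: behaves like a plain deref

begin
  objects.deref_array(Reference.new(2, 0)) # corrupt PDF: fails at the source
rescue MalformedPDFError => e
  puts e.message # prints "expected Array, got Integer"
end
```

The point of the design is that the failure happens where the value is read, rather than several calls later when something finally calls `.each` on a Number.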

There's still a lot more we could do for type safety. In particular, reading values out of Hashes/Dicts is still not type safe in many places. Still, this is a good start that will pick up many basic corruption issues.

Fixes #222 Fixes #223 Fixes #224 Fixes #227 Fixes #228 Fixes #229 Fixes #230 Fixes #231 Fixes #232 Fixes #234 Fixes #235 Fixes #236 Fixes #237 Fixes #238 Fixes #239 Fixes #240 Fixes #241 Fixes #242 Fixes #243 Fixes #244 Fixes #245

bcoles commented 2 years ago

I re-ran the fuzzer with spec/data/**.pdf on latest master after this merge. A bunch of new crashes.

Unique:

$ head -n 1 crashes/*.trace | fgrep -v "==>" | sort -u

:"0" can't be coerced into Float
:"47.384" can't be coerced into Float
Array can't be coerced into Integer
bad decrypt
bad value for range
comparison of PDF::Reader::Token with 3 failed
diff (0) must be a Array
execution expired
Invalid argument
invalid bfrange section
Invalid filter algorithm 10
Invalid filter algorithm 100
Invalid filter algorithm 101
Invalid filter algorithm 102
Invalid filter algorithm 104
Invalid filter algorithm 105
Invalid filter algorithm 106
Invalid filter algorithm 107
Invalid filter algorithm 108
Invalid filter algorithm 109
Invalid filter algorithm 11
Invalid filter algorithm 112
Invalid filter algorithm 113
Invalid filter algorithm 116
Invalid filter algorithm 118
Invalid filter algorithm 119
Invalid filter algorithm 12
Invalid filter algorithm 122
Invalid filter algorithm 125
Invalid filter algorithm 128
Invalid filter algorithm 129
Invalid filter algorithm 13
Invalid filter algorithm 130
Invalid filter algorithm 137
Invalid filter algorithm 14
Invalid filter algorithm 148
Invalid filter algorithm 15
Invalid filter algorithm 150
Invalid filter algorithm 151
Invalid filter algorithm 152
Invalid filter algorithm 154
Invalid filter algorithm 155
Invalid filter algorithm 157
Invalid filter algorithm 158
Invalid filter algorithm 16
Invalid filter algorithm 162
Invalid filter algorithm 164
Invalid filter algorithm 167
Invalid filter algorithm 168
Invalid filter algorithm 17
Invalid filter algorithm 173
Invalid filter algorithm 175
Invalid filter algorithm 178
Invalid filter algorithm 18
Invalid filter algorithm 180
Invalid filter algorithm 181
Invalid filter algorithm 185
Invalid filter algorithm 188
Invalid filter algorithm 189
Invalid filter algorithm 19
Invalid filter algorithm 190
Invalid filter algorithm 194
Invalid filter algorithm 196
Invalid filter algorithm 197
Invalid filter algorithm 20
Invalid filter algorithm 202
Invalid filter algorithm 207
Invalid filter algorithm 208
Invalid filter algorithm 211
Invalid filter algorithm 219
Invalid filter algorithm 22
Invalid filter algorithm 222
Invalid filter algorithm 226
Invalid filter algorithm 228
Invalid filter algorithm 23
Invalid filter algorithm 230
Invalid filter algorithm 231
Invalid filter algorithm 232
Invalid filter algorithm 233
Invalid filter algorithm 234
Invalid filter algorithm 237
Invalid filter algorithm 238
Invalid filter algorithm 239
Invalid filter algorithm 24
Invalid filter algorithm 240
Invalid filter algorithm 242
Invalid filter algorithm 243
Invalid filter algorithm 245
Invalid filter algorithm 246
Invalid filter algorithm 249
Invalid filter algorithm 250
Invalid filter algorithm 251
Invalid filter algorithm 252
Invalid filter algorithm 253
Invalid filter algorithm 254
Invalid filter algorithm 255
Invalid filter algorithm 26
Invalid filter algorithm 27
Invalid filter algorithm 28
Invalid filter algorithm 29
Invalid filter algorithm 31
Invalid filter algorithm 34
Invalid filter algorithm 36
Invalid filter algorithm 37
Invalid filter algorithm 40
Invalid filter algorithm 41
Invalid filter algorithm 42
Invalid filter algorithm 44
Invalid filter algorithm 46
Invalid filter algorithm 47
Invalid filter algorithm 48
Invalid filter algorithm 49
Invalid filter algorithm 5
Invalid filter algorithm 50
Invalid filter algorithm 52
Invalid filter algorithm 53
Invalid filter algorithm 54
Invalid filter algorithm 55
Invalid filter algorithm 56
Invalid filter algorithm 57
Invalid filter algorithm 58
Invalid filter algorithm 59
Invalid filter algorithm 6
Invalid filter algorithm 62
Invalid filter algorithm 63
Invalid filter algorithm 66
Invalid filter algorithm 68
Invalid filter algorithm 69
Invalid filter algorithm 7
Invalid filter algorithm 70
Invalid filter algorithm 71
Invalid filter algorithm 72
Invalid filter algorithm 73
Invalid filter algorithm 74
Invalid filter algorithm 75
Invalid filter algorithm 76
Invalid filter algorithm 79
Invalid filter algorithm 8
Invalid filter algorithm 80
Invalid filter algorithm 81
Invalid filter algorithm 83
Invalid filter algorithm 87
Invalid filter algorithm 88
Invalid filter algorithm 89
Invalid filter algorithm 90
Invalid filter algorithm 91
Invalid filter algorithm 92
Invalid filter algorithm 93
Invalid filter algorithm 96
Invalid filter algorithm 97
Invalid filter algorithm 98
iv must be 16 bytes
NaN
need dictionary
nil can't be coerced into Float
nil can't be coerced into Integer
no implicit conversion from nil to integer
no implicit conversion of Float into String
no implicit conversion of Hash into Integer
no implicit conversion of Integer into String
no implicit conversion of nil into String
no implicit conversion of PDF::Reader::Token into Integer
no implicit conversion of String into Integer
no implicit conversion of Symbol into Integer
:P can't be coerced into Integer
PDF::Reader::Token can't be coerced into Integer
stack level too deep
String can't be coerced into Integer
undefined method `a' for nil:NilClass
undefined method `bytesize' for :"0009":Symbol
undefined method `bytesize' for :"00e":Symbol
undefined method `bytesize' for 0:Integer
undefined method `bytesize' for 131:Integer
undefined method `bytesize' for 192:Integer
undefined method `bytesize' for 1:Integer
undefined method `bytesize' for 20029:Integer
undefined method `bytesize' for 2020:Integer
undefined method `bytesize' for 2122:Integer
undefined method `bytesize' for 21:Integer
undefined method `bytesize' for 2260:Integer
undefined method `bytesize' for 2:Integer
undefined method `bytesize' for 38:Integer
undefined method `bytesize' for 3:Integer
undefined method `bytesize' for 4:Integer
undefined method `bytesize' for 51:Integer
undefined method `bytesize' for 55:Integer
undefined method `bytesize' for 598:Integer
undefined method `bytesize' for 5:Integer
undefined method `bytesize' for 60020:Integer
undefined method `bytesize' for 60062:Integer
undefined method `bytesize' for 66:Integer
undefined method `bytesize' for 68:Integer
undefined method `bytesize' for 69:Integer
undefined method `bytesize' for 6:Integer
undefined method `bytesize' for 700:Integer
undefined method `bytesize' for 72:Integer
undefined method `bytesize' for 78:Integer
undefined method `bytesize' for 7:Integer
undefined method `bytesize' for 87:Integer
undefined method `bytesize' for 89:Integer
undefined method `bytesize' for 8:Integer
undefined method `bytesize' for 97:Integer
undefined method `bytesize' for 98:Integer
undefined method `bytesize' for 9:Integer
undefined method `each' for " ":String
undefined method `fetch' for 0:Integer
undefined method `fetch' for 14:Integer
undefined method `fetch' for 6:Integer
undefined method `fetch' for 9:Integer
undefined method `fetch' for "":String
undefined method `fetch' for #<String:0x0000558d62ec7cd8>
undefined method `fetch' for #<String:0x0000558d6301f068>
undefined method `fetch' for "\x00":String
undefined method `fetch' for "\xA0":String
undefined method `fetch' for "\xD0":String
undefined method `fetch' for "\xF0":String
undefined method `first' for nil:NilClass
undefined method `/' for [" ", 0.0, "2"]:Array
undefined method `/' for :"0":Symbol
undefined method `/' for :"1426f":Symbol
undefined method `*' for :"2":Symbol
undefined method `/' for :"4677265":Symbol
undefined method `/' for :"52d504446206d6174":Symbol
undefined method `/' for :"54686973207465":Symbol
undefined method `/' for :"5":Symbol
undefined method `/' for :"'646961426f":Symbol
undefined method `/' for :"69686520636f6e74656e742073747265616d2066":Symbol
undefined method `/' for :"6974682064696666":Symbol
undefined method `/' for :"73":Symbol
undefined method `/' for :"841":Symbol
undefined method `/' for :"-8":Symbol
undefined method `/' for #<Array:0x0000558d62ebdd50>
undefined method `/' for #<Array:0x0000558d63014c80>
undefined method `/' for :"\a":Symbol
undefined method `/' for :F53:Symbol
undefined method `*' for :F8:Symbol
undefined method `/' for {:MCID=>109}:Hash
undefined method `+' for nil:NilClass
undefined method `-' for nil:NilClass
undefined method `<=' for nil:NilClass
undefined method `[]' for nil:NilClass
undefined method `[]=' for nil:NilClass
undefined method `/' for :R9:Symbol
undefined method `/' for :Span:Symbol
undefined method `*' for :"":Symbol
undefined method `/' for :"":Symbol
undefined method `/' for :z65206b:Symbol
undefined method `gsub' for #<PDF::Reader::Reference:0x0000558d62200b80 @id=5, @gen=0>
undefined method `gsub' for #<PDF::Reader::Reference:0x0000558d622b0670 @id=4, @gen=0>
undefined method `gsub' for #<PDF::Reader::Reference:0x0000558d623de650 @id=52, @gen=0>
undefined method `gsub' for #<PDF::Reader::Reference:0x0000558d625adf30 @id=12, @gen=0>
undefined method `gsub' for #<PDF::Reader::Reference:0x0000558d62689580 @id=5, @gen=0>
undefined method `gsub' for #<PDF::Reader::Reference:0x0000558d62b60ea0 @id=5, @gen=0>
undefined method `gsub' for #<PDF::Reader::Reference:0x0000558d62ccc910 @id=4, @gen=0>
undefined method `gsub' for #<PDF::Reader::Reference:0x0000558d62e06498 @id=86, @gen=0>
undefined method `gsub' for #<PDF::Reader::Reference:0x0000563ae696f358 @id=5, @gen=0>
undefined method `gsub' for #<PDF::Reader::Reference:0x0000563ae6a1bce8 @id=2, @gen=0>
undefined method `length' for nil:NilClass
undefined method `pack' for nil:NilClass
undefined method `size' for 0.0061:Float
undefined method `times' for nil:NilClass
undefined method `unfiltered_data' for 0:Integer
undefined method `unfiltered_data' for {:FL=>1, :SM=>0.01, :Type=>:ExtGState}:Hash
undefined method `unfiltered_data' for #<Hash:0x0000558d61dbd848>
undefined method `unfiltered_data' for #<Hash:0x0000558d61df06d0>
undefined method `unfiltered_data' for #<Hash:0x0000558d61eef388>
undefined method `unfiltered_data' for #<Hash:0x0000558d61ef7d30>
undefined method `unfiltered_data' for #<Hash:0x0000558d61f17748>
undefined method `unfiltered_data' for #<Hash:0x0000558d61f1b910>
undefined method `unfiltered_data' for #<Hash:0x0000558d620b6bf8>
undefined method `unfiltered_data' for #<Hash:0x0000558d62205ec8>
undefined method `unfiltered_data' for #<Hash:0x0000558d62229800>
undefined method `unfiltered_data' for #<Hash:0x0000558d622b8ca8>
undefined method `unfiltered_data' for #<Hash:0x0000558d622d2900>
undefined method `unfiltered_data' for #<Hash:0x0000558d622e1d38>
undefined method `unfiltered_data' for #<Hash:0x0000558d6234a9c8>
undefined method `unfiltered_data' for #<Hash:0x0000558d62350300>
undefined method `unfiltered_data' for #<Hash:0x0000558d623875d0>
undefined method `unfiltered_data' for #<Hash:0x0000558d62389ab0>
undefined method `unfiltered_data' for #<Hash:0x0000558d623a08a0>
undefined method `unfiltered_data' for #<Hash:0x0000558d623a0990>
undefined method `unfiltered_data' for #<Hash:0x0000558d623ca8d0>
undefined method `unfiltered_data' for #<Hash:0x0000558d624e74c0>
undefined method `unfiltered_data' for #<Hash:0x0000558d624eda50>
undefined method `unfiltered_data' for #<Hash:0x0000558d624f3978>
undefined method `unfiltered_data' for #<Hash:0x0000558d6253b368>
undefined method `unfiltered_data' for #<Hash:0x0000558d62560780>
undefined method `unfiltered_data' for #<Hash:0x0000558d625a7c70>
undefined method `unfiltered_data' for #<Hash:0x0000558d6262e478>
undefined method `unfiltered_data' for #<Hash:0x0000558d626de4b8>
undefined method `unfiltered_data' for #<Hash:0x0000558d628ec728>
undefined method `unfiltered_data' for #<Hash:0x0000558d62a79960>
undefined method `unfiltered_data' for #<Hash:0x0000558d62ad3460>
undefined method `unfiltered_data' for #<Hash:0x0000558d62b54a10>
undefined method `unfiltered_data' for #<Hash:0x0000558d62b65428>
undefined method `unfiltered_data' for #<Hash:0x0000558d62b7cce0>
undefined method `unfiltered_data' for #<Hash:0x0000558d62c6f710>
undefined method `unfiltered_data' for #<Hash:0x0000558d62e06ce0>
undefined method `unfiltered_data' for #<Hash:0x0000558d62e496a8>
undefined method `unfiltered_data' for #<Hash:0x0000558d62efe058>
undefined method `unfiltered_data' for #<Hash:0x0000558d62f599f8>
undefined method `unfiltered_data' for #<Hash:0x0000558d62fda9e0>
undefined method `unfiltered_data' for #<Hash:0x0000558d62fffda8>
undefined method `unfiltered_data' for #<Hash:0x0000558d6343c678>
undefined method `unfiltered_data' for #<Hash:0x0000558d64054b68>
undefined method `unfiltered_data' for #<Hash:0x0000558d64071bf0>
undefined method `unfiltered_data' for #<Hash:0x0000563ae6390bf8>
undefined method `unfiltered_data' for #<Hash:0x0000563ae66d27a8>
undefined method `unfiltered_data' for #<Hash:0x0000563ae67540c8>
undefined method `unfiltered_data' for #<Hash:0x0000563ae691cc20>
undefined method `unfiltered_data' for #<Hash:0x0000563ae6951f88>
undefined method `unfiltered_data' for #<Hash:0x0000563ae69b2dd8>
undefined method `unfiltered_data' for #<Hash:0x0000563ae69e7bc8>
undefined method `unfiltered_data' for #<Hash:0x0000563ae69e8398>
undefined method `unfiltered_data' for nil:NilClass
undefined method `unfiltered_data' for "":String
undefined method `unfiltered_data' for :"":Symbol
undefined method `unfiltered_data' for "\x00":String
undefined method `unfiltered_data' for "\xA0":String
undefined method `unpack' for nil:NilClass
wrong number of arguments (given 0, expected 1)
wrong number of arguments (given 0, expected 2)
wrong number of arguments (given 0, expected 6)
wrong number of arguments (given 102, expected 1)
wrong number of arguments (given 10, expected 0)
wrong number of arguments (given 10, expected 1)
wrong number of arguments (given 10, expected 2)
wrong number of arguments (given 10, expected 6)
wrong number of arguments (given 11, expected 0)
wrong number of arguments (given 11, expected 1)
wrong number of arguments (given 11, expected 6)
wrong number of arguments (given 12, expected 0)
wrong number of arguments (given 12, expected 1)
wrong number of arguments (given 12, expected 6)
wrong number of arguments (given 13, expected 0)
wrong number of arguments (given 13, expected 1)
wrong number of arguments (given 13, expected 6)
wrong number of arguments (given 14, expected 1)
wrong number of arguments (given 14, expected 2)
wrong number of arguments (given 14, expected 6)
wrong number of arguments (given 15, expected 1)
wrong number of arguments (given 15, expected 2)
wrong number of arguments (given 16, expected 1)
wrong number of arguments (given 17, expected 0)
wrong number of arguments (given 17, expected 2)
wrong number of arguments (given 18, expected 1)
wrong number of arguments (given 19, expected 1)
wrong number of arguments (given 19, expected 2)
wrong number of arguments (given 19, expected 6)
wrong number of arguments (given 1, expected 0)
wrong number of arguments (given 1, expected 2)
wrong number of arguments (given 1, expected 6)
wrong number of arguments (given 20, expected 0)
wrong number of arguments (given 20, expected 1)
wrong number of arguments (given 20, expected 2)
wrong number of arguments (given 21, expected 1)
wrong number of arguments (given 21, expected 6)
wrong number of arguments (given 229, expected 6)
wrong number of arguments (given 22, expected 1)
wrong number of arguments (given 23, expected 1)
wrong number of arguments (given 24, expected 2)
wrong number of arguments (given 27, expected 1)
wrong number of arguments (given 28, expected 1)
wrong number of arguments (given 29, expected 0)
wrong number of arguments (given 29, expected 1)
wrong number of arguments (given 2, expected 0)
wrong number of arguments (given 2, expected 1)
wrong number of arguments (given 2, expected 6)
wrong number of arguments (given 30, expected 2)
wrong number of arguments (given 30, expected 6)
wrong number of arguments (given 33, expected 0)
wrong number of arguments (given 33, expected 1)
wrong number of arguments (given 34, expected 0)
wrong number of arguments (given 34, expected 1)
wrong number of arguments (given 35, expected 1)
wrong number of arguments (given 369, expected 6)
wrong number of arguments (given 39, expected 1)
wrong number of arguments (given 3, expected 0)
wrong number of arguments (given 3, expected 1)
wrong number of arguments (given 3, expected 2)
wrong number of arguments (given 3, expected 6)
wrong number of arguments (given 429, expected 1)
wrong number of arguments (given 438, expected 6)
wrong number of arguments (given 49, expected 6)
wrong number of arguments (given 4, expected 0)
wrong number of arguments (given 4, expected 1)
wrong number of arguments (given 4, expected 2)
wrong number of arguments (given 4, expected 6)
wrong number of arguments (given 50, expected 6)
wrong number of arguments (given 5, expected 0)
wrong number of arguments (given 5, expected 1)
wrong number of arguments (given 5, expected 2)
wrong number of arguments (given 5, expected 6)
wrong number of arguments (given 62, expected 6)
wrong number of arguments (given 63, expected 1)
wrong number of arguments (given 65, expected 1)
wrong number of arguments (given 6, expected 0)
wrong number of arguments (given 6, expected 1)
wrong number of arguments (given 6, expected 2)
wrong number of arguments (given 712, expected 6)
wrong number of arguments (given 725, expected 1)
wrong number of arguments (given 7, expected 0)
wrong number of arguments (given 7, expected 1)
wrong number of arguments (given 7, expected 2)
wrong number of arguments (given 7, expected 3)
wrong number of arguments (given 7, expected 6)
wrong number of arguments (given 8, expected 0)
wrong number of arguments (given 8, expected 1)
wrong number of arguments (given 8, expected 2)
wrong number of arguments (given 8, expected 6)
wrong number of arguments (given 9, expected 0)
wrong number of arguments (given 9, expected 1)
wrong number of arguments (given 9, expected 2)
wrong number of arguments (given 9, expected 6)

yob commented 2 years ago

rofl 😂

This PR was part of my experimentation with sorbet (started in #361) to see if it helps improve the code quality. Clearly this PR isn't enough!

I don't necessarily want to just push on and blindly fix the errors you've flagged here, but I would like to at least understand if there's a pattern to them. Are there a few common places where type checking what we've read from the PDF file would catch a lot of the issues?
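One way to look for a pattern is to collapse the variable parts of each message (object addresses, literal numbers, quoted symbols) so that similar crashes share a key, then count them. The following is a minimal sketch under that assumption; `normalize` is a hypothetical helper, and the sample messages are copied from the list above.

```ruby
# Collapse the variable parts of a crash message so similar crashes group together.
def normalize(message)
  message
    .gsub(/#<([\w:]+):0x\h+[^>]*>/, '#<\1>') # object addresses and instance variables
    .gsub(/:"[^"]*"/, ":SYM")                # quoted symbols such as :"0009"
    .gsub(/\d+(?:\.\d+)?/, "N")              # literal numbers
end

messages = [
  "Invalid filter algorithm 10",
  "Invalid filter algorithm 255",
  "undefined method `bytesize' for 131:Integer",
  "undefined method `bytesize' for 2:Integer",
  "undefined method `unfiltered_data' for #<Hash:0x0000558d61dbd848>",
  "undefined method `unfiltered_data' for #<Hash:0x0000558d61df06d0>",
  ":\"0\" can't be coerced into Float"
]

# Count how many messages fall into each normalized group.
counts = messages.each_with_object(Hash.new(0)) do |message, tally|
  tally[normalize(message)] += 1
end

# Print the most common groups first.
counts.sort_by { |_, count| -count }.each do |message, count|
  puts "#{count}\t#{message}"
end
```

In practice the `messages` array would be filled from the first line of each trace file rather than hard-coded.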

Is it possible for me to get backtraces for some of the above errors? Or how hard would it be for me to run the fuzzer myself?

wrong number of arguments (given 24, expected 2)

I assume these ones are incorrect arg counts being passed to PageTextReceiver. That doesn't feel like a type checking issue - I think I'd probably need all the methods on that to permissively accept *args, and then only use the ones that particular operator expects.
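Roughly what I have in mind, as a sketch only: `PermissiveReceiver` is a hypothetical class, the method names are borrowed from the existing callbacks, and the bodies are illustrative rather than the real implementation.

```ruby
# Hypothetical receiver: method names mirror existing callbacks, but the
# bodies are illustrative and not pdf-reader's implementation.
class PermissiveReceiver
  attr_reader :leading, :word_spacing, :text_objects

  def initialize
    @leading = 0
    @word_spacing = 0
    @text_objects = 0
  end

  # TL takes one numeric operand; extras are ignored, bad types are skipped.
  def set_text_leading(*args)
    value = args.first
    @leading = value if value.is_a?(Numeric)
  end

  # Tw takes one numeric operand.
  def set_word_spacing(*args)
    value = args.first
    @word_spacing = value if value.is_a?(Numeric)
  end

  # BT takes no operands, so everything passed in is ignored.
  def begin_text_object(*_args)
    @text_objects += 1
  end
end

receiver = PermissiveReceiver.new
receiver.set_text_leading(*Array.new(24, 1.5)) # 24 operands, no ArgumentError
receiver.begin_text_object(:Span, 7)           # stray operands are dropped
receiver.set_word_spacing(:F8)                 # wrong type, value left unchanged
puts receiver.leading                          # prints 1.5
```

The trade-off is that a corrupt content stream is silently tolerated instead of reported, which may or may not be what we want for every operator.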

bcoles commented 2 years ago

Are there a few common places where type checking what we've read from the PDF file would catch a lot of the issues?

There are 84 unique crash locations:

$ head -n 2 crashes/*.trace | grep ":in " | sort -u
/var/lib/gems/2.7.0/gems/hashery-2.1.2/lib/hashery/lru_hash.rb:138:in `has_key?'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/aes_v2_security_handler.rb:36:in `iv='
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/aes_v2_security_handler.rb:37:in `final'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/buffer.rb:203:in `==='
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/buffer.rb:204:in `==='
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/buffer.rb:204:in `state'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/buffer.rb:206:in `==='
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/buffer.rb:215:in `state'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/buffer.rb:332:in `prepare_literal_token'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/buffer.rb:354:in `initialize_copy'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/buffer.rb:356:in `getbyte'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/buffer.rb:409:in `==='
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/buffer.rb:81:in `seek'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/cid_widths.rb:57:in `parse_second_form'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/cmap.rb:110:in `str_to_int'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/cmap.rb:111:in `str_to_int'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/cmap.rb:135:in `block in process_bfchar_instructions'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/cmap.rb:146:in `block in process_bfrange_instructions'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/cmap.rb:158:in `block in bfrange_type_one'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/encoding.rb:212:in `block (2 levels) in load_mapping'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/error.rb:51:in `validate_type'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/filter/depredict.rb:128:in `png_depredict'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/filter/depredict.rb:69:in `*'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/filter/flate.rb:36:in `inflate'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/font.rb:110:in `+'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/font.rb:167:in `extract_base_info'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/key_builder_v5.rb:62:in `+'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/key_builder_v5.rb:73:in `+'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/lzw.rb:123:in `create_new_string'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/object_cache.rb:73:in `include?'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/object_stream.rb:13:in `initialize'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/object_stream.rb:36:in `offsets'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/object_stream.rb:37:in `block in offsets'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/page_layout.rb:40:in `round'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/page_state.rb:106:in `set_text_font_and_size'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/page_state.rb:107:in `set_text_font_and_size'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/page_state.rb:119:in `set_text_leading'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/page_state.rb:123:in `set_text_rendering_mode'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/page_state.rb:131:in `set_word_spacing'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/page_state.rb:139:in `move_text_position'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/page_state.rb:144:in `move_text_position'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/page_state.rb:152:in `move_text_position_and_set_leading'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/page_state.rb:157:in `set_text_matrix_and_text_line_matrix'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/page_state.rb:167:in `move_to_start_of_next_line'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/page_state.rb:197:in `[]'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/page_state.rb:197:in `hash'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/page_state.rb:241:in `+'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/page_state.rb:251:in `each'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/page_state.rb:329:in `process_glyph_displacement'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/page_state.rb:356:in `*'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/page_state.rb:356:in `text_rendering_matrix'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/page_state.rb:45:in `save_graphics_state'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/page_state.rb:51:in `restore_graphics_state'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/page_state.rb:66:in `concatenate_matrix'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/page_state.rb:84:in `begin_text_object'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/page_state.rb:90:in `end_text_object'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/page_state.rb:98:in `set_character_spacing'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/page_text_receiver.rb:101:in `set_spacing_next_line_show_text'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/page_text_receiver.rb:110:in `invoke_xobject'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/page_text_receiver.rb:86:in `show_text_with_positioning'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/page_text_receiver.rb:87:in `show_text_with_positioning'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/page_text_receiver.rb:96:in `move_to_next_line_and_show_text'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/parser.rb:137:in `pdf_name'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/reference.rb:54:in `kind_of?'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/reference.rb:64:in `hash'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/security_handler_factory.rb:58:in `standard?'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/security_handler_factory.rb:60:in `standard?'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/security_handler_factory.rb:62:in `<='
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/standard_key_builder.rb:117:in `+'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/standard_key_builder.rb:120:in `auth_user_pass'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/standard_key_builder.rb:135:in `make_file_key'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/standard_key_builder.rb:67:in `pad_pass'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/transformation_matrix.rb:148:in `*'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/transformation_matrix.rb:187:in `+'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/transformation_matrix.rb:191:in `+'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/transformation_matrix.rb:191:in `faster_multiply!'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/width_calculator/type_one_or_three.rb:26:in `glyph_width'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/width_calculator/type_zero.rb:16:in `initialize'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/xref.rb:132:in `=='
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/xref.rb:197:in `block in load_xref_stream'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/xref.rb:201:in `[]'
/var/lib/gems/2.7.0/gems/pdf-reader-2.8.0/lib/pdf/reader/xref.rb:202:in `[]'
/var/lib/gems/2.7.0/gems/ruby-rc4-0.1.5/lib/rc4.rb:33:in `process'
/var/lib/gems/2.7.0/gems/ttfunk-1.7.0/lib/ttfunk/reader.rb:12:in `read'

Is it possible for me to get backtraces for some of the above errors?

Here's the stack traces. crashes.zip

I can provide the crashing reproducer files if needed, but you should be able to generate them with the fuzzer.

Or how hard would it be for me to run the fuzzer myself?

I was able to use the fuzzer in #248 with minimal changes (to lazily fix UTF8 issues):

  reader.pages.each do |page|
    contents << page.fonts.to_s
    contents << page.text.to_s #.force_encoding('utf-8')
    contents << page.raw_content.to_s.force_encoding('utf-8')
  end

$ ./tools/fuzz.rb spec/data/**.pdf

wrong number of arguments (given 24, expected 2)

I assume these ones are incorrect arg counts being passed to PageTextReceiver. That doesn't feel like a type checking issue - I think I'd probably need all the methods on that to permissively accept *args, and then only use the ones that particular operator expects.

Crashes in page_state and page_text_receiver:

$ grep "wrong number of arguments" crashes/*.trace -A 1 | fgrep -v "wrong number" | cut -d'/' -f 9- | sort -u
--
lib/pdf/reader/page_state.rb:106:in `set_text_font_and_size'
lib/pdf/reader/page_state.rb:119:in `set_text_leading'
lib/pdf/reader/page_state.rb:123:in `set_text_rendering_mode'
lib/pdf/reader/page_state.rb:131:in `set_word_spacing'
lib/pdf/reader/page_state.rb:139:in `move_text_position'
lib/pdf/reader/page_state.rb:152:in `move_text_position_and_set_leading'
lib/pdf/reader/page_state.rb:157:in `set_text_matrix_and_text_line_matrix'
lib/pdf/reader/page_state.rb:167:in `move_to_start_of_next_line'
lib/pdf/reader/page_state.rb:45:in `save_graphics_state'
lib/pdf/reader/page_state.rb:51:in `restore_graphics_state'
lib/pdf/reader/page_state.rb:66:in `concatenate_matrix'
lib/pdf/reader/page_state.rb:84:in `begin_text_object'
lib/pdf/reader/page_state.rb:90:in `end_text_object'
lib/pdf/reader/page_state.rb:98:in `set_character_spacing'
lib/pdf/reader/page_text_receiver.rb:101:in `set_spacing_next_line_show_text'
lib/pdf/reader/page_text_receiver.rb:110:in `invoke_xobject'
lib/pdf/reader/page_text_receiver.rb:86:in `show_text_with_positioning'
lib/pdf/reader/page_text_receiver.rb:96:in `move_to_next_line_and_show_text'

bcoles commented 2 years ago

For reference, here's the latest fuzz output summary on latest master.

The fuzzer is not deterministic, but it's nice to see the total number of crashes decreasing anyway. 66 unique crash locations.

user@ubuntu:~/Desktop/pdf-reader$ head -n 2 crashes/*.trace | grep ":in " | sort -u | wc -l
66
user@ubuntu:~/Desktop/pdf-reader$ head -n 2 crashes/*.trace | grep ":in " | sort -u
<internal:pack>:257:in `unpack'
/var/lib/gems/2.7.0/gems/hashery-2.1.2/lib/hashery/lru_hash.rb:138:in `has_key?'
/var/lib/gems/2.7.0/gems/pdf-reader-2.9.2/lib/pdf/reader/aes_v2_security_handler.rb:36:in `iv='
/var/lib/gems/2.7.0/gems/pdf-reader-2.9.2/lib/pdf/reader/buffer.rb:213:in `==='
/var/lib/gems/2.7.0/gems/pdf-reader-2.9.2/lib/pdf/reader/buffer.rb:214:in `==='
/var/lib/gems/2.7.0/gems/pdf-reader-2.9.2/lib/pdf/reader/buffer.rb:216:in `==='
/var/lib/gems/2.7.0/gems/pdf-reader-2.9.2/lib/pdf/reader/buffer.rb:363:in `initialize_copy'
/var/lib/gems/2.7.0/gems/pdf-reader-2.9.2/lib/pdf/reader/buffer.rb:366:in `block in prepare_regular_token'
/var/lib/gems/2.7.0/gems/pdf-reader-2.9.2/lib/pdf/reader/buffer.rb:366:in `getbyte'
/var/lib/gems/2.7.0/gems/pdf-reader-2.9.2/lib/pdf/reader/buffer.rb:370:in `block in prepare_regular_token'
/var/lib/gems/2.7.0/gems/pdf-reader-2.9.2/lib/pdf/reader/buffer.rb:377:in `==='
/var/lib/gems/2.7.0/gems/pdf-reader-2.9.2/lib/pdf/reader/cid_widths.rb:57:in `parse_second_form'
/var/lib/gems/2.7.0/gems/pdf-reader-2.9.2/lib/pdf/reader/filter/depredict.rb:128:in `png_depredict'
/var/lib/gems/2.7.0/gems/pdf-reader-2.9.2/lib/pdf/reader/filter/depredict.rb:41:in `*'
/var/lib/gems/2.7.0/gems/pdf-reader-2.9.2/lib/pdf/reader/filter/depredict.rb:69:in `*'
/var/lib/gems/2.7.0/gems/pdf-reader-2.9.2/lib/pdf/reader/font_descriptor.rb:57:in `glyph_width'
/var/lib/gems/2.7.0/gems/pdf-reader-2.9.2/lib/pdf/reader/font.rb:167:in `extract_base_info'
/var/lib/gems/2.7.0/gems/pdf-reader-2.9.2/lib/pdf/reader/key_builder_v5.rb:62:in `+'
/var/lib/gems/2.7.0/gems/pdf-reader-2.9.2/lib/pdf/reader/key_builder_v5.rb:73:in `+'
/var/lib/gems/2.7.0/gems/pdf-reader-2.9.2/lib/pdf/reader/lzw.rb:76:in `chr'
/var/lib/gems/2.7.0/gems/pdf-reader-2.9.2/lib/pdf/reader/object_cache.rb:73:in `include?'
/var/lib/gems/2.7.0/gems/pdf-reader-2.9.2/lib/pdf/reader/object_hash.rb:95:in `object'
/var/lib/gems/2.7.0/gems/pdf-reader-2.9.2/lib/pdf/reader/object_stream.rb:13:in `initialize'
/var/lib/gems/2.7.0/gems/pdf-reader-2.9.2/lib/pdf/reader/object_stream.rb:36:in `offsets'
/var/lib/gems/2.7.0/gems/pdf-reader-2.9.2/lib/pdf/reader/object_stream.rb:37:in `block in offsets'
/var/lib/gems/2.7.0/gems/pdf-reader-2.9.2/lib/pdf/reader/page_layout.rb:109:in `[]='
/var/lib/gems/2.7.0/gems/pdf-reader-2.9.2/lib/pdf/reader/page_layout.rb:40:in `round'
/var/lib/gems/2.7.0/gems/pdf-reader-2.9.2/lib/pdf/reader/page_state.rb:106:in `set_text_font_and_size'
/var/lib/gems/2.7.0/gems/pdf-reader-2.9.2/lib/pdf/reader/page_state.rb:107:in `set_text_font_and_size'
/var/lib/gems/2.7.0/gems/pdf-reader-2.9.2/lib/pdf/reader/page_state.rb:120:in `set_text_leading'
/var/lib/gems/2.7.0/gems/pdf-reader-2.9.2/lib/pdf/reader/page_state.rb:127:in `set_text_rise'
/var/lib/gems/2.7.0/gems/pdf-reader-2.9.2/lib/pdf/reader/page_state.rb:131:in `set_word_spacing'
/var/lib/gems/2.7.0/gems/pdf-reader-2.9.2/lib/pdf/reader/page_state.rb:132:in `set_word_spacing'
/var/lib/gems/2.7.0/gems/pdf-reader-2.9.2/lib/pdf/reader/page_state.rb:139:in `move_text_position'
/var/lib/gems/2.7.0/gems/pdf-reader-2.9.2/lib/pdf/reader/page_state.rb:144:in `move_text_position'
/var/lib/gems/2.7.0/gems/pdf-reader-2.9.2/lib/pdf/reader/page_state.rb:157:in `set_text_matrix_and_text_line_matrix'
/var/lib/gems/2.7.0/gems/pdf-reader-2.9.2/lib/pdf/reader/page_state.rb:197:in `[]'
/var/lib/gems/2.7.0/gems/pdf-reader-2.9.2/lib/pdf/reader/page_state.rb:197:in `hash'
/var/lib/gems/2.7.0/gems/pdf-reader-2.9.2/lib/pdf/reader/page_state.rb:247:in `current_font'
/var/lib/gems/2.7.0/gems/pdf-reader-2.9.2/lib/pdf/reader/page_state.rb:251:in `detect'
/var/lib/gems/2.7.0/gems/pdf-reader-2.9.2/lib/pdf/reader/page_state.rb:265:in `each'
/var/lib/gems/2.7.0/gems/pdf-reader-2.9.2/lib/pdf/reader/page_state.rb:356:in `*'
/var/lib/gems/2.7.0/gems/pdf-reader-2.9.2/lib/pdf/reader/page_state.rb:45:in `save_graphics_state'
/var/lib/gems/2.7.0/gems/pdf-reader-2.9.2/lib/pdf/reader/page_state.rb:51:in `restore_graphics_state'
/var/lib/gems/2.7.0/gems/pdf-reader-2.9.2/lib/pdf/reader/page_state.rb:66:in `concatenate_matrix'
/var/lib/gems/2.7.0/gems/pdf-reader-2.9.2/lib/pdf/reader/page_state.rb:67:in `concatenate_matrix'
/var/lib/gems/2.7.0/gems/pdf-reader-2.9.2/lib/pdf/reader/page_state.rb:84:in `begin_text_object'
/var/lib/gems/2.7.0/gems/pdf-reader-2.9.2/lib/pdf/reader/page_state.rb:90:in `end_text_object'
/var/lib/gems/2.7.0/gems/pdf-reader-2.9.2/lib/pdf/reader/page_text_receiver.rb:112:in `invoke_xobject'
/var/lib/gems/2.7.0/gems/pdf-reader-2.9.2/lib/pdf/reader/page_text_receiver.rb:82:in `show_text'
/var/lib/gems/2.7.0/gems/pdf-reader-2.9.2/lib/pdf/reader/page_text_receiver.rb:86:in `show_text_with_positioning'
/var/lib/gems/2.7.0/gems/pdf-reader-2.9.2/lib/pdf/reader/parser.rb:143:in `pdf_name'
/var/lib/gems/2.7.0/gems/pdf-reader-2.9.2/lib/pdf/reader/reference.rb:54:in `kind_of?'
/var/lib/gems/2.7.0/gems/pdf-reader-2.9.2/lib/pdf/reader/reference.rb:64:in `hash'
/var/lib/gems/2.7.0/gems/pdf-reader-2.9.2/lib/pdf/reader/security_handler_factory.rb:58:in `standard?'
/var/lib/gems/2.7.0/gems/pdf-reader-2.9.2/lib/pdf/reader/security_handler_factory.rb:60:in `standard?'
/var/lib/gems/2.7.0/gems/pdf-reader-2.9.2/lib/pdf/reader/standard_key_builder.rb:117:in `+'
/var/lib/gems/2.7.0/gems/pdf-reader-2.9.2/lib/pdf/reader/standard_key_builder.rb:120:in `auth_user_pass'
/var/lib/gems/2.7.0/gems/pdf-reader-2.9.2/lib/pdf/reader/standard_key_builder.rb:67:in `pad_pass'
/var/lib/gems/2.7.0/gems/pdf-reader-2.9.2/lib/pdf/reader/width_calculator/type_zero.rb:21:in `glyph_width'
/var/lib/gems/2.7.0/gems/pdf-reader-2.9.2/lib/pdf/reader/xref.rb:132:in `=='
/var/lib/gems/2.7.0/gems/pdf-reader-2.9.2/lib/pdf/reader/xref.rb:197:in `block in load_xref_stream'
/var/lib/gems/2.7.0/gems/pdf-reader-2.9.2/lib/pdf/reader/xref.rb:201:in `[]'
/var/lib/gems/2.7.0/gems/pdf-reader-2.9.2/lib/pdf/reader/xref.rb:202:in `[]'
/var/lib/gems/2.7.0/gems/ruby-rc4-0.1.5/lib/rc4.rb:33:in `process'
/var/lib/gems/2.7.0/gems/ttfunk-1.7.0/lib/ttfunk/reader.rb:12:in `read'
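
The same summary could be produced in Ruby and grouped per file, which makes it easier to see where the crashes cluster (page_state.rb, buffer.rb and so on). This is only a rough sketch: `FRAME`, `unique_crash_locations` and `crashes_per_file` are hypothetical helpers, not part of pdf-reader or the fuzzer, and the `crashes/*.trace` layout is assumed from the shell session above.

```ruby
require "set"

# Hypothetical helpers, not part of pdf-reader or the fuzzer.
# Matches a backtrace frame such as
#   /path/to/lib/pdf/reader/xref.rb:132:in `=='
FRAME = /\A(?<path>.+?):(?<line>\d+):in [`'](?<meth>.+)'\z/

# Returns a Set of unique [path, line, method] triples found in the given lines.
# Lines that are not backtrace frames (e.g. the "==> file <==" headers that
# `head` prints) are skipped.
def unique_crash_locations(lines)
  lines.each_with_object(Set.new) do |line, seen|
    m = FRAME.match(line.strip)
    next unless m

    # Drop the gem install prefix so the same frame compares equal whether it
    # came from a checkout (lib/pdf/...) or an installed gem (/var/lib/gems/...).
    path = m[:path].sub(%r{\A.*/gems/[^/]+/}, "")
    seen << [path, m[:line].to_i, m[:meth]]
  end
end

# Counts unique crash locations per file, busiest file first.
def crashes_per_file(locations)
  locations.group_by(&:first)
           .transform_values(&:size)
           .sort_by { |_path, count| -count }
           .to_h
end

if $PROGRAM_NAME == __FILE__
  # Assumed layout: one trace per crash, with the crashing frame in the
  # first two lines (mirrors `head -n 2 crashes/*.trace`).
  lines = Dir.glob("crashes/*.trace").flat_map { |f| File.foreach(f).first(2) }
  locations = unique_crash_locations(lines)
  puts "#{locations.size} unique crash locations"
  crashes_per_file(locations).each { |path, count| puts "#{count}\t#{path}" }
end
```

Stripping the gem prefix is a design choice rather than a requirement. It lets this run be compared against the earlier list, which used relative `lib/pdf/reader/...` paths.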