yob / pdf-reader

The PDF::Reader library implements a PDF parser conforming as much as possible to the PDF specification from Adobe.
MIT License

Guard against potential type errors when walking a content stream #437

Closed yob closed 2 years ago

yob commented 2 years ago

Page#walk executes the content stream of a page, calling methods on a receiver class provided by the user. Each operator expects a specific set of parameters, so we now wrap the user's receiver class in a new validating class that verifies the PDF uses valid parameters before the call reaches the receiver.

Without these checks, users can't be confident about how many parameters they'll receive for an operator, or what type those parameters will be. Everyone ends up writing their own type-safety guard clauses, which is tedious.
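To make the wrapping idea concrete, here is a minimal sketch of a validating wrapper. It is not the code in this PR: the class name `CheckedReceiver`, the `SIGNATURES` table, and the choice to silently skip invalid calls are all assumptions made for illustration, and the three callbacks listed are only a sample of the operators a receiver might handle.

```ruby
# Hypothetical sketch of a validating wrapper; not the implementation in this PR.
class CheckedReceiver
  # Callback name => expected class of each parameter (illustrative subset only).
  SIGNATURES = {
    move_text_position: [Numeric, Numeric],
    set_line_width: [Numeric],
    show_text: [String],
  }.freeze

  def initialize(wrapped)
    @wrapped = wrapped
  end

  def respond_to_missing?(name, include_private = false)
    @wrapped.respond_to?(name, include_private) || super
  end

  # Forward each callback to the wrapped receiver, but only when the
  # parameters match the expected count and types for that operator.
  def method_missing(name, *args, &block)
    return super unless @wrapped.respond_to?(name)

    expected = SIGNATURES[name]
    # Assumption: invalid calls are dropped. Raising a parse error would be
    # another reasonable policy.
    return nil if expected && !valid?(expected, args)

    @wrapped.public_send(name, *args, &block)
  end

  private

  def valid?(expected, args)
    args.length == expected.length &&
      expected.zip(args).all? { |klass, arg| arg.is_a?(klass) }
  end
end
```

With a wrapper like this, a receiver's `show_text` is only ever called with a single String, so it needs no guard clauses of its own. Operators missing from the table are forwarded unchecked, which mirrors the "not all operators yet" state described below.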

Not all operators have type safety implemented yet, but we can expand the number over time.

This fixes a large number of potential crashes exposed by the fuzzer added in #429. When it flips bits in some content streams, the parameters passed to receivers can change in number or type, raising exceptions we don't want to raise (like NoMethodError).

On the one hand this feels a bit non-rubyish, but on the other hand Page#walk is specifically about parsing a program that we don't control, and calling out to code we don't control. The least we can do is provide some guarantees to the receiver that the program we're executing meets the standard.