yob / pdf-reader

The PDF::Reader library implements a PDF parser conforming as much as possible to the PDF specification from Adobe.
MIT License

PDF::Reader::WidthCalculator::TypeZero: type zero fonts with no descendants: undefined method `first' for nil:NilClass #453

Closed bcoles closed 2 years ago

bcoles commented 2 years ago

A Type0 font (TypeZero) with no descendant fonts is invalid, so it is unsurprising that this crashes during initialization.

https://github.com/yob/pdf-reader/blob/a0e604ee893d4c09b88abe3a35a375fd160854cd/lib/pdf/reader/width_calculator/type_zero.rb#L14-L17

20220417003536836678514_crash_576.pdf

crashes/20220417003536836678514_crash_576.pdf.trace-1-undefined method `first' for nil:NilClass
crashes/20220417003536836678514_crash_576.pdf.trace:2:/var/lib/gems/2.7.0/gems/pdf-reader-2.9.2/lib/pdf/reader/width_calculator/type_zero.rb:16:in `initialize'
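
The failure can be reproduced in isolation. This is a minimal sketch rather than the library's code: `FakeFont` and `NaiveTypeZero` are hypothetical stand-ins, and it assumes the calculator calls `first` on a descendant-font list that is `nil` when the PDF omits it.

```ruby
# Hypothetical stand-in for a parsed font. descendantfonts is nil when the
# Type0 font dictionary in the PDF lists no descendant fonts.
FakeFont = Struct.new(:descendantfonts)

# Sketch of a width calculator that assumes a descendant always exists.
class NaiveTypeZero
  def initialize(font)
    @font = font
    # Raises NoMethodError when descendantfonts is nil.
    @descendant_font = @font.descendantfonts.first
  end
end

# Returns the name of the error raised during construction, or nil if none.
def naive_error_for(font)
  NaiveTypeZero.new(font)
  nil
rescue NoMethodError => e
  e.class.name
end
```

Calling `naive_error_for(FakeFont.new(nil))` reports the `NoMethodError` seen in the trace, while a font with at least one descendant constructs without error.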

Perhaps pdf-reader could (should?) handle font issues gracefully by raising custom PDF::Reader::Font* errors - in general, not just this instance - and optionally falling back to a default font or simply not rendering it. I'm not sure what the standard behaviour is for other PDF readers, but invalid fonts are not a fatal error.
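
To make the suggestion concrete, here is a rough sketch of what a font error hierarchy with an optional fallback might look like. Every name in it (`FontSketch`, `FontError`, `MissingDescendantError`, `font_or_default`) is hypothetical and not part of pdf-reader's API.

```ruby
# All names here are hypothetical; none are part of pdf-reader's API.
module FontSketch
  # Base class so callers can rescue every font problem at once.
  class FontError < StandardError; end
  class MissingDescendantError < FontError; end

  Font = Struct.new(:subtype, :descendantfonts)

  # Raises a specific error for a Type0 font without descendants.
  def self.validate!(font)
    if font.subtype == :Type0 && Array(font.descendantfonts).empty?
      raise MissingDescendantError, "Type0 font has no descendant fonts"
    end
    font
  end

  # Optional non-fatal path: swap in a default font instead of aborting.
  def self.font_or_default(font, default)
    validate!(font)
  rescue FontError
    default
  end
end
```

A caller that wants strict behaviour would use `validate!` and let the error propagate; a caller that prefers to keep going would use `font_or_default`.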

yob commented 2 years ago

> Perhaps pdf-reader could (should?) handle font issues gracefully by raising custom PDF::Reader::Font* errors - in general, not just this instance - and optionally falling back to a default font or simply not rendering it. I'm not sure what the standard behaviour is for other PDF readers, but invalid fonts are not a fatal error.

This is a really good question. I've tended to be strict about raising a MalformedPDFError when the PDF file breaks the spec and the fallback behaviour isn't obvious.

I've fixed the nil safety bug in the width calculator in #456, because I like those width calculators being quite dumb. At a minimum I could see maaaaaybe raising a documented exception (like MalformedPDFError) in PDF::Reader::Font when we encounter a Type0 font with no descendant. Maybe falling back to something non-fatal is worth considering, though; I'll mull it over.
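
The two options discussed here can be sketched side by side. This is an illustration under assumptions, not the actual change in #456: `SafeTypeZero`, `SketchFont`, `SketchDescendant` and `check_type0!` are hypothetical names, and the `MalformedPDFError` class below is a local stand-in for the library's exception.

```ruby
# Local stand-in for illustration only; the real exception lives in pdf-reader.
class MalformedPDFError < RuntimeError; end

SketchFont = Struct.new(:subtype, :descendantfonts)
SketchDescendant = Struct.new(:widths) do
  # Looks up a glyph width, defaulting to 0 for unknown codes.
  def glyph_width(code)
    widths.fetch(code, 0)
  end
end

# "Dumb" calculator: tolerates a missing descendant and reports width 0.0.
class SafeTypeZero
  def initialize(font)
    # Array() turns nil into [], so .first returns nil instead of raising.
    @descendant_font = Array(font.descendantfonts).first
  end

  def glyph_width(code)
    return 0.0 if @descendant_font.nil?
    @descendant_font.glyph_width(code).to_f
  end
end

# Strict alternative: reject the font up front with a documented error.
def check_type0!(font)
  if font.subtype == :Type0 && Array(font.descendantfonts).empty?
    raise MalformedPDFError, "Type0 font has no descendant font"
  end
  font
end
```

Keeping the calculator tolerant and putting the spec check in the font layer means the decision between raising and falling back is made in one place.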