yob / pdf-reader

The PDF::Reader library implements a PDF parser conforming as much as possible to the PDF specification from Adobe.
MIT License

Avoid NoMethodError on files with a OneByteIdentityH font #313

Closed yob closed 5 years ago

yob commented 5 years ago

I'm not 100% sure how a font with a CMap-based encoding is supposed to be interpreted. For the sample file I have (with a CMap called "OneByteIdentityH"), it "works" if I assume the font uses StandardEncoding.

The CMap encoding section of the spec (9.7.5) is quite detailed though, so presumably assuming StandardEncoding is incorrect.

Still, I'd rather get some text extraction wrong than raise a NoMethodError. As I collect more samples of files with CMap-based encodings, we can make the text extraction logic for them more robust.
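
To illustrate the fallback idea, here is a rough sketch. It is not the actual pdf-reader code: `resolve_encoding`, `KNOWN_ENCODINGS`, and the shape of the `encoding_entry` argument are all hypothetical, and it assumes a named encoding arrives as a Symbol while a CMap-based one arrives as some other object.

```ruby
# Hypothetical sketch only; these names do not come from pdf-reader.
# Named encodings we assume we already know how to decode.
KNOWN_ENCODINGS = [
  :StandardEncoding,
  :WinAnsiEncoding,
  :MacRomanEncoding,
  :MacExpertEncoding
].freeze

# encoding_entry stands in for a font's /Encoding value. We assume it is a
# Symbol for a named encoding, and anything else (nil, a Hash, a stream-like
# object for an embedded CMap) when we can't interpret it yet.
def resolve_encoding(encoding_entry)
  if encoding_entry.is_a?(Symbol) && KNOWN_ENCODINGS.include?(encoding_entry)
    encoding_entry
  else
    # Probably wrong for real CMap-based fonts, but it avoids calling
    # Symbol/String methods on an object that doesn't respond to them.
    :StandardEncoding
  end
end
```

With this shape, `resolve_encoding(:WinAnsiEncoding)` passes the name through, while a CMap-like object or `nil` falls back to `:StandardEncoding` instead of raising.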

Closes #279