Strange behaviour parsing PDF File

Sorry I didn't get around to looking into this in 2017 😞

I just had a proper look and confirmed that this issue still happens in v2.8.0, and that evince can extract the text correctly. It's surprising because the file metadata claims it was created by prawn, and pdf-reader can usually handle prawn-generated files just fine.

The root issue appears to be this conditional: https://github.com/yob/pdf-reader/blob/951f9c2659ce3b25c7731d79d54a2ce4ae3bc8e4/lib/pdf/reader/font.rb#L54-L60

The fonts in this file have ToUnicode CMaps, so we defer all Unicode conversion to them. However, the CMaps define only a handful of mappings. I'm not sure whether the CMaps should contain some default mappings, or whether we should fall back to the encoding dict for glyphs not explicitly listed in the CMap 🤔
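
To make the second option concrete, here is a rough sketch of what "fall back to the encoding dict" could look like. It does not use pdf-reader's real classes: `TO_UNICODE`, `ENCODING`, and both helper methods are hypothetical stand-ins built from plain hashes, and rendering unmapped codes as U+FFFD in the CMap-only path is an assumption made for illustration, not a description of what the library does today.

```ruby
# Hypothetical stand-ins, NOT pdf-reader's API.
# A sparse ToUnicode CMap: only one character code has a mapping.
TO_UNICODE = { 0x41 => [0x0041] }.freeze

# The font's encoding dict, modelled as character code => Unicode codepoint.
ENCODING = { 0x41 => 0x0041, 0x42 => 0x0042, 0x43 => 0x0043 }.freeze

# Assumption: unmapped codes are shown as the Unicode replacement character.
REPLACEMENT = 0xFFFD

# CMap-only conversion: any code missing from the CMap is lost.
def to_utf8_cmap_only(codes, to_unicode)
  codes.flat_map { |code| to_unicode.fetch(code, [REPLACEMENT]) }.pack("U*")
end

# Conversion with fallback: prefer the CMap, then the encoding dict,
# and only then give up and emit the replacement character.
def to_utf8_with_fallback(codes, to_unicode, encoding)
  codes.flat_map { |code|
    to_unicode.fetch(code) { [encoding.fetch(code, REPLACEMENT)] }
  }.pack("U*")
end

codes = [0x41, 0x42, 0x43]
puts to_utf8_cmap_only(codes, TO_UNICODE)
puts to_utf8_with_fallback(codes, TO_UNICODE, ENCODING)
```

With these made-up tables, the CMap-only path keeps only the first character, while the fallback path recovers all three. Whether that fallback is the right behaviour for real files is exactly the open question above.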

Strange behaviour parsing PDF File #216