Open ondrejbartas opened 7 years ago
Sorry I didn't get around to looking into this in 2017 😞
I just had a proper look and confirmed this issue is still happening in v2.8.0, and that evince can extract the text correctly. It's surprising because the file metadata claims it was created by prawn, and usually pdf-reader can handle prawn-generated files just fine.
The root issue appears to be this conditional: https://github.com/yob/pdf-reader/blob/951f9c2659ce3b25c7731d79d54a2ce4ae3bc8e4/lib/pdf/reader/font.rb#L54-L60
The fonts in this file have ToUnicode CMaps, so we defer all Unicode conversion to them. However, the CMaps only have a handful of mappings defined in them. I'm not sure if the CMaps should have some default mappings in them, or maybe we should be falling back to the encoding dict for glyphs not explicitly listed in the CMap 🤔
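To make the second option concrete, here's a rough sketch of what "fall back to the encoding dict" could mean. This is not pdf-reader code: the method name `to_utf8_with_fallback` and the two hashes are made up for illustration, and I'm assuming the CMap and the encoding can both be modelled as simple lookups keyed by character code.

```ruby
# Hypothetical sketch only; none of these names exist in pdf-reader.
#
# cmap:     Hash of character code => Array of Unicode codepoints
#           (stands in for a sparse ToUnicode CMap)
# encoding: Hash of character code => single Unicode codepoint
#           (stands in for the font's encoding dict)
REPLACEMENT_CODEPOINT = 0xFFFD

def to_utf8_with_fallback(codes, cmap, encoding)
  codepoints = codes.flat_map do |code|
    if cmap.key?(code)
      cmap[code]                 # the CMap wins when it has an entry
    elsif encoding.key?(code)
      [encoding[code]]           # otherwise try the encoding dict
    else
      [REPLACEMENT_CODEPOINT]    # nothing known about this code
    end
  end
  codepoints.pack("U*")          # build a UTF-8 string from the codepoints
end

# A CMap that only maps one glyph, like the ones in this file.
sparse_cmap = { 0x01 => [0x0048] }                    # "H"
encoding    = { 0x01 => 0x0058, 0x02 => 0x0069 }      # "X", "i"

puts to_utf8_with_fallback([0x01, 0x02, 0x03], sparse_cmap, encoding)
```

Code 0x01 comes from the CMap, 0x02 falls through to the encoding, and 0x03 ends up as the replacement character because neither table knows it. Whether this precedence is actually right for PDFs in general is exactly the open question above.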
Hi,
I have this weird error:
And I am getting this result by
x = File.open('~/billapp.pdf', 'rb')
I am adding that PDF here billapp.pdf
It works fine with other PDFs, but not with this one :(