staff0rd opened 5 months ago
Did you find a workaround for this?
The fonts in the PDF have no ToUnicode mapping, which is the standard way to translate character codes into text during extraction. Without that information, pypdf falls back to the raw character codes. Personally, I have not yet been able to identify a way to get Unicode from the font.
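For anyone who wants to check whether their own file is affected in the same way, here is a rough sketch that lists the fonts on each page that have no `/ToUnicode` entry. The function names `fonts_missing_tounicode` and `report` are made up for this example and are not part of pypdf. It only looks at fonts declared directly in each page's own `/Resources`, so inherited resources and fonts used inside form XObjects are not covered.

```python
from typing import Any, List, Mapping


def fonts_missing_tounicode(font_resources: Mapping[str, Any]) -> List[str]:
    """Return the resource names of fonts that have no /ToUnicode entry.

    Hypothetical helper: it accepts any mapping of font name -> font dictionary.
    """
    missing = []
    for name, font in font_resources.items():
        # Values may be indirect references; resolve them when possible.
        resolve = getattr(font, "get_object", None)
        if callable(resolve):
            font = resolve()
        if "/ToUnicode" not in font:
            missing.append(name)
    return missing


def report(pdf_path: str) -> None:
    """Print, for each page index, the fonts lacking /ToUnicode (needs pypdf)."""
    # Imported here so the helper above has no third-party dependency.
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    for index, page in enumerate(reader.pages):
        # Assumption: only resources stored directly on the page are inspected.
        if "/Resources" not in page:
            continue
        resources = page["/Resources"]
        if "/Font" not in resources:
            continue
        print(index, fonts_missing_tounicode(resources["/Font"]))
```

Calling `report("kia-stonic-owners-manual-my23.pdf")` on a local copy of the attached file should show which font resources lack the mapping. A font that lacks `/ToUnicode` can still extract correctly if it uses a standard encoding, so treat the list as a hint rather than a diagnosis.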
The code below produces output that looks like a string of hexadecimal values. The first page of the PDF is shown below; note that I can copy and paste text from it normally in Google Chrome.
Environment
Which environment were you using when you encountered the problem?
Code + PDF
This is a minimal, complete example that shows the issue:
Share here the PDF file(s) that cause the issue: kia-stonic-owners-manual-my23.pdf
First page of pdf
Top of text.txt