smalot / pdfparser

PdfParser, a standalone PHP library, provides various tools to extract data from a PDF file.
GNU Lesser General Public License v3.0

Local parser behaves differently from the online demo; several characters remain encoded #425

Closed Externaluse closed 3 years ago

Externaluse commented 3 years ago

Whilst trying to extract text from a PDF document, I get a result like this: designated point defined b\x0\x0\x3\x0U\x0H\x0I\x0H\x0U\x0H\x0Q\x0F\x0H\x0 \nto navigation aids, whereas the online demo on pdfparser.org extracts the same passage as readable text: designated point defined by reference to navigation aids.

The document is attached: 2020Pruefungsfragen_AZF_pdf-1.pdf

What might be the reason that the text is extracted correctly for the most part, but hundreds of such sequences are left encoded? The PHP version is 7.4.16.

Externaluse commented 3 years ago

Ah, damn, entirely my mistake. My composer.json had reverted to a previous project's parser version, 0.15. Version 1.0 is working fine. Apologies!

k00ni commented 3 years ago

No problem, happens to the best of us.