smalot / pdfparser

PdfParser, a standalone PHP library, provides various tools to extract data from a PDF file.
GNU Lesser General Public License v3.0

Error when trying to get details from a PDF page generated with an HTML emoji code inside the PDF text #701

Open luffyfr opened 2 months ago

luffyfr commented 2 months ago

Description:

When calling $currentpage->getDetails(); on a PDF that contains an HTML emoji code, I get this error: Object of class Smalot\PdfParser\Header could not be converted to string. It happens in the Font.php file, when the getName function returns "[Unknown]".

Code

I fixed this error by replacing the code with: $details['Encoding'] = $this->getName() != "[Unknown]" ? ($this->has('Encoding') ? (string) $this->get('Encoding') : 'Ansi') : 'Ansi';
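The error message suggests that the value behind 'Encoding' is sometimes an object with no string conversion, so the (string) cast fails. Instead of comparing the font name, the guard could test the value itself. The following is only a minimal sketch of that idea, not the library's code: encodingToString, FakeName and FakeHeader are hypothetical names invented for illustration, and the 'Ansi' fallback is taken from the fix above.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of PdfParser): return $value as a string
 * when it can be cast safely, otherwise return $fallback.
 */
function encodingToString($value, string $fallback = 'Ansi'): string
{
    // Scalars (string, int, float, bool) cast without error.
    if (is_scalar($value)) {
        return (string) $value;
    }
    // Objects cast safely only if they define __toString().
    if (is_object($value) && method_exists($value, '__toString')) {
        return (string) $value;
    }
    // Anything else (object without __toString, array, null) falls back.
    return $fallback;
}

// Stand-ins for library objects, for illustration only.
final class FakeName
{
    private string $name;

    public function __construct(string $name)
    {
        $this->name = $name;
    }

    public function __toString(): string
    {
        return $this->name;
    }
}

final class FakeHeader
{
}

echo encodingToString(new FakeName('WinAnsiEncoding')), PHP_EOL; // WinAnsiEncoding
echo encodingToString(new FakeHeader()), PHP_EOL;                // Ansi
```

With a guard of this kind, the assignment would no longer depend on the font name being "[Unknown]"; whether that matches the behaviour the maintainers want is an open question.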

Could you please check this? Best regards.

file-test.pdf

GreyWyvern commented 2 months ago

I don't get any error, but I also get a string of six invalid bytes rather than an emoji.

The document stream prints the character with code 0x76 (the hex string <76>) from font F6, but it's not being interpreted as an emoji.

BT
/P <</MCID 1 >>BDC
/F6 11.3299999 Tf
1 0 .25 -1 66.046875 647 Tm
<76> Tj
EMC
ET

Output:

Anonymous, ������ (here html emoji code), 2021
1

Probably some kind of font decoding issue.
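
For reference, the <76> operand of Tj in the stream above is a PDF hex string, so it holds a single byte with value 0x76 (decimal 118); what that byte displays as depends entirely on how font F6 maps its codes. A minimal sketch of reading such an operand follows. The function name decodePdfHexString is hypothetical and this is not how PdfParser itself does it; it only applies the PDF rule that whitespace inside a hex string is ignored and an odd final digit is padded with 0.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: turn a PDF hex string token such as "<76>"
 * into the raw bytes it represents.
 */
function decodePdfHexString(string $token): string
{
    // Drop surrounding whitespace, then the angle-bracket delimiters.
    $hex = trim(trim($token), '<>');
    // Whitespace inside a hex string is ignored.
    $hex = preg_replace('/\s+/', '', $hex);
    if ($hex === null || !ctype_xdigit($hex)) {
        throw new InvalidArgumentException('Not a PDF hex string: ' . $token);
    }
    // An odd number of digits is padded with a trailing 0.
    if (strlen($hex) % 2 === 1) {
        $hex .= '0';
    }
    return hex2bin($hex);
}

$bytes = decodePdfHexString('<76>');
printf("0x%02X = %d\n", ord($bytes[0]), ord($bytes[0])); // 0x76 = 118
```

Turning that byte into an emoji would then require the code-to-Unicode mapping of font F6, which is the step that appears to go wrong here.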