Open mitchgthb opened 9 months ago
What PHP Version do you use?
Also, please try again with PDFParser v2.8.0-RC2. It would help if you could provide the PDF that causes the problem or, failing that, example code (with the faulty parameters).
I'm using PHP 8.1. I also tried the parser version you mentioned, but it still doesn't work. I will provide the code and the PDF.
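Code:
require 'vendor/autoload.php'; // Composer autoloader (assumed install method)
use Smalot\PdfParser\Parser;

$file = './taken_sprint5.pdf';
$parser = new Parser();
$pdf = $parser->parseFile($file); // parse the PDF from disk
$text = $pdf->getText();          // extract the text of all pages
echo $text;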
Related to (or a duplicate of) #646. Unicode has no code point for a 'ti' ligature (and probably none for 'tt' either), so Adobe is presumably using some custom encoding to provide them. Probably an Identity-H issue.
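To illustrate the point about code points: Unicode's Alphabetic Presentation Forms block defines only a handful of Latin ligatures (U+FB00 to U+FB06), and neither 'ti' nor 'tt' is among them. The sketch below lists them and expands them to plain letters. `expandLigatures` is a hypothetical helper written for this comment, not part of PDFParser, and it would not fix the 'ti'/'tt' case precisely because those glyphs have nothing to map from.

```php
<?php
// Latin ligatures that do have Unicode code points (U+FB00 to U+FB06).
// Note that neither 'ti' nor 'tt' appears in this block.
const LATIN_LIGATURES = [
    "\u{FB00}" => 'ff',
    "\u{FB01}" => 'fi',
    "\u{FB02}" => 'fl',
    "\u{FB03}" => 'ffi',
    "\u{FB04}" => 'ffl',
    "\u{FB05}" => 'st', // long s + t, mapped to plain 'st' here
    "\u{FB06}" => 'st',
];

// Hypothetical helper (not part of PDFParser): replace each ligature
// code point with its plain-letter expansion.
function expandLigatures(string $text): string
{
    return strtr($text, LATIN_LIGATURES);
}

echo expandLigatures("o\u{FB03}ce \u{FB01}le"), PHP_EOL; // prints "office file"
```

A font that contains a 'ti' or 'tt' ligature glyph therefore has to describe the mapping back to two letters itself, and text extraction depends on the parser reading that mapping correctly.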
It seems like the parser has trouble reading 'tt' and 'ti' when they occur inside words. Instead of those letters I get a symbol with a question mark in it. What can I do?
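While the cause is being tracked down, it may help to pinpoint exactly where the extracted text is affected. The following is a minimal diagnostic sketch, assuming the question-mark symbol is the Unicode replacement character U+FFFD (worth verifying on the real output). `findUndecodedGlyphs` is a hypothetical helper, not part of PDFParser, and the sample string is made up for illustration.

```php
<?php
// Hypothetical diagnostic helper (not part of PDFParser).
// Returns every occurrence of U+FFFD together with up to $context
// characters on either side, so the affected words can be identified.
function findUndecodedGlyphs(string $text, int $context = 3): array
{
    $context = max(0, $context);
    // The /u modifier makes the pattern operate on UTF-8 characters.
    $pattern = '/.{0,' . $context . '}\x{FFFD}.{0,' . $context . '}/u';
    preg_match_all($pattern, $text, $matches);

    return $matches[0];
}

// Made-up sample: "activity" and "better" with the ligatures lost.
$sample = "ac\u{FFFD}vity and be\u{FFFD}er";
print_r(findUndecodedGlyphs($sample));
```

Running this over the result of `getText()` would show which words are affected, which makes it easier to confirm whether only 'ti' and 'tt' are involved.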