smalot / pdfparser

PdfParser, a standalone PHP library, provides various tools to extract data from a PDF file.
GNU Lesser General Public License v3.0

Invalid object reference for $obj. #649

Closed: mschrading closed this issue 8 months ago.

mschrading commented 8 months ago

smalot/pdfparser (v2.7.0)

Description

I am writing because I have a problem with PdfParser on our live server. Parsing throws the exception "Invalid object reference for $obj." in getIndirectObject, on line 521:

protected function getIndirectObject(string $pdfData, array $xref, string $objRef, int $offset = 0, bool $decoding = true): array
{
    /*
     * build indirect object header
     */
    // $objHeader = "[object number] [generation number] obj"
    $objRefArr = explode('_', $objRef);
    if (2 !== count($objRefArr)) {
        throw new \Exception('Invalid object reference for $obj.');
    }
    // ...
}

I'm using CakePHP 4.4.

How can I avoid this? Ideally, instead of throwing, the parser would skip this PDF and return an empty array, perhaps together with a list of errors.

Can you please help me?
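Until the parser itself handles such files, one way to get the "empty array plus error list" behaviour asked for above is to catch the exception at the call site. The sketch below is an assumption, not part of smalot/pdfparser: `parseOrEmpty` is a hypothetical helper that runs any parsing callback and turns a thrown `\Exception` into an empty result while recording the message. The failing callback here only simulates the reported error.

```php
<?php

// Hypothetical helper (not part of smalot/pdfparser): runs a parsing callback
// and converts a thrown \Exception into an empty array plus a recorded message.
function parseOrEmpty(callable $parse, array &$errors): array
{
    try {
        return $parse();
    } catch (\Exception $e) {
        // Keep the message so the caller can report which files failed.
        $errors[] = $e->getMessage();
        return [];
    }
}

$errors = [];

// Stand-in for a parse that fails the way this issue describes.
$result = parseOrEmpty(function (): array {
    throw new \Exception('Invalid object reference for $obj.');
}, $errors);

echo count($result) . PHP_EOL; // prints 0
echo $errors[0] . PHP_EOL;     // prints Invalid object reference for $obj.
```

With the real library, the callback could wrap the documented entry point, for example `fn (): array => [(new \Smalot\PdfParser\Parser())->parseFile($path)->getText()]`, where `$path` is a placeholder for your file. Catching `\Exception` rather than `\Throwable` keeps genuine programming errors such as `TypeError` visible.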

Parsing always ends in an exception in RawDataParser.php, in the method getIndirectObject:

if (2 !== \count($objRefArr)) {
    throw new \Exception('Invalid object reference for $obj.');
}
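Judging from the comment in the quoted excerpt, `$objRef` appears to encode the object number and generation number joined by an underscore (for example `12_0`, an illustrative value). The check throws whenever the reference does not split into exactly two parts. The function `isValidObjRef` below is hypothetical and reproduces only that single condition, not the library's full object lookup.

```php
<?php

// Hypothetical function mirroring only the quoted validation step: the
// reference must split on '_' into exactly two parts
// (object number, generation number).
function isValidObjRef(string $objRef): bool
{
    return 2 === \count(explode('_', $objRef));
}

// '12_0' passes; a missing or extra part fails. explode('_', '') yields [''],
// so an empty reference also fails.
foreach (['12_0', '12', '12_0_1', ''] as $ref) {
    printf("%-10s %s\n", var_export($ref, true), isValidObjRef($ref) ? 'ok' : 'throws');
}
```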

PDF input

Expected output & actual output

Exception: Invalid object reference for $obj.

Code

k00ni commented 8 months ago

Duplicate of #550. Further notes there.