smalot / pdfparser

PdfParser, a standalone PHP library, provides various tools to extract data from a PDF file.
GNU Lesser General Public License v3.0

pdf with background image/color gives Invalid object reference for $obj. Error #550

Open MadOMax opened 2 years ago

MadOMax commented 2 years ago

Exception: Invalid object reference for $obj. in Smalot\PdfParser\RawData\RawDataParser->getIndirectObject() (line 528 in vendor/smalot/pdfparser/src/Smalot/PdfParser/RawData/RawDataParser.php). I'm using v1.1.0. I also tried the latest version but still get the same error.

k00ni commented 2 years ago

Thanks for reporting. I could not find the place you referenced in https://github.com/smalot/pdfparser/blob/master/src/Smalot/PdfParser/RawData/RawDataParser.php#L507. Can you check again in the latest version of RawDataParser.php and give me an exact line number?

MadOMax commented 2 years ago

Thanks @k00ni, the line number is 528. The error is thrown only for some PDF files; PDFs with a background image or color may trigger it.

klode82 commented 2 years ago

I have the same issue. If you want a PDF that triggers this error, I can post it here or send it privately. Let me know.

k00ni commented 2 years ago

That would be helpful, if the PDF is free of charge and can be part of our test environment. In this case please attach it to this issue, so we have a secure place + link. Thanks.

klode82 commented 2 years ago

The PDF is free of charge, but it contains personal information... However, I can send it to you in private.

k00ni commented 2 years ago

> The PDF is free of charge, but it contains personal information... However, I can send it to you in private.

That is unfortunate. Attaching PDFs to the related issue is the preferred way here, because it makes sure the PDFs are not removed before someone finds time to work on a fix.

Please do not send it to me in private. I am just a maintainer here (e.g. collecting relevant information for issues, helping to organize the steps to get a new PR merged, etc.) and unfortunately I don't have time for bug fixing in my spare time.

ajgelado commented 1 year ago

I have run into the same problem with a minimal blank PDF generated by PDF Factory. It raises the same exception at the same line (528 in RawDataParser.php). As it contains no personal data (in fact, it contains no data at all!), I can share it.

Blanco.pdf

k00ni commented 1 year ago

Thanks @ajgelado

edwink75 commented 1 year ago

I get the same error when parsing a PDF document with a table. I cannot post the file, but I'm assuming that the error is caused by the last column in the table, which couldn't be displayed in the PDF, but the parser expects the same. The error line has changed slightly, from 528 to 529:

Invalid object reference for $obj. {myprojectfolder}/vendor/smalot/pdfparser/src/Smalot/PdfParser/RawData/RawDataParser.php#529

mschrading commented 11 months ago

I can confirm this, and as a workaround I have changed the following in vendor/smalot/pdfparser/src/Smalot/PdfParser/RawData/RawDataParser.php#528:

    if (2 !== \count($objRefArr)) {
        return []; // workaround: return an empty result instead of throwing
        throw new \Exception('Invalid object reference for $obj.'); // now unreachable
    }

But this is only an interim solution... Is there a better solution in the meantime?

Michael
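For anyone who would rather not edit files under vendor/, here is a minimal sketch of handling the exception in calling code instead. It is an illustration only: `extractTextOrNull()` and its callable parameter are hypothetical helpers, not part of smalot/pdfparser, and the sketch does not fix the underlying parsing problem. It only lets the caller skip the affected file.

```php
<?php
// Sketch only: extractTextOrNull() is a hypothetical helper for illustration,
// not part of smalot/pdfparser.

/**
 * Runs $extract($path) and returns its result, or null if it fails with the
 * "Invalid object reference" exception discussed in this issue.
 * Any other exception is rethrown unchanged.
 */
function extractTextOrNull(callable $extract, string $path): ?string
{
    try {
        return $extract($path);
    } catch (\Exception $e) {
        if (false !== strpos($e->getMessage(), 'Invalid object reference')) {
            return null; // known failure: let the caller skip or log this file
        }
        throw $e;
    }
}

// Stand-ins for the real parser so the sketch runs without the library.
$failing = function (string $path): string {
    throw new \Exception('Invalid object reference for $obj.');
};
$working = function (string $path): string {
    return 'text of ' . $path;
};

var_dump(extractTextOrNull($failing, 'Blanco.pdf'));
var_dump(extractTextOrNull($working, 'other.pdf'));
```

With the library installed, the callable could be something like `fn (string $p) => (new \Smalot\PdfParser\Parser())->parseFile($p)->getText()`. Matching on the message text is fragile, so treat this as a stopgap until the parser itself handles these files.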

vincentIsNietBeschikbaar commented 1 week ago

I believe this error is caused by the PDF version. After I opened and saved my PDF in a PDF editor, the PDF version changed from 1.3 to 1.5 and the error was gone.
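To check whether a failing file falls into this category, the version declared in the file header can be read directly. The sketch below assumes the usual `%PDF-x.y` marker near the start of the file; `readPdfHeaderVersion()` is a hypothetical helper, not a pdfparser API. Whether the version is really the cause is unconfirmed in this thread, so this is only a way to collect data points.

```php
<?php
// Sketch only: readPdfHeaderVersion() is a hypothetical helper, not a pdfparser API.

/**
 * Returns the version from the "%PDF-x.y" header (e.g. "1.3"),
 * or null if the file cannot be read or has no such header.
 */
function readPdfHeaderVersion(string $path): ?string
{
    $handle = @fopen($path, 'rb');
    if (false === $handle) {
        return null;
    }
    // The header is expected at the very start; 1024 bytes leaves room for leading junk.
    $head = (string) fread($handle, 1024);
    fclose($handle);

    if (1 === preg_match('/%PDF-(\d+\.\d+)/', $head, $m)) {
        return $m[1];
    }

    return null;
}

// Demo on a throwaway file that contains only a header line.
$tmp = tempnam(sys_get_temp_dir(), 'pdf');
file_put_contents($tmp, "%PDF-1.3\n");
var_dump(readPdfHeaderVersion($tmp));
unlink($tmp);
```

Note that from PDF 1.4 onward a document can also declare a newer version in its catalog, so the header value is only a first hint.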