sfneal / PyPDF3

A utility to read and write PDFs with Python
https://pythonhosted.org/PyPDF2/

pyPDF Unable to resolve IndirectObject getting pdf with empty pages #17

Open ambigus9 opened 2 years ago

ambigus9 commented 2 years ago

I am trying to write a PDF file, using the following code:

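import logging
import tempfile

import boto3
from PyPDF3 import PdfFileWriter, PdfFileReader

logger = logging.getLogger(__name__)

# my_s3Bucket_on_AWS and my_s3_file_on_AWS are placeholders for the bucket name and object key
s3 = boto3.resource("s3")
bucket = s3.Bucket(my_s3Bucket_on_AWS)
s3_object = bucket.Object(my_s3_file_on_AWS)

# Temporary files for the downloaded input and the rewritten output
tmp = tempfile.NamedTemporaryFile()
tmp2 = tempfile.NamedTemporaryFile()

# Assumed step (not shown in the original snippet): download the S3 object into tmp
s3_object.download_fileobj(tmp)
tmp.flush()

# Copy every page of the input into a new writer
inputpdf = PdfFileReader(open(tmp.name, "rb"), strict=False)
num_pages = inputpdf.getNumPages()
output = PdfFileWriter()
for i in range(num_pages):
    logger.info(f"Adding page --> {i}")
    output.addPage(inputpdf.getPage(i))

logger.info("Here getting UserWarning")
# The with block closes the stream on exit, so no explicit close() is needed
with open(tmp2.name, "wb") as output_stream:
    output.write(output_stream)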

This works perfectly for at least 10K PDFs, but one PDF produces the following warnings:

UserWarning: Unable to resolve [IndirectObject: IndirectObject(7, 0)], returning NullObject instead [pdf.py:644]

UserWarning: Unable to resolve [IndirectObject: IndirectObject(9, 0)], returning NullObject instead [pdf.py:644]

UserWarning: Unable to resolve [IndirectObject: IndirectObject(10, 0)], returning NullObject instead [pdf.py:644]

UserWarning: Unable to resolve [IndirectObject: IndirectObject(13, 0)], returning NullObject instead [pdf.py:644]

UserWarning: Unable to resolve [IndirectObject: IndirectObject(16, 0)], returning NullObject instead [pdf.py:644]

UserWarning: Unable to resolve [IndirectObject: IndirectObject(20, 0)], returning NullObject instead [pdf.py:644]

UserWarning: Unable to resolve [IndirectObject: IndirectObject(24, 0)], returning NullObject instead [pdf.py:644]

UserWarning: Unable to resolve [IndirectObject: IndirectObject(29, 0)], returning NullObject instead [pdf.py:644]

Any suggestions on how to fix this?

Note: The PDF I am trying to read is not empty; it contains data.
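
When processing thousands of files, it may help to flag the affected PDFs automatically instead of spotting the warnings in the logs. The following is a minimal sketch, not a fix for the underlying problem. It assumes the "Unable to resolve" message is emitted as a UserWarning through Python's standard warnings module, as the log lines above suggest. The names has_unresolved_objects, action, and fake_write are hypothetical and exist only for illustration; in practice, action would be a function that copies the pages and writes the output.

```python
import warnings


def has_unresolved_objects(action):
    """Run action() and report whether it emitted an 'Unable to resolve' warning.

    `action` is a hypothetical zero-argument callable, for example a function
    that copies the pages and writes the output PDF.
    """
    with warnings.catch_warnings():
        # Escalate only the matching UserWarning into an exception so it can
        # be caught; other warnings keep their normal behaviour.
        warnings.filterwarnings(
            "error", message="Unable to resolve", category=UserWarning
        )
        try:
            action()
        except UserWarning:
            return True
    return False


# Stand-in for the real write step, used here only to demonstrate the helper.
def fake_write():
    warnings.warn(
        "Unable to resolve [IndirectObject: IndirectObject(7, 0)], "
        "returning NullObject instead"
    )


print(has_unresolved_objects(fake_write))    # True
print(has_unresolved_objects(lambda: None))  # False
```

Because the warning is turned into an exception, the write stops at the first unresolved object, so this is only suitable for marking a file as problematic, not for producing its output.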