Closed SharmileeS closed 10 years ago
Hello,
This warning means that the first section of the xref table does not begin with object zero. There may have been an error in writing the PDF. If strict = False, PyPDF2 will try to correct the object ID numbers. If strict = True, they will not be corrected. Usually this is not fatal, and Adobe can read the PDF and the PyPDF2 output. Was there an issue you were having with this warning or did you just want to know what it meant?
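To make "does not begin with object zero" concrete, here is a minimal sketch that inspects the first classic xref subsection header in a PDF's raw bytes. It is not how PyPDF2 performs the check. The function names and the sample byte strings are made up for illustration, and the sketch assumes a classic `xref` table (files that use cross-reference streams have none, and incrementally updated files contain several sections).

```python
import re

# Matches the "xref" keyword (but not "startxref") followed by the
# subsection header: <first object number> <number of entries>.
XREF_HEADER = re.compile(rb"(?<!start)xref\s+(\d+)\s+(\d+)")


def first_xref_subsection(data: bytes):
    """Return (first_object_number, entry_count) for the first xref
    subsection found in the data, or None if there is no classic table."""
    match = XREF_HEADER.search(data)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def is_zero_indexed(data: bytes) -> bool:
    """True if the first xref subsection starts at object number 0."""
    header = first_xref_subsection(data)
    return header is not None and header[0] == 0


# Hypothetical fragments: only the subsection header differs.
GOOD = b"xref\n0 2\n0000000000 65535 f \n0000000010 00000 n \ntrailer\n"
BAD = b"xref\n1 2\n0000000000 65535 f \n0000000010 00000 n \ntrailer\n"
```

With a real file you could pass `open("afile.pdf", "rb").read()` to `is_zero_indexed`. A `False` result for a file that does have a classic table corresponds to the situation the warning describes.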
I just wanted to know what it meant and whether anything can be done to avoid this warning. Thanks a lot.
I am receiving the same error using the PdfFileMerger() tool. It appears to affect only some PDFs. Any thoughts?
Some improvements were recently added to the algorithm for reading Xrefs (a day or so ago), but if you're still getting this warning, it's generally not an issue. Your output PDFs should still appear as expected - if they don't, then that is an issue.
Usually if non zero-indexed Xrefs actually pose a problem, you will get an actual exception. If all you get is a warning, you likely have nothing to worry about.
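If the warning is harmless for your files and you only want to silence it, one option is Python's standard `warnings` filters. This is a sketch under an assumption: it only works if the library emits the message through the `warnings` module, which the `PdfReadWarning: ... [pdf.py:1736]` output in this thread suggests for PyPDF2 at the time. Newer releases may route messages through `logging` instead. `read_quietly` and `fake_load` are hypothetical names, and `fake_load` stands in for the real call that opens the PDF.

```python
import warnings


def read_quietly(load, pattern="Xref table not zero-indexed"):
    """Call load() while ignoring warnings whose message starts with pattern."""
    with warnings.catch_warnings():
        # The filter is only active inside this block; the previous
        # filters are restored when the block exits.
        warnings.filterwarnings("ignore", message=pattern)
        return load()


def fake_load():
    """Stand-in for opening a PDF: emits the warning, then returns a value."""
    warnings.warn(
        "Xref table not zero-indexed. ID numbers for objects will be corrected.",
        UserWarning,
    )
    return "reader"


# Record any warnings that escape read_quietly to check the filter works.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = read_quietly(fake_load)
```

In real use you would pass something like `lambda: PyPDF2.PdfFileReader(pdf_fileobj, strict=False)` as `load`. Filtering on the message text keeps other warnings visible.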
The update you referred to in pdf.py worked! Thanks for a handy tool!
Disabling strict mode did not help me.
Repairing the document with Ghostscript did the job. Thread here
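The exact Ghostscript command from the linked thread is not reproduced here. A common way to have Ghostscript rewrite a PDF (and so regenerate its xref table) is to run it through the `pdfwrite` device. The sketch below builds that command. The helper names are hypothetical, and the executable name is an assumption (`gs` on Linux and macOS; on Windows it is typically `gswin64c`).

```python
import shutil
import subprocess


def ghostscript_repair_command(src, dst, executable="gs"):
    """Build a Ghostscript command that rewrites src into dst via pdfwrite."""
    # -o sets the output file and runs Ghostscript non-interactively.
    return [executable, "-o", dst, "-sDEVICE=pdfwrite", src]


def repair_pdf(src, dst, executable="gs"):
    """Run the rewrite; raises if Ghostscript is missing or exits with an error."""
    resolved = shutil.which(executable)
    if resolved is None:
        raise RuntimeError(f"Ghostscript executable {executable!r} not found on PATH")
    subprocess.run(ghostscript_repair_command(src, dst, resolved), check=True)
```

For example, `repair_pdf("afile.pdf", "afile_repaired.pdf")` would write a rewritten copy that you can then open with PyPDF2. Rewriting through `pdfwrite` can change other aspects of the file, so keep the original.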
You said: “Usually this is not fatal”, but it seems that something is wrong...
C:\Users\Helloworld\Desktop\pdf>ipython
Python 3.6.2 (v3.6.2:5fd33b5, Jul 8 2017, 04:57:36) [MSC v.1900 64 bit (AMD64)]
Type 'copyright', 'credits' or 'license' for more information
IPython 6.2.1 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import PyPDF2
In [2]: pdf_fileobj = open("afile.pdf", "rb")
In [3]: pdf_reader = PyPDF2.PdfFileReader(pdf_fileobj)
PdfReadWarning: Xref table not zero-indexed. ID numbers for objects will be corrected. [pdf.py:1736]
In [4]: pdf_reader.numPages
---------------------------------------------------------------------------
PdfReadError Traceback (most recent call last)
<ipython-input-4-29a8f66e8824> in <module>()
----> 1 pdf_reader.numPages
c:\program files\python36\lib\site-packages\PyPDF2\pdf.py in <lambda>(self)
1156 return len(self.flattenedPages)
1157
-> 1158 numPages = property(lambda self: self.getNumPages(), None, None)
1159 """
1160 Read-only property that accesses the
c:\program files\python36\lib\site-packages\PyPDF2\pdf.py in getNumPages(self)
1153 else:
1154 if self.flattenedPages == None:
-> 1155 self._flatten()
1156 return len(self.flattenedPages)
1157
c:\program files\python36\lib\site-packages\PyPDF2\pdf.py in _flatten(self, pages, inherit, indirectRef)
1503 if pages == None:
1504 self.flattenedPages = []
-> 1505 catalog = self.trailer["/Root"].getObject()
1506 pages = catalog["/Pages"].getObject()
1507
c:\program files\python36\lib\site-packages\PyPDF2\generic.py in __getitem__(self, key)
514
515 def __getitem__(self, key):
--> 516 return dict.__getitem__(self, key).getObject()
517
518 ##
c:\program files\python36\lib\site-packages\PyPDF2\generic.py in getObject(self)
176
177 def getObject(self):
--> 178 return self.pdf.getObject(self).getObject()
179
180 def __repr__(self):
c:\program files\python36\lib\site-packages\PyPDF2\pdf.py in getObject(self, indirectReference)
1602 if self.strict:
1603 raise utils.PdfReadError("Expected object ID (%d %d) does not match actual (%d %d); xref table not zero-indexed." \
-> 1604 % (indirectReference.idnum, indirectReference.generation, idnum, generation))
1605 else: pass # xref table is corrected in non-strict mode
1606 elif idnum != indirectReference.idnum:
PdfReadError: Expected object ID (8 0) does not match actual (7 0); xref table not zero-indexed.
In [5]: pdf_reader.getNumPages()
Out[5]: 0
Actually, "afile.pdf" has 2 pages. I don't know why PdfReadError is raised.
I hit this issue and solved it with pdfin = PdfFileReader(open('yqsq.pdf', 'rb'), strict=False)
I also found that the issue appears if the PDF document is renamed manually.
I have set strict=False, as in pdfin = PdfFileReader(open('yqsq.pdf', 'rb'), strict=False), but I am still not getting the file content.
PdfReadWarning: Superfluous whitespace found in object header b'12' b'0' [ pdf.py: 1665 ] Has anyone solved this error?
I solved the same issue with PdfFileReader(open('yqsq.pdf', 'rb'), strict=False) and by also removing the space from my PDF file name.
@Florakirie, prefer PdfReader over PdfFileReader, which is obsolescent. PdfReader uses strict=False by default.
I recommend using PdfReader("yqsq.pdf"). It's simpler to read and you don't have dangling open file pointers.
PdfReadWarning: Xref table not zero-indexed. ID numbers for objects will not be corrected. [pdf.py:1130]