I am using PyPDF3 to extract metadata from a bunch of PDF files I have on my drive. It works pretty well, but it keeps printing messages like these to STDOUT:
invalid pdf header: b'Comun'
incorrect startxref pointer(3)
I understand that these messages appear because the PDF files being parsed are malformed, and that is fine.
What I've tried:
Passing strict=False to the PdfReader object at construction time. According to the PdfReader docs it is already False by default, but I thought it couldn't hurt.
Setting the logging level for the PyPDF2 logger, as explained in the documentation.
Neither of these worked, so I'm a bit at a loss as to how to stop these messages (or send them somewhere else).
Does anyone know a way to do this that actually works? Thanks!
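For reference, here is a minimal sketch of the second attempt together with one possible fallback for the "send them somewhere else" part. It uses only the Python standard library. The logger name "PyPDF2" comes from the attempt described above; `call_quietly` and `noisy_parse` are hypothetical names made up for illustration, and `noisy_parse` merely stands in for the real metadata extraction call. The fallback assumes the messages are written through Python's `sys.stdout` or `sys.stderr`, which may not hold in every setup.

```python
import contextlib
import io
import logging

# Attempt 2 from the list above: raise the threshold of the library logger.
# This only affects messages that are emitted through the logging module.
logging.getLogger("PyPDF2").setLevel(logging.ERROR)


def call_quietly(func, *args, **kwargs):
    """Run func while capturing text written to sys.stdout and sys.stderr.

    Returns a (result, captured_text) tuple. Hypothetical helper: it only
    sees output that goes through Python's sys.stdout / sys.stderr objects.
    """
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer), contextlib.redirect_stderr(buffer):
        result = func(*args, **kwargs)
    return result, buffer.getvalue()


def noisy_parse(path):
    # Hypothetical stand-in for the real parsing call; it prints a message
    # similar to the ones quoted in the post and returns some metadata.
    print("invalid pdf header: b'Comun'")
    return {"path": path}


metadata, messages = call_quietly(noisy_parse, "example.pdf")
```

The captured text in `messages` could then be written to a file or dropped. Note that this approach will not catch output from a logging handler that already holds a reference to the original stream, nor anything written directly to the underlying file descriptors.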