When I tried to get the total number of pages in "test.pdf" using pdfrw's `PdfReader`, it reported 2 pages, but the file actually has 19. When I tried again with `PdfFileReader` from PyPDF2, it reported the correct count.
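
For context when inspecting the file: in a PDF, pages are the leaves of a page tree. Each intermediate `/Pages` node lists its children in `/Kids` and records the number of leaf pages below it in `/Count`, so the length of the root's `/Kids` array need not equal the page total. The sketch below is a hypothetical illustration that uses plain Python dicts rather than pdfrw objects. It counts the leaf `/Page` nodes recursively and compares that count with the top-level `/Kids` length and the declared `/Count`. The tree is invented and its numbers only mirror the report. Nothing here confirms how test.pdf is actually structured.

```python
def count_leaf_pages(node):
    """Hypothetical helper: count /Page leaves under a page-tree node (plain dicts)."""
    if node["/Type"] == "/Page":
        return 1
    # Intermediate /Pages node: recurse into every child in /Kids.
    return sum(count_leaf_pages(kid) for kid in node["/Kids"])


def make_pages(n):
    """Hypothetical helper: build n leaf page nodes."""
    return [{"/Type": "/Page"} for _ in range(n)]


# Invented tree: the root has 2 intermediate /Pages nodes,
# which hold 19 leaf pages between them.
root = {
    "/Type": "/Pages",
    "/Count": 19,
    "/Kids": [
        {"/Type": "/Pages", "/Count": 10, "/Kids": make_pages(10)},
        {"/Type": "/Pages", "/Count": 9, "/Kids": make_pages(9)},
    ],
}

direct_kids = len(root["/Kids"])     # children of the root only
leaf_pages = count_leaf_pages(root)  # every /Page leaf in the tree
declared = root["/Count"]            # what the root node claims

print(direct_kids, leaf_pages, declared)
```

With pdfrw's attribute-style access to dictionary keys (an assumption worth double-checking), the real tree should be reachable as `reader.Root.Pages` and could be walked the same way to see which of these numbers matches the reported 2.
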

I don't know why `PdfReader` miscounts, so I tried passing a preexisting stream when initializing `PdfReader`, which the source code allows:
```python
# Allow reading preexisting streams like pyPdf
if hasattr(fname, 'read'):
    fdata = fname.read()
else:
    try:
        f = open(fname, 'rb')
        fdata = f.read()
        f.close()
    # ... (rest of the try/except omitted)
```
But that also failed, because the `PdfFileReader` classes in both pyPdf and PyPDF2 define `read()` with a required `stream` argument, while pdfrw calls `fname.read()` with no arguments:
```
>>> pdf_reader2 = PdfReader(pdf_file_reader)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/data/pdf_test/venv/lib/python3.7/site-packages/pdfrw/pdfreader.py", line 565, in __init__
    fdata = fname.read()
TypeError: read() missing 1 required positional argument: 'stream'
```
```python
# pyPdf
def read(self, stream):
    # start at the end:
    stream.seek(-1, 2)
    # ...

# PyPDF2
def read(self, stream):
    debug = False
    if debug: print(">>read", stream)
    # start at the end:
    # ...
```
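
The traceback follows from the duck-typing check quoted above. Anything with a `read` attribute is treated as a file-like object, and `read()` is called with no arguments, which a `PdfFileReader` cannot accept. The sketch below reproduces that mechanism with the standard library only. `load_pdf_bytes` is a hypothetical, simplified mirror of the quoted pdfrw check, and `FakePdfFileReader` is a hypothetical stand-in for the pyPdf/PyPDF2 class, not the real one.

```python
import io


def load_pdf_bytes(fname):
    """Hypothetical, simplified mirror of the check quoted from pdfrw."""
    # Anything with a .read attribute is treated as a file-like object,
    # and .read() is called with no arguments.
    if hasattr(fname, "read"):
        return fname.read()
    # Otherwise the argument is treated as a path on disk.
    with open(fname, "rb") as f:
        return f.read()


class FakePdfFileReader:
    """Hypothetical stand-in for PdfFileReader: its read() requires a stream."""

    def __init__(self, stream):
        self.stream = stream

    def read(self, stream):
        stream.seek(-1, 2)  # start at the end, as in the quoted code
        return b""


pdf_bytes = b"%PDF-1.4\n...%%EOF\n"  # placeholder bytes, not a valid PDF

# A real binary stream works: its read() takes no arguments.
data = load_pdf_bytes(io.BytesIO(pdf_bytes))

# The reader object fails the same way as in the traceback above.
try:
    load_pdf_bytes(FakePdfFileReader(io.BytesIO(pdf_bytes)))
    error = None
except TypeError as exc:
    error = exc

# Passing the reader's underlying stream instead works again.
reader = FakePdfFileReader(io.BytesIO(pdf_bytes))
reader.stream.seek(0)  # rewind in case anything already consumed the stream
data_from_stream = load_pdf_bytes(reader.stream)
```

Under that reading, a likely workaround for the `TypeError` (an assumption, not verified against test.pdf) is to pass pdfrw the open binary file or an `io.BytesIO` instead of the `PdfFileReader` object, e.g. `PdfReader(open('test.pdf', 'rb'))`. That only avoids the error. Because pdfrw would still parse the same bytes, this change alone would not be expected to fix the page count.
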

Could you update pdfrw so that it works with these streams?
I've also attached "test.pdf" so you can check why the page count is wrong.
test.pdf