Open jsonn opened 13 years ago
It enters an infinite loop for single-line text files and some other files too.
Got this bug too!
Proposed patch
diff --git a/pyPdf/pdf.py b/pyPdf/pdf.py
index bf60d01..586ea81 100644
--- a/pyPdf/pdf.py
+++ b/pyPdf/pdf.py
@@ -701,7 +701,7 @@ class PdfFileReader(object):
         # start at the end:
         stream.seek(-1, 2)
         line = ''
-        while not line:
+        while not line and stream.tell():
             line = self.readNextEndLine(stream)
         if line[:5] != "%%EOF":
             raise utils.PdfReadError, "EOF marker not found"
Without patch::
>>> import pyPdf
>>> from cStringIO import StringIO
>>> c = StringIO('')
>>> pdf = pyPdf.PdfFileReader(c)
--- Infinite loop ---
^CTraceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/tmp/pyPdf2/lib/python2.7/site-packages/pyPdf/pdf.py", line 374, in __init__
    self.read(stream)
  File "/tmp/pyPdf2/lib/python2.7/site-packages/pyPdf/pdf.py", line 705, in read
    line = self.readNextEndLine(stream)
  File "/tmp/pyPdf2/lib/python2.7/site-packages/pyPdf/pdf.py", line 870, in readNextEndLine
    line = x + line
KeyboardInterrupt
With patch::
>>> import pyPdf
>>> from cStringIO import StringIO
>>> c = StringIO('')
>>> pdf = pyPdf.PdfFileReader(c)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/tmp/pyPdf2/lib/python2.7/site-packages/pyPdf/pdf.py", line 374, in __init__
    self.read(stream)
  File "/tmp/pyPdf2/lib/python2.7/site-packages/pyPdf/pdf.py", line 707, in read
    raise utils.PdfReadError, "EOF marker not found"
pyPdf.utils.PdfReadError: EOF marker not found
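For what it's worth, the patch above only guards the outer loop in read. Below is a simplified model of the inner loop in readNextEndLine, to suggest why a multi-character file with no newline can still spin. It is not the library code: the function name, the iteration cap, and the assumption that negative seeks clamp to offset 0 (which the traceback above seems to indicate) are all mine, and it is written for Python 3 with bytes so it can run standalone.

```python
MAX_STEPS = 1000  # hypothetical cap so the demonstration always terminates


def unguarded_read_end_line(data, max_steps=MAX_STEPS):
    """Model the unpatched inner loop over an in-memory bytes object.

    Returns (line, hit_cap). hit_cap is True when the loop never found a
    line ending and would have kept running without the cap.
    """
    pos = max(len(data) - 1, 0)         # models stream.seek(-1, 2), clamped at 0
    line = b""
    for _ in range(max_steps):
        x = data[pos:pos + 1]           # models stream.read(1)
        pos = max(pos + len(x) - 2, 0)  # models stream.seek(-2, 1), clamped at 0
        if x in (b"\n", b"\r"):
            return line, False          # a line ending was found: normal exit
        line = x + line                 # at offset 0 this repeats forever
    return line, True


# A file with a newline before the marker terminates normally.
print(unguarded_read_end_line(b"abc\n%%EOF"))   # (b'%%EOF', False)

# A single-line file never yields '\n' or '\r', so only the cap stops it.
print(unguarded_read_end_line(b"hello")[1])     # True
```

In this model the position gets stuck at offset 0 and the same byte is prepended over and over, which matches the line = x + line frame in the interrupted traceback.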
Hmm, a better patch:
--- a/pyPdf/pdf.py
+++ b/pyPdf/pdf.py
@@ -701,7 +701,7 @@ class PdfFileReader(object):
         # start at the end:
         stream.seek(-1, 2)
         line = ''
-        while not line:
+        while not line and stream.tell():
             line = self.readNextEndLine(stream)
         if line[:5] != "%%EOF":
             raise utils.PdfReadError, "EOF marker not found"
@@ -857,7 +857,7 @@ class PdfFileReader(object):
     def readNextEndLine(self, stream):
         line = ""
-        while True:
+        while stream.tell():
             x = stream.read(1)
             stream.seek(-2, 1)
             if x == '\n' or x == '\r':
This one works with an empty stream and also with a one-line stream:
>>> import pyPdf
>>> from cStringIO import StringIO
>>> c = StringIO(' ')
>>> pdf = pyPdf.PdfFileReader(c)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/tmp/pyPdf2/lib/python2.7/site-packages/pyPdf/pdf.py", line 374, in __init__
    self.read(stream)
  File "/tmp/pyPdf2/lib/python2.7/site-packages/pyPdf/pdf.py", line 707, in read
    raise utils.PdfReadError, "EOF marker not found"
pyPdf.utils.PdfReadError: EOF marker not found
The second chunk is not really going to work...
sorry, corrected :-)
Create an empty StringIO and call the pdf reader on it. It will loop in the readNextEndLine calls before the %%EOF check in read.
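As a rough sketch of how a backward scan could be bounded at the start of the stream, here is a standalone Python 3 version using absolute seeks, so it does not depend on how a given stream type treats negative offsets. It is not the pyPdf implementation or either patch above; read_previous_line and ends_with_eof_marker are hypothetical names made up for this example.

```python
import io

EOL = (b"\n", b"\r")


def read_previous_line(stream):
    """Read the line that ends at the current position, scanning backwards.

    Leaves the stream positioned just before the line and any run of
    line-ending bytes in front of it. Never moves before offset 0.
    """
    line = b""
    pos = stream.tell()
    while pos > 0:
        pos -= 1
        stream.seek(pos)
        ch = stream.read(1)
        if ch in EOL:
            # Swallow the rest of a run such as "\r\n" or blank lines.
            while pos > 0:
                stream.seek(pos - 1)
                if stream.read(1) not in EOL:
                    break
                pos -= 1
            break
        line = ch + line
    stream.seek(pos)
    return line


def ends_with_eof_marker(stream):
    """Return True if the last non-empty line starts with %%EOF."""
    stream.seek(0, 2)  # start at the end
    line = b""
    while not line and stream.tell() > 0:
        line = read_previous_line(stream)
    return line[:5] == b"%%EOF"


print(ends_with_eof_marker(io.BytesIO(b"")))                    # False
print(ends_with_eof_marker(io.BytesIO(b"one line, no EOL")))    # False
print(ends_with_eof_marker(io.BytesIO(b"%PDF-1.4\n%%EOF\r\n"))) # True
```

Both loops are bounded by pos > 0, so an empty stream, a single-line stream, and a stream made only of newlines all terminate and fall through to the marker check.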