mfenniak / pyPdf

Pure-Python PDF Library. This repository is no longer maintained; please see https://github.com/knowah/PyPDF2/ instead.

Infinite loop on empty input #16

Open jsonn opened 13 years ago

jsonn commented 13 years ago

Create an empty StringIO and call the pdf reader on it. It will loop in the readNextEndLine calls before the %%EOF check in read.

tongwang commented 12 years ago

It also enters an infinite loop for single-line text files and some other files.

alexgarel commented 12 years ago

Got this bug too!

alexgarel commented 12 years ago

Proposed patch

diff --git a/pyPdf/pdf.py b/pyPdf/pdf.py
index bf60d01..586ea81 100644
--- a/pyPdf/pdf.py
+++ b/pyPdf/pdf.py
@@ -701,7 +701,7 @@ class PdfFileReader(object):
         # start at the end:
         stream.seek(-1, 2)
         line = ''
-        while not line:
+        while not line and stream.tell():
             line = self.readNextEndLine(stream)
         if line[:5] != "%%EOF":
             raise utils.PdfReadError, "EOF marker not found"

Without patch::

    >>> import pyPdf
    >>> from cStringIO import StringIO
    >>> c = StringIO('')
    >>> pdf = pyPdf.PdfFileReader(c)
    --- Infinite loop ---
    ^CTraceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/tmp/pyPdf2/lib/python2.7/site-packages/pyPdf/pdf.py", line 374, in __init__
        self.read(stream)
      File "/tmp/pyPdf2/lib/python2.7/site-packages/pyPdf/pdf.py", line 705, in read
        line = self.readNextEndLine(stream)
      File "/tmp/pyPdf2/lib/python2.7/site-packages/pyPdf/pdf.py", line 870, in readNextEndLine
        line = x + line
    KeyboardInterrupt

With patch::

    >>> import pyPdf
    >>> from cStringIO import StringIO
    >>> c = StringIO('')
    >>> pdf = pyPdf.PdfFileReader(c)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/tmp/pyPdf2/lib/python2.7/site-packages/pyPdf/pdf.py", line 374, in __init__
        self.read(stream)
      File "/tmp/pyPdf2/lib/python2.7/site-packages/pyPdf/pdf.py", line 707, in read
        raise utils.PdfReadError, "EOF marker not found"
    pyPdf.utils.PdfReadError: EOF marker not found

alexgarel commented 12 years ago

Hmm, a better patch:

--- a/pyPdf/pdf.py
+++ b/pyPdf/pdf.py
@@ -701,7 +701,7 @@ class PdfFileReader(object):
         # start at the end:
         stream.seek(-1, 2)
         line = ''
-        while not line:
+        while not line and stream.tell():
             line = self.readNextEndLine(stream)
         if line[:5] != "%%EOF":
             raise utils.PdfReadError, "EOF marker not found"
@@ -857,7 +857,7 @@ class PdfFileReader(object):

     def readNextEndLine(self, stream):
         line = ""
-        while True:
+        while stream.tell():
             x = stream.read(1)
             stream.seek(-2, 1)
             if x == '\n' or x == '\r':

This one works with an empty stream and also with a single-line stream:

    >>> import pyPdf
    >>> from cStringIO import StringIO
    >>> c = StringIO('  ')
    >>> pdf = pyPdf.PdfFileReader(c)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/tmp/pyPdf2/lib/python2.7/site-packages/pyPdf/pdf.py", line 374, in __init__
        self.read(stream)
      File "/tmp/pyPdf2/lib/python2.7/site-packages/pyPdf/pdf.py", line 707, in read
        raise utils.PdfReadError, "EOF marker not found"
    pyPdf.utils.PdfReadError: EOF marker not found
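
For illustration only, here is a rough standalone sketch (Python 3, not pyPdf code) of the idea behind the patch: scan backward from the end of the stream, but never move before offset 0. The names `read_previous_line` and `has_eof_marker` are hypothetical, and the sketch tracks an absolute offset instead of using relative seeks, which is an assumption of this example rather than what the patch does.

```python
import io

EOL = (b"\n", b"\r")


def read_previous_line(stream, pos):
    """Return (line, start): the line that ends just before offset `pos`.

    Hypothetical helper, not part of pyPdf. `pos` is an absolute offset,
    so both loops stop at the start of the stream instead of spinning.
    """
    # Skip any end-of-line bytes immediately before pos.
    while pos > 0:
        stream.seek(pos - 1)
        if stream.read(1) not in EOL:
            break
        pos -= 1
    end = pos
    # Walk backward until the previous end-of-line byte or offset 0.
    while pos > 0:
        stream.seek(pos - 1)
        if stream.read(1) in EOL:
            break
        pos -= 1
    stream.seek(pos)
    return stream.read(end - pos), pos


def has_eof_marker(stream):
    """True if the last non-empty line of the stream starts with %%EOF."""
    size = stream.seek(0, io.SEEK_END)
    line, _ = read_previous_line(stream, size)
    return line.startswith(b"%%EOF")


if __name__ == "__main__":
    for data in (b"", b"  ", b"%PDF-1.4\n%%EOF\n"):
        print(repr(data), has_eof_marker(io.BytesIO(data)))
```

With this shape, an empty stream and a single-line stream both fall out of the loops at offset 0, so the caller can report a missing EOF marker instead of hanging.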

jsonn commented 12 years ago

The second chunk is not really going to work...

alexgarel commented 12 years ago

sorry, corrected :-)