mfenniak / pyPdf

Pure-Python PDF Library; this repository is no longer maintained, please see https://github.com/knowah/PyPDF2/ instead.
Other
276 stars, 85 forks

Trailing spaces and NUL characters in PDF cause failure identifying EOF #20

Closed: freakboy3742 closed this issue 2 years ago

freakboy3742 commented 13 years ago

I have a collection of PDFs that contain a line of NUL and space characters on the line after the %%EOF marker. The current technique for identifying %%EOF fails on these PDFs because the 'while not line' check on line 704 of pdf.py (the start of the read() method on PdfFileReader) isn't sufficient to recognize this line of NULs and spaces as something to ignore.
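
To illustrate the kind of check that would tolerate this padding, here is a minimal sketch. It is not the code in pdf.py and not the proposed patch; `find_eof_marker`, the `PADDING` set, and the 1024-byte tail window are all assumptions made up for this example. The idea is to strip NULs and whitespace from the end of the file before looking for the marker.

```python
import io

# Bytes treated as ignorable padding after %%EOF (an assumption for this sketch).
PADDING = b"\x00 \t\r\n"

EOF_MARKER = b"%%EOF"


def find_eof_marker(stream, window=1024):
    """Return the byte offset of a trailing %%EOF marker, ignoring padding.

    Hypothetical helper, not part of pyPdf. `window` is an arbitrary limit
    on how much of the file tail is inspected.
    """
    stream.seek(0, io.SEEK_END)
    size = stream.tell()
    window = min(size, window)

    # Read only the tail of the file.
    stream.seek(size - window)
    tail = stream.read(window)

    # Drop any trailing NULs, spaces, and line endings.
    stripped = tail.rstrip(PADDING)
    if not stripped.endswith(EOF_MARKER):
        raise ValueError("EOF marker not found")

    # Translate the position within the tail back to a file offset.
    return size - window + len(stripped) - len(EOF_MARKER)
```

With this approach a file ending in `%%EOF` followed by a line of NULs and spaces is handled the same way as one ending in `%%EOF` alone. A file with more padding than the window allows would still fail, so the window size is a trade-off rather than a fix for every case.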

jimr commented 11 years ago

Works for me, would be great to see this merged.

jobo3208 commented 11 years ago

I agree. This fixes a major shortcoming of the library IMO. Can't tell you how many PDFs I've encountered with this problem.