mfenniak / pyPdf

Pure-Python PDF Library; this repository is no longer maintained, please see https://github.com/knowah/PyPDF2/ instead.

problem in NameObject.readFromStream when stream.read(1) does not advance #6

Open ccurvey opened 13 years ago

ccurvey commented 13 years ago

I'm in way over my head here...kind of feel like the blind pig that found an acorn. Anyway, I'm trying to process a PDF that contains the following items:

10 0 obj /DeviceGray endobj

The problem is that when the line "/DeviceGray" is read, the call tok = stream.read(1) does not seem to advance the file pointer. (I checked by comparing the value of stream.tell() before and after the stream.read() call.)

I don't know why the pointer does not get advanced, but making the code look like this fixes the problem, and things seem to move along just fine.

    while True:
        pre_read = stream.tell() # new
        tok = stream.read(1)
        if tok.isspace() or tok in NameObject.delimiterCharacters or stream.tell() == pre_read:
            stream.seek(-1, 1)
            break
        name += tok
    return NameObject(name)

I can provide a copy of the PDF to someone if they want an example. (Note to self: this is 98421_SupLegal 2008-02 Stmt_p83_r8.pdf)

ccurvey commented 13 years ago

Whoops... I just noticed that the formatting of the PDF snippet got messed up in my original post. The "endobj" is on the line after "/DeviceGray" (at least in vim with the PDF plugin). That might explain the problem.
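
One plausible explanation, which is an assumption and not something confirmed in this thread, is that the stream being parsed ends right after "/DeviceGray". At end of stream, read(1) returns an empty string, tell() does not change, and an empty token is neither whitespace nor a delimiter, so the original loop would never exit. The sketch below reproduces that situation with io.BytesIO in Python 3. The read_name function and the DELIMITERS tuple are hypothetical stand-ins for NameObject.readFromStream and NameObject.delimiterCharacters, not pyPdf's actual code. Unlike the patch above, it does not seek back when nothing was read, since seek(-1, 1) at that point would rewind over the last character of the name.

```python
import io

# Hypothetical stand-in for NameObject.delimiterCharacters (assumption:
# the usual PDF delimiter characters; pyPdf's actual set may differ).
DELIMITERS = (b"(", b")", b"<", b">", b"[", b"]", b"{", b"}", b"/", b"%")


def read_name(stream):
    """Read a PDF name such as /DeviceGray from a seekable binary stream."""
    name = stream.read(1)
    if name != b"/":
        raise ValueError("a name object must start with '/'")
    while True:
        tok = stream.read(1)
        if not tok:
            # End of stream: nothing was consumed, so there is nothing
            # to push back. Just stop reading.
            break
        if tok.isspace() or tok in DELIMITERS:
            # Push the terminating character back for the next reader.
            stream.seek(-1, 1)
            break
        name += tok
    return name


# Case 1: the stream ends immediately after the name.
at_eof = io.BytesIO(b"/DeviceGray")
print(read_name(at_eof), at_eof.tell())

# Case 2: the name is followed by a newline and "endobj".
with_newline = io.BytesIO(b"/DeviceGray\nendobj")
print(read_name(with_newline), with_newline.tell())
```

In both cases the stream position ends up just after the final character of the name, so whatever is read next starts at the terminator (or at end of stream) and not inside the name.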