maxpmaxp / pdfreader

Python API for PDF documents
MIT License

Failure to extract image as Pillow image ("Not enough image data") #98

Open lisch opened 2 years ago

lisch commented 2 years ago

I can't upload the Python script as a .py file, so I added a .txt extension. Running the script on the indicated file produces the traceback shown below.

maxpmaxp commented 2 years ago

@lisch The image is LZW-encoded, and the LZW decoder fails here: https://github.com/maxpmaxp/pdfreader/blob/30818a2083b22624310fa83eb0101aefea60741c/pdfreader/filters/lzw.py#L227

The decoder needs to support the END_OF_INFO_CODE symbol. Feel free to contribute :)
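
For anyone picking this up, here is a rough, standalone sketch of what handling the end marker could look like. It is not pdfreader's code and does not mirror its internals: `lzw_decode`, `CLEAR_TABLE`, and `END_OF_DATA` are hypothetical names chosen for illustration. It assumes the PDF `LZWDecode` conventions (codes packed MSB-first, widths from 9 to 12 bits, code 256 clears the table, code 257 ends the data, `EarlyChange` defaulting to 1) and simply stops decoding when the end marker arrives, ignoring any bytes after it.

```python
CLEAR_TABLE = 256   # resets the code table
END_OF_DATA = 257   # end marker (what the comment above calls END_OF_INFO_CODE)


def _fresh_table():
    # 256 single-byte entries plus two placeholders for the control codes,
    # so the first free slot is index 258.
    return [bytes([i]) for i in range(256)] + [b"", b""]


def lzw_decode(data: bytes, early_change: int = 1) -> bytes:
    """Decode a PDF-style LZW stream; hypothetical helper, not pdfreader's API."""
    out = bytearray()
    table = _fresh_table()
    width = 9            # current code width in bits
    prev = None          # previously emitted byte sequence
    bit_buf = bit_cnt = 0

    for byte in data:
        bit_buf = (bit_buf << 8) | byte
        bit_cnt += 8
        while bit_cnt >= width:
            bit_cnt -= width
            code = (bit_buf >> bit_cnt) & ((1 << width) - 1)
            bit_buf &= (1 << bit_cnt) - 1

            if code == END_OF_DATA:
                # Stop here: anything after the marker is padding, not data.
                return bytes(out)
            if code == CLEAR_TABLE:
                table, width, prev = _fresh_table(), 9, None
                continue

            if code < len(table) and (prev is not None or code < 256):
                entry = table[code]
            elif prev is not None and code == len(table):
                # Code refers to the entry that is about to be created.
                entry = prev + prev[:1]
            else:
                raise ValueError(f"invalid LZW code {code}")

            if prev is not None:
                table.append(prev + entry[:1])
            out += entry
            prev = entry

            # Widen the code size one entry early when early_change is 1.
            if width < 12 and len(table) + early_change >= (1 << width):
                width += 1

    # Tolerate streams that end without an explicit end marker.
    return bytes(out)
```

A quick check against the worked example in the PDF specification's LZW section, with two junk bytes appended after the end marker:

```python
encoded = bytes.fromhex("800B6050220C0C8501") + b"\xff\xff"
print(lzw_decode(encoded))  # b'-----A---B'
```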