pdfminer / pdfminer.six

Community maintained fork of pdfminer - we fathom PDF
https://pdfminersix.readthedocs.io
MIT License

Support zipped jpegs #937

Closed pietermarsman closed 10 months ago

pietermarsman commented 10 months ago

Bug report

In #906 a PDF was added that contains an image that is both zip and JPEG encoded. The current code exports this as a BMP; it is not recognized as a JPEG.

PYTHONPATH=. python tools/pdf2txt.py samples/contrib/issue_495_pdfobjref.pdf --output-dir images

This outputs Xop2.bmp in the images directory. The file should be a JPEG, because the image stream has both a FlateDecode and a DCTDecode filter, so it is a zipped JPEG.
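
To illustrate what "zipped JPEG" means here, below is a minimal sketch of how such a stream could be unwrapped. It is not pdfminer.six's actual code: the function name `extract_jpeg` and the plain list of filter names are assumptions made for illustration. The idea is that when the last filter is DCTDecode, undoing the filters before it (here only FlateDecode, via zlib) should leave the raw JPEG bytes, which can be written out unchanged.

```python
import zlib
from typing import Sequence

JPEG_SOI = b"\xff\xd8"  # every JPEG file starts with this start-of-image marker


def extract_jpeg(raw: bytes, filters: Sequence[str]) -> bytes:
    """Hypothetical helper: undo the filters that precede a trailing
    DCTDecode and return the JPEG bytes untouched.

    `filters` lists the stream's filter names in decoding order,
    e.g. ["FlateDecode", "DCTDecode"] for a zipped JPEG.
    """
    if not filters or filters[-1] != "DCTDecode":
        raise ValueError("last filter is not DCTDecode, so this is not a JPEG")

    data = raw
    for name in filters[:-1]:
        if name == "FlateDecode":
            # FlateDecode is zlib/deflate compression
            data = zlib.decompress(data)
        else:
            # Other filters are out of scope for this sketch
            raise NotImplementedError(f"filter not handled in this sketch: {name}")

    if not data.startswith(JPEG_SOI):
        raise ValueError("decoded data does not start with a JPEG marker")
    return data
```

With a helper like this, the stream from the sample PDF would be saved with a .jpg extension instead of being converted to a BMP. Only the FlateDecode case is handled; any other leading filter raises an error rather than guessing.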