Tesseract and PIL both support JPEG2000 images, so pytesseract, which sits between the two, should accept every format they have in common.
Creating a JPEG2000 image
I tried to attach a JPEG2000 image, but GitHub does not accept that file type, so I've attached a PNG instead, along with the code to convert it to JPEG2000.
import io

from PIL import Image

# Read the attached PNG into memory, then re-save it as JPEG2000.
with open('example.png', 'rb') as f:
    image_data = f.read()

buffer = io.BytesIO(image_data)
image = Image.open(buffer)
image.save("example.jp2", "JPEG2000")
Solution
Adding JPEG2000 to SUPPORTED_FORMATS fixes the issue, and pytesseract returns the expected OCR results. This works because pillow uses JPEG2000 as the value of image.format internally, so the image passes the format check that pytesseract runs while preparing its input.
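The effect of the fix can be illustrated with a simplified stand-in for the format check. This is only a sketch, not pytesseract's actual code: `check_format`, `FORMATS_BEFORE`, `FORMATS_AFTER`, and the `SimpleNamespace` stand-in image are hypothetical, and the listed formats are placeholders. Only the error message and the JPEG2000 format string come from the report.

```python
from types import SimpleNamespace

# Hypothetical format lists; the real SUPPORTED_FORMATS in pytesseract may differ.
FORMATS_BEFORE = {"JPEG", "PNG", "TIFF", "BMP", "GIF"}
FORMATS_AFTER = FORMATS_BEFORE | {"JPEG2000"}


def check_format(image, supported_formats):
    """Raise TypeError unless image.format is one of supported_formats."""
    if image.format not in supported_formats:
        raise TypeError("Unsupported image format/type")
    return image.format


# Stand-in for a pillow image opened from a .jp2 file; only .format matters here.
jp2_image = SimpleNamespace(format="JPEG2000")

# Without the extra entry, the check rejects the image.
try:
    check_format(jp2_image, FORMATS_BEFORE)
    rejected_before = False
except TypeError:
    rejected_before = True

# With JPEG2000 added, the same image passes.
accepted_after = check_format(jp2_image, FORMATS_AFTER)
print(rejected_before, accepted_after)
```

A plain membership test like this is why a one-entry change is enough under these assumptions: nothing else about the image needs to change for it to be accepted.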
Problem
When a JPEG2000 image is loaded with pillow and passed to pytesseract, an exception is raised because JPEG2000 is not listed in SUPPORTED_FORMATS in pytesseract:
TypeError: Unsupported image format/type
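A minimal reproduction sketch, assuming pillow, pytesseract, and the Tesseract binary are installed and that a JPEG2000 file exists at the given path. The `reproduce` helper and the `example.jp2` default are illustrative names, not part of either library.

```python
def reproduce(path="example.jp2"):
    """Open a JPEG2000 file with pillow and hand it to pytesseract."""
    # Imported lazily so the sketch can be loaded without either package installed.
    from PIL import Image
    import pytesseract

    image = Image.open(path)
    # According to the report, pillow sets this to "JPEG2000" for such files.
    print(image.format)

    try:
        return pytesseract.image_to_string(image)
    except TypeError as exc:
        # On an affected version this is where the error is expected to surface.
        return f"TypeError: {exc}"
```

Calling `reproduce()` on an affected installation should return the error text rather than OCR output. With JPEG2000 accepted, it should return the recognized text instead.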