sirfz / tesserocr

A Python wrapper for the tesseract-ocr API
MIT License

tesserocr.tesseract_version() Missing Libraries #336

Closed: bananarama456 closed this issue 7 months ago

bananarama456 commented 1 year ago

When I run tesserocr.tesseract_version() in my container, the output is only "tesseract 5.3.3\n leptonica-1.83.1\n libpng 1.6.34 : zlib 1.2.11", which means I cannot run tesserocr against JPEG, GIF, TIFF files, etc.

My Dockerfile looks like this:

# Use an official Python runtime as a parent image
FROM python:3.10
# Set environment variables for Python
ENV PYTHONUNBUFFERED=1
# Create and set the working directory
WORKDIR /tess_service
# Install Tesseract, the Tesseract and Leptonica development packages, and pkg-config
RUN apt-get update \
    && apt-get install tesseract-ocr libtesseract-dev libleptonica-dev pkg-config -y

ENV TESSDATA_PREFIX=/usr/share/tesseract-ocr/5/tessdata
# Copy the requirements file into the container
COPY requirements.txt /tess_service/requirements.txt
# Install any needed packages specified in requirements.txt
RUN pip install -r requirements.txt
# Copy the Django project files into the container
COPY . /tess_service/
# Expose the port your Django app will run on
EXPOSE 1999
CMD ["bash", "-c", "python3 manage.py runserver 0.0.0.0:1999"]

I am just installing tesserocr with pip through my requirements.txt.

When I inspect my container image, I can see that libgif, libjpeg, libtiff, libopenjp2, etc. have been installed, but tesserocr cannot find them when initializing the model.

Any help would be greatly appreciated. Thanks :)

sirfz commented 8 months ago

You're installing a tesserocr binary wheel, which only supports PNG. I just updated the GitHub Actions workflow to build binaries with JPEG/TIFF/WebP support. You can try out the wheels from https://github.com/sirfz/tesserocr/actions/runs/8189850537

If you wish to build tesserocr yourself, install as follows:

pip install --no-binary tesserocr tesserocr

Of course, this requires all build requirements (the Tesseract and Leptonica development libraries) to be installed already.
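To confirm whether a given install can read more than PNG, the string returned by tesserocr.tesseract_version() can be checked for the image libraries Leptonica was linked against. The following is a minimal sketch, not part of tesserocr: the helper names missing_image_libs and check_installed_tesserocr are hypothetical, and the library names searched for (libjpeg, libtiff, libgif) are an assumption about how Leptonica labels them in its version line.

```python
def missing_image_libs(version_text, required=("libjpeg", "libtiff", "libgif")):
    """Return the required library names that do not appear in version_text.

    version_text is expected to be the string returned by
    tesserocr.tesseract_version(). The names in `required` are assumed
    labels, adjust them if your Leptonica build reports them differently.
    """
    lowered = version_text.lower()
    return [name for name in required if name not in lowered]


def check_installed_tesserocr():
    """Run the check against the tesserocr build installed in this environment."""
    import tesserocr  # imported lazily so the helper above works without it

    return missing_image_libs(tesserocr.tesseract_version())


# The output reported at the top of this issue lists only libpng and zlib.
reported = "tesseract 5.3.3\n leptonica-1.83.1\n libpng 1.6.34 : zlib 1.2.11"
print(missing_image_libs(reported))
```

Calling check_installed_tesserocr() inside the container after the rebuild should return an empty list if the libraries are linked in.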

sirfz commented 7 months ago

tesserocr v2.6.3 binaries are now built with JPEG (as well as TIFF and WebP) support.
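If the binary wheel is kept, it may help to check at startup that the installed tesserocr is at least 2.6.3. This is only a sketch under assumptions: version_tuple and meets_minimum are hypothetical helpers, and the comparison looks at the first three numeric components of the version string only.

```python
import re


def version_tuple(version):
    """Turn a string such as '2.6.3' into (2, 6, 3).

    Only the leading digits of the first three dot-separated parts are used;
    missing parts are padded with zeros.
    """
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(version, minimum="2.6.3"):
    """Return True if version is at least minimum."""
    return version_tuple(version) >= version_tuple(minimum)
```

In a real environment the installed version string can be obtained with importlib.metadata.version("tesserocr") from the standard library and passed to meets_minimum.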