madmaze / pytesseract

A Python wrapper for Google Tesseract
Apache License 2.0

Unsupported image object when using numpy.ndarray image #521

Closed: baudneo closed this issue 1 year ago

baudneo commented 1 year ago

My only guess is that pytesseract is somehow trying to import numpy from a different environment, so the pkgutil.find_loader check sets numpy_installed to False? I have pytesseract installed in a venv.

Logs

  '10/14/23 11:34:46.0690' ZoMi:API[183741] DEBUG ocr:243 -> image for tesseract: type(letter) =<class 'numpy.ndarray'> -- isinstance(letter, np.ndarray) =True
  '10/14/23 11:34:46.0691' ZoMi:API[183741] ERROR ocr:251 -> PYTESSERACT OCR ERROR: Unsupported image object

Code

if pytesseract is not None:
    ts = time.perf_counter()
    # first image should be the whole plate, the rest should be extracted letters
    # fixme: tesseract cli gets proper results on index 1+ but here it does not.
    psm = 7
    try:
        for idx, letter in enumerate(letters):
            if idx == 1:
                psm = 10
            logger.debug(f"image for tesseract: {type(letter) =} -- {isinstance(letter, np.ndarray) =}")
            tess = pytesseract.image_to_string(
                letter, lang="eng", config=f"--psm {psm}"
            )
            ret_[
                f"Tesseract_{idx if idx > 0 else 'FULL IMAGE'}"
            ] = f"PSM={psm} ---> {tess}"
    except Exception as exc:
        logger.error(f"PYTESSERACT OCR ERROR: {exc}")
    else:
        logger.debug(
            f"perf:{self.LP} PyTesseract took {time.perf_counter() - ts:.5f} seconds"
        )
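
One way to take pytesseract's own numpy detection out of the picture in the loop above is to convert each array to a PIL image before the call. The snippet below is only a sketch: to_pil is a hypothetical helper (not part of pytesseract), and it assumes numpy and Pillow are importable in the interpreter that runs the code.

```python
import numpy as np
from PIL import Image


def to_pil(image):
    """Return a PIL image, converting numpy arrays explicitly (hypothetical helper)."""
    if isinstance(image, Image.Image):
        return image
    if isinstance(image, np.ndarray):
        # Explicit conversion, so the result does not depend on pytesseract's numpy flag.
        return Image.fromarray(image)
    raise TypeError(f"Unsupported image object: {type(image)!r}")


# Inside the loop above, the call would then become:
#     tess = pytesseract.image_to_string(
#         to_pil(letter), lang="eng", config=f"--psm {psm}"
#     )
```

If this version works while the original call fails, that points at the numpy check inside pytesseract rather than at the image data.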

Pytesseract Code

from pkgutil import find_loader

numpy_installed = find_loader('numpy') is not None
if numpy_installed:
    from numpy import ndarray

def prepare(image):
    if numpy_installed and isinstance(image, ndarray):
        image = Image.fromarray(image)

    if not isinstance(image, Image.Image):
        raise TypeError('Unsupported image object')

stefan6419846 commented 1 year ago

Did you check that numpy is available in the same environment as pytesseract? What does pip show numpy pytesseract yield in the current environment? Which Python version are you using?
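
The same questions can be answered from inside the running process, which helps when the application is started by a service or wrapper that may not use the venv's interpreter. The following is a rough sketch using only the standard library; describe_environment is a made-up name, and importlib.util.find_spec stands in for the pkgutil.find_loader check quoted above (it is a close equivalent, not the exact call pytesseract makes).

```python
import sys
from importlib import metadata
from importlib.util import find_spec


def describe_environment(packages=("numpy", "pytesseract")):
    """Report the running interpreter and where each package would be imported from."""
    report = {
        "executable": sys.executable,
        "python_version": sys.version.split()[0],
        "in_venv": sys.prefix != sys.base_prefix,
        "packages": {},
    }
    for name in packages:
        spec = find_spec(name)  # None when the package is not importable here
        try:
            version = metadata.version(name)
        except metadata.PackageNotFoundError:
            version = None
        report["packages"][name] = {
            "importable": spec is not None,
            "origin": getattr(spec, "origin", None),
            "version": version,
        }
    return report


if __name__ == "__main__":
    info = describe_environment()
    print(f"interpreter: {info['executable']} (Python {info['python_version']})")
    print(f"running inside a venv: {info['in_venv']}")
    for name, details in info["packages"].items():
        print(f"{name}: {details}")
```

Calling describe_environment() from the same code path that logs the OCR error shows whether numpy is visible to that interpreter and whether both packages resolve to paths inside the same venv.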

baudneo commented 1 year ago

Hi, I nuked the venv, set up a fresh one, and everything is working as expected. Sorry about that, not sure what the issue was, but all is well now.

Thanks!