madmaze / pytesseract

A Python wrapper for Google Tesseract
Apache License 2.0

raise TesseractError(proc.returncode, get_errors(error_string)) pytesseract.pytesseract.TesseractError: (3221225477, '') #455

Closed me-suzy closed 1 year ago

me-suzy commented 2 years ago
File "C:\Users\Castel\AppData\Roaming\Python\Python310\site-packages\pytesseract\pytesseract.py", line 264, in run_tesseract
    raise TesseractError(proc.returncode, get_errors(error_string))
pytesseract.pytesseract.TesseractError: (3221225477, '')

Screenshot: [image]

stefan6419846 commented 2 years ago

Which version of Tesseract are you using? Does it work if you call Tesseract directly with the same arguments (see my comment here: https://github.com/madmaze/pytesseract/issues/105#issuecomment-1272273206)? Judging from the recent comment here, it seems like you have already found https://stackoverflow.com/questions/51439138/ and tried the other workarounds there to no avail?
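
For context, the return code 3221225477 is 0xC0000005 in hexadecimal, which on Windows is the NTSTATUS value for an access violation. That suggests the Tesseract process itself crashed rather than reporting an OCR error. Below is a minimal sketch for calling Tesseract directly and decoding its exit code. It assumes `tesseract` is on PATH; the helper names and the input file `page.png` are hypothetical, and the status table is deliberately incomplete.

```python
import subprocess

# Small, non-exhaustive table of Windows NTSTATUS values that can appear as exit codes.
KNOWN_NTSTATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC0000135: "STATUS_DLL_NOT_FOUND",
}


def describe_returncode(code: int) -> str:
    """Format an exit code as hex and name it if it is a known NTSTATUS value."""
    unsigned = code & 0xFFFFFFFF  # some tools report the same code as a signed integer
    name = KNOWN_NTSTATUS.get(unsigned, "unknown")
    return f"0x{unsigned:08X} ({name})"


def run_tesseract_directly(image_path: str, tesseract_cmd: str = "tesseract") -> int:
    """Call the Tesseract CLI without pytesseract and report how it exited."""
    # Using "stdout" as the output base makes Tesseract print the recognized text.
    proc = subprocess.run(
        [tesseract_cmd, image_path, "stdout"],
        capture_output=True,
        text=True,
        errors="replace",
    )
    print("exit code:", describe_returncode(proc.returncode))
    print("stderr:", proc.stderr.strip())
    return proc.returncode


# Example (hypothetical input file):
# run_tesseract_directly("page.png")
```

If the direct call exits with the same code, pytesseract is only relaying the crash.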

me-suzy commented 2 years ago

I have tried the following; I get the same error with all of them:

tesseract-ocr-w64-setup-v5.0.0-alpha.20210506
tesseract-ocr-w64-setup-v5.0.0-alpha.20200223
tesseract-ocr-w64-setup-v5.2.0.20220712
pytesseract-master.zip

stefan6419846 commented 2 years ago

Does this happen for all your files, or only for some of them? Do you have a public reproducer that you can share? What preprocessing do you apply? Tesseract usually does not handle PDF files as input.

Nevertheless, I suspect that this is not really related to pytesseract itself.

me-suzy commented 2 years ago

The problem is that on another laptop the same code works fine with the same files...

I don't know whether the problem is with my laptop, my code, or my libraries...

me-suzy commented 2 years ago

OK, I know what the problem is.

The problem is the newest version of the Poppler library:

path_to_poppler_exe = Path(r"c:\Program Files\poppler-22.04.0\Library\bin")

If I switch to another version, such as poppler-0.68.0, it works.
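
A minimal sketch of how a Poppler directory typically enters such a pipeline. It assumes the PDF pages are converted to images with pdf2image (the thread does not confirm this) and then passed to pytesseract. The helper names and `scan.pdf` are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List, Optional


def find_poppler_bin(candidates: Iterable[Path]) -> Optional[Path]:
    """Return the first directory that contains the pdftoppm executable."""
    for directory in candidates:
        if (directory / "pdftoppm.exe").is_file() or (directory / "pdftoppm").is_file():
            return directory
    return None


def ocr_pdf(pdf_path: str, poppler_dir: Path) -> List[str]:
    """Convert each PDF page to an image via Poppler, then OCR it with Tesseract."""
    # Imported here so the helper above works without these packages installed.
    from pdf2image import convert_from_path
    import pytesseract

    pages = convert_from_path(pdf_path, poppler_path=str(poppler_dir))
    return [pytesseract.image_to_string(page) for page in pages]


# Example (path taken from the comment above; adjust to your installation):
# poppler_dir = find_poppler_bin([Path(r"c:\Program Files\poppler-22.04.0\Library\bin")])
# texts = ocr_pdf("scan.pdf", poppler_dir)
```

Passing the directory explicitly makes it easy to swap Poppler versions without touching PATH.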

stefan6419846 commented 2 years ago

So this is in no way related to pytesseract, and most likely not even to Tesseract itself, as Poppler is probably part of your undisclosed preprocessing.

With the current findings, you should be able to close this issue.

thisismygitrepo commented 2 years ago

I doubt that Poppler is the culprit. I'm running into the same issue. It works like a dream when I run it from the command line (.exe), but when I write Python code (straight from the README examples), it throws the error above. No change in my Poppler installation.

stefan6419846 commented 2 years ago

Poppler can be the culprit, but does not have to be. As far as I am aware, Tesseract and pytesseract do not use Poppler by default, so just changing the Poppler version will not help much. In the aforementioned case, it seems like the preprocessing involved converting a PDF file to images for Tesseract processing (see the log), which appears to involve Poppler, maybe through pdf2image.

Could you please elaborate a bit more on the versions you are using, especially of Python, Tesseract, pytesseract and Pillow? Do you have a public reproducer to test this with?
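
One way to gather that information in a single run is sketched below. It assumes the `tesseract` executable is on PATH and uses only the standard library; the function names are hypothetical, and pdf2image is included only because it came up earlier in the thread.

```python
import platform
import shutil
import subprocess
from importlib import metadata


def package_version(name: str) -> str:
    """Return the installed version of a Python package, if any."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def tesseract_version(tesseract_cmd: str = "tesseract") -> str:
    """Return the first line printed by `tesseract --version`."""
    if shutil.which(tesseract_cmd) is None:
        return "not found on PATH"
    proc = subprocess.run(
        [tesseract_cmd, "--version"],
        capture_output=True,
        text=True,
        errors="replace",
    )
    # Some Tesseract builds print the version to stderr instead of stdout.
    output = (proc.stdout or proc.stderr).strip()
    return output.splitlines()[0] if output else f"no output (exit code {proc.returncode})"


def collect_versions() -> dict:
    """Collect the versions that are usually requested in a bug report."""
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "tesseract": tesseract_version(),
        "pytesseract": package_version("pytesseract"),
        "Pillow": package_version("Pillow"),
        "pdf2image": package_version("pdf2image"),
    }


# Example:
# for name, version in collect_versions().items():
#     print(f"{name}: {version}")
```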