Closed me-suzy closed 1 year ago
Which version of Tesseract are you using? Does it work if you call Tesseract directly with the same arguments (see my comment here: https://github.com/madmaze/pytesseract/issues/105#issuecomment-1272273206)? Judging from the recent comment here, it seems like you have already found https://stackoverflow.com/questions/51439138/ and tried the other workarounds there to no avail?
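For reference, a direct call that bypasses pytesseract could look roughly like the sketch below. The helper names `build_tesseract_command` and `run_tesseract_directly` are made up for illustration, and it assumes the `tesseract` binary is on PATH and that the input is already an image (not a PDF).

```python
import shutil
import subprocess
from typing import List, Optional


def build_tesseract_command(image_path: str, lang: str = "eng",
                            executable: str = "tesseract") -> List[str]:
    """Build the argument list for a direct Tesseract call that prints text to stdout."""
    return [executable, image_path, "stdout", "-l", lang]


def run_tesseract_directly(image_path: str, lang: str = "eng") -> Optional[str]:
    """Run Tesseract without pytesseract; return None if the binary is not on PATH."""
    executable = shutil.which("tesseract")
    if executable is None:
        return None
    result = subprocess.run(
        build_tesseract_command(image_path, lang, executable),
        capture_output=True, text=True, check=False,
    )
    # A non-zero exit code means the failure reproduces without pytesseract.
    if result.returncode != 0:
        raise RuntimeError(result.stderr.strip())
    return result.stdout
```

If this fails in the same way, the problem most likely sits in Tesseract or in the input file rather than in the Python wrapper.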
I use these, the same error on all of them:
tesseract-ocr-w64-setup-v5.0.0-alpha.20210506
tesseract-ocr-w64-setup-v5.0.0-alpha.20200223
tesseract-ocr-w64-setup-v5.2.0.20220712
pytesseract-master.zip
Does this happen for all your files? Or is this limited to some of them? Do you have a public reproducer which you can share? Which pre-processing do you apply, as Tesseract usually will not handle PDF files?
Nevertheless, I suspect that this is not really related to pytesseract itself.
The problem is that on another laptop the code works fine for the same files...
I don't know whether the problem is only with my laptop, my code, or my libraries...
OK, I know what the problem is.
The problem is the newest version of the Poppler library:
path_to_poppler_exe = Path(r"c:\Program Files\poppler-22.04.0\Library\bin")
If I change to another version, such as poppler-0.68.0, it works.
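A minimal sketch of how such a Poppler path is typically wired into a PDF-to-image step before OCR. This assumes the preprocessing uses pdf2image (not confirmed by the code shown in this thread); `find_pdftoppm` and `ocr_pdf` are hypothetical helper names.

```python
from pathlib import Path
from typing import List, Optional


def find_pdftoppm(poppler_bin: Path) -> Optional[Path]:
    """Return the pdftoppm binary inside the given Poppler bin directory, if present."""
    for name in ("pdftoppm.exe", "pdftoppm"):
        candidate = poppler_bin / name
        if candidate.is_file():
            return candidate
    return None


def ocr_pdf(pdf_path: str, poppler_bin: Path) -> List[str]:
    """Convert each PDF page to an image with Poppler, then OCR it with Tesseract."""
    # Imported lazily so the path check above works without these packages.
    from pdf2image import convert_from_path
    import pytesseract

    if find_pdftoppm(poppler_bin) is None:
        raise FileNotFoundError(f"pdftoppm not found in {poppler_bin}")
    # Assumption: pdf2image is the library that invokes Poppler here.
    pages = convert_from_path(pdf_path, poppler_path=str(poppler_bin))
    return [pytesseract.image_to_string(page) for page in pages]
```

Swapping the `poppler_bin` argument between the two Poppler installations is then a one-line change, which makes it easier to confirm that the Poppler version is what differs.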
So this is in no way related to pytesseract, and most likely not even to Tesseract itself, as Poppler is probably part of your undisclosed preprocessing.
With the current findings, you should be able to close this issue.
I doubt that poppler is the culprit. I'm running into the same issue. It works like a dream when I run it from the command line (.exe), but when I write Python code (straight from the examples in the README), it throws the error above.
No change in the poppler installation.
Poppler can be the culprit, but does not have to be. By default, Tesseract and pytesseract do not use Poppler as far as I am aware, so just changing the Poppler version will not help much on its own. In the aforementioned case, it seems like the preprocessing involved converting a PDF file to images for Tesseract processing (see the log), which appears to involve Poppler, maybe through pdf2image.
Could you please elaborate a bit more on the versions you are using, especially of Python, Tesseract, pytesseract and Pillow? Do you have a public reproducer to test this with?
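Something along these lines would collect the requested versions in one go. It is only a sketch: `collect_versions` is a hypothetical helper, it needs Python 3.8+ for `importlib.metadata`, and it assumes `tesseract` is reachable via PATH.

```python
import platform
import shutil
import subprocess
from importlib import metadata
from typing import Dict


def collect_versions() -> Dict[str, str]:
    """Gather Python, package, and Tesseract versions for a bug report."""
    versions = {"python": platform.python_version()}
    for package in ("pytesseract", "Pillow", "pdf2image"):
        try:
            versions[package] = metadata.version(package)
        except metadata.PackageNotFoundError:
            versions[package] = "not installed"
    exe = shutil.which("tesseract")
    if exe is None:
        versions["tesseract"] = "not found on PATH"
    else:
        out = subprocess.run([exe, "--version"], capture_output=True,
                             text=True, check=False)
        # Some builds print the version banner to stderr instead of stdout.
        lines = (out.stdout or out.stderr).splitlines()
        versions["tesseract"] = lines[0] if lines else "unknown"
    return versions


if __name__ == "__main__":
    for name, version in collect_versions().items():
        print(f"{name}: {version}")
```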
Print screen: