Closed moloyc closed 1 year ago
Any chance you can get the Windows builds up again? Sirfz recently published detailed build instructions, which might help.
@moloyc @dynobo should be done now; you can close the issue.
GetUTF8Text() crashes for me (tesserocr v2.5.2 - Python 3.10 - 64bit)
@iluz0r thanks for testing the new binaries! Would you be able to share some more details regarding the kind of exception you're getting? Also, did you set the TESSDATA_PREFIX path correctly? And what do you get for
import tesserocr
print(tesserocr.tesseract_version())
For the very basic example it seems to work for me (Python 3.10 - 64bit)
from tesserocr import PyTessBaseAPI
with PyTessBaseAPI() as api:
    # Raw string so the backslashes in the Windows path are not read as escapes
    api.SetImageFile(r"path\to\image.png")
    print(api.GetUTF8Text())
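To sanity-check the TESSDATA_PREFIX question above, something like the following could help. It is only a sketch: the helper name `tessdata_has_language` is made up for illustration, and it assumes that the directory in question should directly contain `eng.traineddata`, which is the usual layout for Tesseract 4.x but may differ in other setups.

```python
import os
from pathlib import Path


def tessdata_has_language(tessdata_dir, lang="eng"):
    """Return True if <tessdata_dir>/<lang>.traineddata exists as a file.

    Hypothetical helper; assumes language files sit directly in the directory.
    """
    return (Path(tessdata_dir) / f"{lang}.traineddata").is_file()


if __name__ == "__main__":
    prefix = os.environ.get("TESSDATA_PREFIX")
    if prefix is None:
        print("TESSDATA_PREFIX is not set")
    else:
        print(prefix, "contains eng.traineddata:", tessdata_has_language(prefix))
```

If this reports that the file is missing, the path is worth fixing before looking any further into the crash.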
Concerning the tesseract version, I get:
tesseract 4.1.1 leptonica-1.81.1 (Feb 10 2022, 20:22:49) [MSC v.1930 LIB Release x64] libgif 5.2.1 : libjpeg 6b (libjpeg-turbo 2.0.6) : libpng 1.6.37 : libtiff 4.3.0 : zlib 1.2.11 : libwebp 1.2.1
I tried both the path="path\to\tessdata" parameter of the PyTessBaseAPI() constructor and the TESSDATA_PREFIX env var.
I can't give you more details about the exception since it doesn't print anything to the console, just:
Process finished with exit code -1073741795 (0xC000001D)
Thank you
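For reference, the negative exit code above is a signed view of a Windows NTSTATUS value, and it can be decoded as in the sketch below. The function names and the tiny lookup table are illustrative assumptions, not part of tesserocr; the table lists only a few well-known codes and is deliberately incomplete.

```python
def to_ntstatus(exit_code):
    """Reinterpret a signed 32-bit exit code as an unsigned 32-bit value."""
    return exit_code & 0xFFFFFFFF


# A few well-known NTSTATUS codes (illustrative, not exhaustive)
KNOWN_STATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC000001D: "STATUS_ILLEGAL_INSTRUCTION",
    0xC0000135: "STATUS_DLL_NOT_FOUND",
}


def describe_exit_code(exit_code):
    """Format an exit code as hex plus its symbolic name, if known."""
    status = to_ntstatus(exit_code)
    return f"0x{status:08X} ({KNOWN_STATUS.get(status, 'unknown')})"


if __name__ == "__main__":
    print(describe_exit_code(-1073741795))
```

For -1073741795 this gives 0xC000001D, i.e. STATUS_ILLEGAL_INSTRUCTION. That says the process hit an instruction the CPU refused to execute, but it does not by itself establish why; nothing in this thread confirms the cause.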
@iluz0r I see. Unfortunately, that does not give me much to go on. What's the extension of the image you're using? Have you tried with other images / image formats?
I got a working version of Tesseract (v. 5.0.1) following this guide: https://github.com/sirfz/tesserocr/issues/291#issuecomment-1019585097
Anyway, to make it work, I had to copy a set of DLLs (tesseract50.dll, libpng16.dll, leptonica, etc.) into the "venv\Lib\site-packages" dir of my project.
If you remove the Tesseract 5.x DLLs from venv\Lib\site-packages and try GetUTF8Text(), do you still get the exception (maybe you have conflicting DLLs)? Also, could you verify that the Tesseract 4.x DLLs are present in venv\Lib\site-packages\tesserocr? What is the image extension (.png, .tiff, etc.) you are using?
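A quick way to do the DLL check described above is to list the DLLs in both directories and see which names appear in both. This is a minimal sketch using only the standard library; `list_dlls` and `shared_dlls` are hypothetical helper names, and the two paths in the usage part are assumptions based on the directories mentioned in this thread.

```python
from pathlib import Path


def list_dlls(directory):
    """Return the sorted names of .dll files directly inside `directory`."""
    return sorted(
        p.name
        for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() == ".dll"
    )


def shared_dlls(dir_a, dir_b):
    """Return lower-cased DLL names that appear in both directories."""
    names_a = {name.lower() for name in list_dlls(dir_a)}
    names_b = {name.lower() for name in list_dlls(dir_b)}
    return sorted(names_a & names_b)


if __name__ == "__main__":
    # Assumed locations, taken from the paths discussed in the thread
    site_packages = Path("venv") / "Lib" / "site-packages"
    package_dir = site_packages / "tesserocr"
    if site_packages.is_dir() and package_dir.is_dir():
        print("site-packages:", list_dlls(site_packages))
        print("tesserocr:", list_dlls(package_dir))
        print("in both:", shared_dlls(site_packages, package_dir))
```

Names that show up in both places would be the first candidates to remove from venv\Lib\site-packages when testing for a conflict.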
Yes, I'm trying in a new project with a new environment, and the 4.1.1 wheel still raises the exception (the DLL files are in venv\Lib\site-packages\tesserocr as expected).
I had no problem with the previous version for python 3.7 (tesserocr v2.4.0 - Python 3.7 - 64bit)
I tried with tiff and jpg files.
Would you have time for a quick chat? https://stin.to/xn0gh
Would it be possible to get a build for 4.1.1?
The fix for the following PNG memory leak is available in 4.1.0/4.1.1. PNG handling is core functionality, and this memory leak makes 4.0.0 almost unusable in any production-grade application. https://github.com/tesseract-ocr/tesseract/pull/2189/commits/9e6e3a0232dfa319c5d334d3b8a773e67bf87a18
Would appreciate your quick help.