simonflueckiger / tesserocr-windows_build

MIT License

Build for tesseract 4.1.1 #12

Closed moloyc closed 1 year ago

moloyc commented 4 years ago

Would it be possible to get a build for 4.1.1?

The fix for the following PNG memory leak is available in 4.1.0/4.1.1. PNG handling is core functionality, and this memory leak makes 4.0.0 almost unusable in any production-grade application. https://github.com/tesseract-ocr/tesseract/pull/2189/commits/9e6e3a0232dfa319c5d334d3b8a773e67bf87a18

Would appreciate your quick help.

dynobo commented 3 years ago

Any chance you can get the Windows builds up again? Sirfz recently published detailed build instructions, which might help.

tb102122 commented 2 years ago

https://github.com/simonflueckiger/tesserocr-windows_build/releases/tag/tesserocr-v2.5.2-tesseract-4.1.1

@moloyc @dynobo This should be done now; you can close the issue.

iluz0r commented 2 years ago

GetUTF8Text() crashes for me (tesserocr v2.5.2 - Python 3.10 - 64bit)

simonflueckiger commented 2 years ago

@iluz0r thanks for testing the new binaries! Would you be able to share some more details regarding the kind of exception you're getting? Also, did you set the TESSDATA_PREFIX path correctly? And what do you get for the following?

import tesserocr
print(tesserocr.tesseract_version())

The very basic example below works for me (Python 3.10 - 64bit):

from tesserocr import PyTessBaseAPI

with PyTessBaseAPI() as api:
    # Raw string, so the "\t" in the Windows path is not read as a tab character.
    api.SetImageFile(r"path\to\image.png")
    print(api.GetUTF8Text())
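
A quick way to sanity-check the TESSDATA_PREFIX question above is sketched below. It uses only the standard library and does not touch tesserocr itself. The helper name `check_tessdata` is hypothetical, and accepting the model either directly in the directory or in a `tessdata` subdirectory is an assumption, since the expected layout differs between setups.

```python
import os
from pathlib import Path
from typing import List, Optional


def check_tessdata(prefix: Optional[str], lang: str = "eng") -> List[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    if not prefix:
        return ["TESSDATA_PREFIX is not set"]
    root = Path(prefix)
    if not root.is_dir():
        return [f"{root} is not an existing directory"]
    # Assumption: look for the language model directly in the directory
    # or in a "tessdata" subdirectory of it.
    candidates = [
        root / f"{lang}.traineddata",
        root / "tessdata" / f"{lang}.traineddata",
    ]
    if not any(candidate.is_file() for candidate in candidates):
        return [f"{lang}.traineddata not found under {root}"]
    return []


if __name__ == "__main__":
    problems = check_tessdata(os.environ.get("TESSDATA_PREFIX"))
    print("\n".join(problems) if problems else "tessdata looks plausible")
```

An empty result only means the files are where this sketch looks for them; it does not prove that the installed Tesseract build can load them.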

iluz0r commented 2 years ago

Concerning the tesseract version, I get:

tesseract 4.1.1 leptonica-1.81.1 (Feb 10 2022, 20:22:49) [MSC v.1930 LIB Release x64] libgif 5.2.1 : libjpeg 6b (libjpeg-turbo 2.0.6) : libpng 1.6.37 : libtiff 4.3.0 : zlib 1.2.11 : libwebp 1.2.1

I tried both the path="path\to\tessdata" parameter of the PyTessBaseAPI() constructor and the TESSDATA_PREFIX environment variable.

I can't give you more details about the exception since it prints nothing to the console, just:

Process finished with exit code -1073741795 (0xC000001D)

Thank you
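
For readers decoding the exit code above: Python reports it as a signed 32-bit number, while Windows documents these values as unsigned NTSTATUS codes, and 0xC000001D is STATUS_ILLEGAL_INSTRUCTION. The sketch below does the conversion. The helper names are hypothetical, and the lookup table is a small hand-picked subset for illustration, not a complete list.

```python
# A few well-known NTSTATUS values (illustrative subset only).
KNOWN_STATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC000001D: "STATUS_ILLEGAL_INSTRUCTION",
    0xC0000135: "STATUS_DLL_NOT_FOUND",
}


def to_ntstatus(exit_code: int) -> int:
    """Reinterpret a signed 32-bit exit code as an unsigned NTSTATUS value."""
    return exit_code & 0xFFFFFFFF


def describe(exit_code: int) -> str:
    """Format the exit code as hex plus a symbolic name when it is in the table."""
    status = to_ntstatus(exit_code)
    name = KNOWN_STATUS.get(status, "unknown")
    return f"0x{status:08X} ({name})"


if __name__ == "__main__":
    print(describe(-1073741795))  # prints: 0xC000001D (STATUS_ILLEGAL_INSTRUCTION)
```

An illegal-instruction status is often seen when a binary uses CPU instructions the machine does not support, but nothing in this thread confirms that as the cause here.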

simonflueckiger commented 2 years ago

@iluz0r I see. Unfortunately, that does not give me much to go on. What's the extension of the image you're using? Have you tried with other images / image formats?

iluz0r commented 2 years ago

I got a working version of Tesseract (v. 5.0.1) following this guide: https://github.com/sirfz/tesserocr/issues/291#issuecomment-1019585097

Anyway, to make it work, I had to copy a set of DLLs (tesseract50.dll, libpng16.dll, leptonica, etc.) into the "venv\Lib\site-packages" dir of my project.

simonflueckiger commented 2 years ago

If you remove the Tesseract 5.x DLLs from venv\Lib\site-packages and try GetUTF8Text(), do you still get the exception (maybe you have conflicting DLLs)? Also, could you verify that the Tesseract 4.x DLLs are present in venv\Lib\site-packages\tesserocr? What is the image extension (.png, .tiff, etc.) you are using?
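
One way to look for conflicting DLLs of the kind suspected above is to scan site-packages and list which Tesseract and Leptonica DLLs are present and which file names occur more than once. This is a minimal sketch using only the standard library. The function names are hypothetical, and the "tesseract"/"leptonica" name prefixes are an assumption based on the DLL names mentioned in this thread.

```python
import sysconfig
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def collect_dlls(site_packages: str) -> Dict[str, List[Path]]:
    """Map each lowercase DLL file name to every path where it was found."""
    by_name: Dict[str, List[Path]] = defaultdict(list)
    for path in Path(site_packages).rglob("*"):
        if path.is_file() and path.suffix.lower() == ".dll":
            by_name[path.name.lower()].append(path)
    return {name: sorted(paths) for name, paths in by_name.items()}


def duplicated(dlls: Dict[str, List[Path]]) -> Dict[str, List[Path]]:
    """Keep only DLL names that exist in more than one location."""
    return {name: paths for name, paths in dlls.items() if len(paths) > 1}


def ocr_related(dlls: Dict[str, List[Path]]) -> List[str]:
    """Names that look like Tesseract or Leptonica builds (assumed prefixes)."""
    return sorted(n for n in dlls if n.startswith(("tesseract", "leptonica")))


if __name__ == "__main__":
    found = collect_dlls(sysconfig.get_paths()["purelib"])
    print("OCR-related DLLs:", ocr_related(found))
    for name, paths in duplicated(found).items():
        print(name, "found in:", [str(p.parent) for p in paths])
```

Seeing more than one Tesseract major version in the first list, or the same DLL name in several directories, would be consistent with the conflict suspected above, though it does not prove which copy Windows actually loads.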

iluz0r commented 2 years ago

Yes, I'm trying in a new project with a new environment, and the 4.1.1 wheel still gives the exception (the DLL files are in venv\Lib\site-packages\tesserocr as expected). I had no problem with the previous version for Python 3.7 (tesserocr v2.4.0 - Python 3.7 - 64bit).

I tried with TIFF and JPG files.

simonflueckiger commented 2 years ago

Would you have time for a quick chat? https://stin.to/xn0gh