Open polyzen opened 2 years ago
This error is related to tesseract itself. Which version is that? Also, is there a sample image that causes the error?
Oh right: tesseract 5.1.0
The image used by the test: https://github.com/madmaze/pytesseract/blob/v0.3.10/tests/data/test.jpeg2000
Well, hmm. CI on master passes, so I'm not sure what is going on there. PS: Yep, your tesseract version is new enough, and CI still uses 4.1.x.
At this point, I would check what changed in 5.1.0 that broke jpeg2000 support, because 4.x clearly works with jpeg2000. It might be the imaging library support in Tesseract or something like that.
Have you tried using tesseract directly with the jpeg2000 image?
I haven't yet used tesseract, I only build pytesseract to provide as an optional dependency for urlwatch in the Arch repos.
At the moment, I don't have an Arch instance with tesseract 5.1.0 around to test whether this is a pytesseract-related or a tesseract-specific issue. When I have time, I will try to boot up a container with that setup to check.
Same issue here. I debugged it, and in my case the root cause was determined as follows:
The remedy for me was to recompile leptonica with OpenJPEG 2.4.0 support.
However for py-pytesseract, it should skip the test if there are indications that tesseract does not support JPEG2000.
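One possible "indication" is the error text itself. The following is only a sketch, assuming the message reported in this issue ("pixReadStreamJp2k: function not present") is stable across Leptonica builds; the helper name and the marker constant are hypothetical and not part of pytesseract.

```python
# Hypothetical helper, not part of pytesseract.
# Assumption: Leptonica builds without OpenJPEG report this exact phrase.
JP2K_MISSING_MARKER = "pixReadStreamJp2k: function not present"


def looks_like_missing_jp2k(error_text: str) -> bool:
    """Return True if an error message suggests JPEG2000 reading is unavailable."""
    return JP2K_MISSING_MARKER in error_text
```

In a test, one could catch pytesseract.TesseractError around the image_to_string call and, when this helper returns True for str(error), skip the jpeg2000 case instead of failing it.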
Thank you for investigating that @mandree - I am not sure if there is a nice way to ask tesseract if that is the case or not.
Sadly, pytesseract is designed as a thin wrapper around the tesseract executable and doesn't provide any real integration.
You can query tesseract with -v or --version, apparently.
See the line right below leptonica: it mentions libopenjp2 (or not).
The first two examples are from FreeBSD 13.0 amd64; the third and last example is from Fedora 35 x86_64.
With JPEG2000 support:
$ tesseract -v
tesseract 5.1.0
leptonica-1.82.0
libgif 5.2.1 : libjpeg 8d (libjpeg-turbo 2.1.3) : libpng 1.6.37+apng : libtiff 4.3.0 : zlib 1.2.11 : libwebp 1.2.2 : libopenjp2 2.4.0
Found OpenMP 201811
Found libarchive 3.6.0 zlib/1.2.11 liblzma/5.2.5 bz2lib/1.0.8 liblz4/1.9.3 libzstd/1.5.2
Found libcurl/7.82.0 OpenSSL/1.1.1k zlib/1.2.11 libssh2/1.10.0 nghttp2/1.46.0
And without:
$ tesseract -v
tesseract 5.1.0
leptonica-1.82.0
libgif 5.2.1 : libjpeg 8d (libjpeg-turbo 2.1.3) : libpng 1.6.37+apng : libtiff 4.3.0 : zlib 1.2.11 : libwebp 1.2.2
Found OpenMP 201811
Found libarchive 3.6.0 zlib/1.2.11 liblzma/5.2.5 bz2lib/1.0.8 liblz4/1.9.3 libzstd/1.5.2
Found libcurl/7.82.0 OpenSSL/1.1.1k zlib/1.2.11 libssh2/1.10.0 nghttp2/1.46.0
Fedora Linux:
$ tesseract -v
tesseract 4.1.3
leptonica-1.81.1
libgif 5.2.1 : libjpeg 6b (libjpeg-turbo 2.1.0) : libpng 1.6.37 : libtiff 4.3.0 : zlib 1.2.11 : libwebp 1.2.2
Found AVX2
Found AVX
Found FMA
Found SSE
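The check described above could be sketched roughly as follows. This is an illustration, not something pytesseract provides: the function names are hypothetical, and it assumes that the presence of "libopenjp2" in the version output is a reliable sign of JPEG2000 support, as the outputs above suggest.

```python
import subprocess


def version_mentions_openjpeg(version_output: str) -> bool:
    """Return True if the version text lists libopenjp2 (assumed JPEG2000 marker)."""
    return "libopenjp2" in version_output


def tesseract_supports_jpeg2000(tesseract_cmd: str = "tesseract") -> bool:
    """Run `tesseract --version` and look for libopenjp2 in its output."""
    try:
        result = subprocess.run(
            [tesseract_cmd, "--version"],
            capture_output=True,
            text=True,
            check=False,
        )
    except OSError:
        # The executable is missing or cannot be started.
        return False
    # Inspect both streams, since the version text may land on either one.
    return version_mentions_openjpeg(result.stdout + result.stderr)
```

A test suite could call tesseract_supports_jpeg2000() once and skip the jpeg2000 case when it returns False.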
Hi @bozhodimitrov you marked this as completed but I do not see relevant commits nor a comment. In what way was this fixed?
Hi, this is an old issue, and it is just closed, not completed. pytesseract does the right thing by notifying users of the underlying tesseract error. From there on, it is the responsibility of the user to update their stack with supported third-party components.
Unless you want to make a PR that parses all supported formats from the --version output, then does the actual checking and deactivating of functions, handles the response to the user, and adds tests for all of this, all I can do is convert it to a Feature Request.
The current error report is enough for all users who search for this specific error to find the workaround that you all shared. That means there is no point in this issue staying open anymore.
Let me know what you think.
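For anyone considering such a PR, the parsing step might start from something like the sketch below. It assumes the layout seen in the version outputs quoted in this thread (a "leptonica-..." line followed by one line of " : "-separated libraries); the function name is hypothetical, and mapping library names to image formats is left out.

```python
from typing import Dict


def parse_image_libraries(version_output: str) -> Dict[str, str]:
    """Map library names to version strings from `tesseract --version` text.

    Assumption: the libraries are listed on the line right after the
    "leptonica-..." line, separated by " : ".
    """
    lines = [line.strip() for line in version_output.splitlines()]
    for index, line in enumerate(lines):
        if line.startswith("leptonica-") and index + 1 < len(lines):
            libraries = {}
            for entry in lines[index + 1].split(" : "):
                # Split "libgif 5.2.1" into name and the rest of the entry.
                name, _, version = entry.strip().partition(" ")
                if name:
                    libraries[name] = version
            return libraries
    return {}
```

With the FreeBSD output quoted earlier in the thread, the result would contain a "libopenjp2" key only for the build with JPEG2000 support.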
pytesseract.pytesseract.TesseractError: (1, 'Error in pixReadStreamJp2k: function not present Error in pixReadStream: jp2: no pix returned Error in pixRead: pix not read Error during processing.')
pytesseract 0.3.10 tesseract 5.1.0 pillow 9.0.1 openjpeg2 2.4.0 pytest 7.1.0 python 3.10.2
Old title:
test_image_to_string_with_image_type[jpeg2000] failure with tesseract >4.1.x