Open PedroBarcha opened 8 years ago
Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box
Add a white/black frame to the image and no error messages will appear.
convert 427-1.jpg -bordercolor White -border 10x10 427-1b.jpg
Strange behaviour...
> The biggest problem for me, however, is that in OCRopus they don't even get OCRed.
This place is for bug reports about Tesseract, not OCRopus.
@amitdo I'm getting the same issue just with Tesseract. I'm guessing OCRopus is using Tesseract and that's why he made the issue here.
> I'm guessing OCRopus is using Tesseract
Ocropy (and clstm) does not use Tesseract. A VERY OLD version of Ocropus (0.4) did use Tesseract.
Similar issues #468 #1601
These error messages are produced by Leptonica.
They are triggered by a call to pixClipBoxToForeground()
https://github.com/tesseract-ocr/tesseract/search?q=pixClipBoxToForeground
@stweil, this seems like a bug in Tesseract, maybe you can explore it and find its cause.
https://github.com/tesseract-ocr/tesseract/search?q=pixClipBoxToForeground
I noticed that Tesseract does not check the return value from Leptonica's functions (l_ok).
> @stweil, this seems like a bug in Tesseract, maybe you can explore it and find its cause.
It's caused by a box with width / height 0, but as always in Tesseract it is difficult to find the right fix.
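For illustration, here is a simplified Python model of the kind of check that prints the first message. It is not Leptonica's or Tesseract's code: the names `clip_box_to_rectangle` and `safe_clip` and the `(x, y, w, h)` tuple layout are assumptions made for this sketch. It shows how a box with width or height 0 at the image origin can count as "outside" the image, and what a caller-side check of the result might look like.

```python
from typing import Optional, Tuple

Box = Tuple[int, int, int, int]  # (x, y, w, h); hypothetical layout for this sketch


def clip_box_to_rectangle(box: Box, img_w: int, img_h: int) -> Optional[Box]:
    """Return the part of box inside an img_w x img_h image, or None if nothing overlaps."""
    x, y, w, h = box
    # Box lies entirely outside the image; a zero-size box at the origin also lands here.
    if x >= img_w or y >= img_h or x + w <= 0 or y + h <= 0:
        return None
    x0, y0 = max(x, 0), max(y, 0)
    x1, y1 = min(x + w, img_w), min(y + h, img_h)
    return (x0, y0, x1 - x0, y1 - y0)


def safe_clip(box: Box, img_w: int, img_h: int) -> Optional[Box]:
    """Caller-side guard: reject missing or empty results instead of passing them on."""
    clipped = clip_box_to_rectangle(box, img_w, img_h)
    if clipped is None or clipped[2] <= 0 or clipped[3] <= 0:
        return None
    return clipped
```

With a guard like `safe_clip`, a degenerate box is dropped before any later step tries to scan it, which is roughly what checking the return value would achieve.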
This error is still present. I tried to read a 250x50 image and got the error.
After a few trials, I found that 250x51 works, so apparently there's a limit on the smallest image size.
I have the same issue. I have a program that fetches images via wget and then runs OCR on them with Tesseract. I noticed that with some images (or, as I found out, some resolutions) the following error occurs:
Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box
I found out that this only occurs at some resolutions, so I wrote a script to check this on an example image. The script successively decreases the resolution of the image and runs Tesseract OCR on each result. The image has a resolution of 2090x1504 pixels.
There are no errors until the height reaches 1578 pixels. Then errors appear irregularly, and from 1502 px on they occur for nearly every image. Some images generate several of these errors, e.g.:
h: 1094
Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box
Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box
Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box
Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box
Unlike @Nemesis77swe ("there's a limit for the smallest size of image"), I don't think that there is a limit. I think it may be a mathematical issue somewhere in the code that causes a box with a width / height of 0, as @stweil stated.
I attached the script and the output and this is the image.
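For readers without the attachment, a test of this kind could be sketched roughly as follows. This is not the attached script: the function names, the PNG intermediate files, and the step size are assumptions, and it relies on the ImageMagick `convert` and `tesseract` commands being on the PATH.

```python
import subprocess
import tempfile
from pathlib import Path

ERROR_MARKERS = ("Error in boxClipToRectangle", "Error in pixScanForForeground")


def heights(start: int, stop: int, step: int = 1) -> list:
    """Heights to test, from start down to stop (inclusive) in steps of step."""
    return list(range(start, stop - 1, -step))


def count_errors(stderr_text: str) -> int:
    """Count stderr lines that begin with one of the two Leptonica messages."""
    return sum(line.strip().startswith(ERROR_MARKERS) for line in stderr_text.splitlines())


def scan(image: str, start: int, stop: int, step: int = 1) -> dict:
    """Resize image to each height, run OCR, and map height -> number of error lines."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        for h in heights(start, stop, step):
            scaled = str(Path(tmp) / f"h{h}.png")
            # "x<h>" asks ImageMagick for the given height, keeping the aspect ratio.
            subprocess.run(["convert", image, "-resize", f"x{h}", scaled], check=True)
            ocr = subprocess.run(
                ["tesseract", scaled, "stdout"], capture_output=True, text=True
            )
            results[h] = count_errors(ocr.stderr)
    return results
```

Calling `scan` with a placeholder file name such as `"example.jpg"` and a height range returns a dictionary in which any non-zero value marks a height that triggered the messages.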
Platform:
Linux notebook63 5.10.102.1-microsoft-standard-WSL2 #1 SMP Wed Mar 2 00:30:59 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Tesseract Version:
tesseract 5.2.0-13-g74e22
leptonica-1.79.0
libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 2.0.3) : libpng 1.6.37 : libtiff 4.1.0 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.1
Found AVX512BW
Found AVX512F
Found AVX2
Found AVX
Found FMA
Found SSE4.1
Found OpenMP 201511
Found libarchive 3.4.0 zlib/1.2.11 liblzma/5.2.4 bz2lib/1.0.8 liblz4/1.9.2 libzstd/1.4.4
Found libcurl/7.68.0 OpenSSL/1.1.1f zlib/1.2.11 brotli/1.0.7 libidn2/2.2.0 libpsl/0.21.0 (+libidn2/2.2.0) libssh/0.9.3/openssl/zlib nghttp2/1.40.0 librtmp/2.3
I tried this on another Windows machine in WSL with the same results:
Ubuntu 20.04 (on both Windows machines) and Debian buster show exactly the same output.
@csidirop,
Does adding a white or black border to the image help?
https://github.com/tesseract-ocr/tesseract/issues/427#issuecomment-248153491
If not, post an image that demonstrates the issue.
Indeed, there are no errors after adding a white border
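In a batch job, the border workaround could be applied only when the messages actually show up. The following is a minimal sketch under assumptions: the helper names are hypothetical, the border settings copy the `convert` command quoted earlier in this thread, and both `convert` and `tesseract` must be on the PATH.

```python
import subprocess

LEPT_ERRORS = ("Error in boxClipToRectangle", "Error in pixScanForForeground")


def has_clip_errors(stderr_text: str) -> bool:
    """True if any stderr line begins with one of the two Leptonica messages."""
    return any(line.strip().startswith(LEPT_ERRORS) for line in stderr_text.splitlines())


def border_command(src: str, dst: str, size: int = 10, color: str = "White") -> list:
    """Build the ImageMagick call that adds a border of the given size and color."""
    return ["convert", src, "-bordercolor", color, "-border", f"{size}x{size}", dst]


def ocr_with_border_fallback(src: str, bordered: str) -> str:
    """OCR src; if the messages appear, OCR a bordered copy written to bordered instead."""
    first = subprocess.run(["tesseract", src, "stdout"], capture_output=True, text=True)
    if not has_clip_errors(first.stderr):
        return first.stdout
    subprocess.run(border_command(src, bordered), check=True)
    second = subprocess.run(["tesseract", bordered, "stdout"], capture_output=True, text=True)
    return second.stdout
```

This only hides the messages for affected images; it does not address the underlying zero-size box.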
Hi there, I've got some specific images that output the following on linux:
The pictures get successfully OCRed in tesseract (without great results tho). The biggest problem for me, however, is that in OCRopus they don't even get OCRed.
Any ideas?