Shreeshrii opened 5 years ago
Using the finetuned digits traineddata gives slightly better results in some cases, but recognition still fails with the default --psm (3).
This problem of small images not being recognized has also been reported elsewhere. @stweil @bertsky, any suggestions for improving this?
Here is the output for 0.png through 9.png and for 06.jpg (a 6 in a different style and size).
The digits config file, which uses the whitelist feature, improves the result. Thanks, @bertsky.
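The comparison in the log below (PSM 3 vs. PSM 8, each with and without the digits config) could be scripted roughly as follows. This is a minimal sketch, not the script that produced the log: it assumes the tesseract binary is on PATH, that tessdata_best and the num/ images sit in the current directory, and that the digits config can be found under tessdata_best/configs/. The helper names build_command, run_matrix and default_images are hypothetical.

```python
import subprocess
from pathlib import Path

# Settings taken from the log headers; the directory layout is an assumption.
TESSDATA_DIR = "tessdata_best"
LANG = "eng"
OEM = 1  # LSTM engine only


def build_command(image, psm, config=None):
    """Return the argv list for one tesseract run, writing text to stdout."""
    cmd = [
        "tesseract", str(image), "stdout",
        "--tessdata-dir", TESSDATA_DIR,
        "--oem", str(OEM),
        "-l", LANG,
        "--psm", str(psm),
    ]
    if config:
        # Config files such as "digits" are given last, after the options.
        cmd.append(config)
    return cmd


def default_images(folder="num"):
    """Hypothetical layout: 0.png ... 9.png plus 06.jpg in one folder."""
    return sorted(Path(folder).glob("*.png")) + [Path(folder) / "06.jpg"]


def run_matrix(images, psms=(3, 8), config="digits"):
    """Print results in roughly the same layout as the log in this issue."""
    for image in images:
        print(f"***** {image} OEM {OEM} LANG {LANG} TESSDATA {TESSDATA_DIR}")
        for psm in psms:
            print(f"**** PSM {psm} ****")
            for cfg in (None, config):
                if cfg:
                    print("**** with digits config ****")
                result = subprocess.run(build_command(image, psm, cfg),
                                        capture_output=True, text=True)
                # Print both streams, since diagnostics such as
                # "Empty page!!" may not appear on stdout.
                print((result.stdout + result.stderr).strip())
```

Usage, under the same assumptions: `run_matrix(default_images())`. Keeping command construction separate from execution makes it easy to try other --psm values (for example 7 or 10) on the same images.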
***** num/06.jpg OEM 1 LANG eng TESSDATA tessdata_best
**** PSM 3 ****
Empty page!!
Empty page!!
**** with digits config ****
Empty page!!
Empty page!!
**** PSM 8 ****
5
**** with digits config ****
5
***** num/0.png OEM 1 LANG eng TESSDATA tessdata_best
**** PSM 3 ****
Empty page!!
Empty page!!
**** with digits config ****
Empty page!!
Empty page!!
**** PSM 8 ****
Co
**** with digits config ****
0
***** num/1.png OEM 1 LANG eng TESSDATA tessdata_best
**** PSM 3 ****
Empty page!!
Empty page!!
**** with digits config ****
Empty page!!
Empty page!!
**** PSM 8 ****
IE
**** with digits config ****
***** num/2.png OEM 1 LANG eng TESSDATA tessdata_best
**** PSM 3 ****
Empty page!!
Empty page!!
**** with digits config ****
Empty page!!
Empty page!!
**** PSM 8 ****
2
**** with digits config ****
2
***** num/3.png OEM 1 LANG eng TESSDATA tessdata_best
**** PSM 3 ****
Empty page!!
Empty page!!
**** with digits config ****
Empty page!!
Empty page!!
**** PSM 8 ****
3
**** with digits config ****
3
***** num/4.png OEM 1 LANG eng TESSDATA tessdata_best
**** PSM 3 ****
Empty page!!
Empty page!!
**** with digits config ****
Empty page!!
Empty page!!
**** PSM 8 ****
Ce
**** with digits config ****
***** num/5.png OEM 1 LANG eng TESSDATA tessdata_best
**** PSM 3 ****
Empty page!!
Empty page!!
**** with digits config ****
Empty page!!
Empty page!!
**** PSM 8 ****
Cs
**** with digits config ****
5
***** num/6.png OEM 1 LANG eng TESSDATA tessdata_best
**** PSM 3 ****
Empty page!!
Empty page!!
**** with digits config ****
Empty page!!
Empty page!!
**** PSM 8 ****
Ce
**** with digits config ****
6
***** num/7.png OEM 1 LANG eng TESSDATA tessdata_best
**** PSM 3 ****
Empty page!!
Empty page!!
**** with digits config ****
Empty page!!
Empty page!!
**** PSM 8 ****
7
**** with digits config ****
7
***** num/8.png OEM 1 LANG eng TESSDATA tessdata_best
**** PSM 3 ****
Empty page!!
Empty page!!
**** with digits config ****
Empty page!!
Empty page!!
**** PSM 8 ****
Cs
**** with digits config ****
8
***** num/9.png OEM 1 LANG eng TESSDATA tessdata_best
**** PSM 3 ****
Empty page!!
Empty page!!
**** with digits config ****
Empty page!!
Empty page!!
**** PSM 8 ****
Cs
**** with digits config ****
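One way to picture why the digits config changes outputs such as "Co" into "0": with a whitelist, the recognizer may only choose among allowed characters, so the best-scoring allowed character wins even when a letter scored higher. The sketch below is a deliberately simplified toy model with hypothetical scores and greedy per-position decoding; it is not Tesseract's LSTM beam search, which applies the whitelist inside its own decoder.

```python
DIGITS = frozenset("0123456789")


def best_char(scores, whitelist=None):
    """Return the highest-scoring character, optionally limited to a whitelist.

    scores: mapping of candidate character -> score (higher is better).
    Returns None when no candidate is allowed.
    """
    allowed = {c: s for c, s in scores.items()
               if whitelist is None or c in whitelist}
    if not allowed:
        return None
    return max(allowed, key=allowed.get)


def decode(positions, whitelist=None):
    """Toy greedy decoding; positions with no allowed candidate are dropped."""
    chars = (best_char(p, whitelist) for p in positions)
    return "".join(c for c in chars if c is not None)


# Hypothetical per-position scores, made up for illustration only.
EXAMPLE = [{"C": 0.6, "0": 0.3}, {"o": 0.9, "e": 0.1}]
```

With these made-up scores, `decode(EXAMPLE)` gives "Co", while `decode(EXAMPLE, DIGITS)` gives "0", because the second position has no digit candidate at all.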
I have noticed the same thing when a single digit is placed far away from other blocks of characters. Interestingly, Google Cloud Vision sometimes suffers from the same problem.
The empty-page issue has also been reported in https://github.com/tesseract-ocr/tesseract/issues/1362
tesseract -v
tesseract 4.1.0-rc1-255-g332a1
 leptonica-1.76.0
  libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8 : libwebp 0.4.4 : libopenjp2 2.3.0
Please see the issue opened by @jandier, which includes a number of images that are either not recognized at all or recognized incorrectly: https://github.com/Shreeshrii/tessdata_shreetest/issues/5#issuecomment-483053018