NOTE: TEST2.tif was obtained from TEST1.tif by adding some white space on the left.
What steps will reproduce the problem?
1. tesseract.exe TEST1.tif test1 -l ita
2. tesseract.exe TEST2.tif test2 -l ita
What is the expected output? What do you see instead?
I expect to get the same result, because I only changed the page dimensions.
Instead, I get very different results.
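To put a number on "very different", the two outputs can be compared directly. This is only a minimal sketch: it assumes the two commands above wrote test1.txt and test2.txt (tesseract appends .txt to the output base), and the helper name `similarity` is hypothetical, not part of tesseract.

```python
import difflib
from pathlib import Path


def similarity(text_a: str, text_b: str) -> float:
    """Return a 0..1 similarity ratio between two OCR outputs."""
    # Collapse whitespace runs so that layout-only differences are ignored.
    a = " ".join(text_a.split())
    b = " ".join(text_b.split())
    return difflib.SequenceMatcher(None, a, b).ratio()


if __name__ == "__main__":
    # File names are assumed from the commands in the steps above.
    out1 = Path("test1.txt")
    out2 = Path("test2.txt")
    if out1.exists() and out2.exists():
        score = similarity(out1.read_text(encoding="utf-8"),
                           out2.read_text(encoding="utf-8"))
        print(f"similarity: {score:.2f}")
```

A ratio close to 1.0 would mean the padding made little difference; a low ratio confirms that the recognized text itself changed.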
What version of the product are you using? On what operating system?
tesseract 3.02
leptonica-1.68 (Mar 14 2011, 10:43:03) [MSC v.1500 LIB Release 32 bit]
libgif 4.1.6 : libjpeg 8c : libpng 1.4.3 : libtiff 3.9.4 : zlib 1.2.5
Windows 7
Please provide any additional information below.
I ran tesseract with debug output, and it seems that tesseract cannot find the
character bounding boxes.
Enclosed are a couple of screenshots showing the character detection
in test1 and in test2.
Are there some configuration flags that I can set to fix this?
In test1 you can see that I also have segmentation problems: tesseract
splits some words incorrectly (e.g. BOTTINELLI -> BOT TINELLI
and DESTINATARIO -> DES TINATARIO) because of wrong page segmentation.
I've tried other psm values, but none works better than the default.
Again: Are there some configuration flags that I can set to fix this?
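For reference, the page segmentation modes can be tried one after another with a small script. This is a sketch under assumptions: it uses the `-psm N` option as spelled in tesseract 3.02 (newer releases use `--psm`), it assumes `tesseract.exe` is on the PATH, and `psm_commands` is a hypothetical helper name. It only builds and prints the command lines.

```python
def psm_commands(image, lang="ita", modes=range(1, 11), exe="tesseract.exe"):
    """Build one tesseract command line per page segmentation mode."""
    commands = []
    for mode in modes:
        # Give each run its own output base, e.g. TEST1_psm6 -> TEST1_psm6.txt
        out_base = f"{image.rsplit('.', 1)[0]}_psm{mode}"
        commands.append([exe, image, out_base, "-l", lang, "-psm", str(mode)])
    return commands


if __name__ == "__main__":
    # Print the commands; each list can be passed to subprocess.run() to execute it.
    for cmd in psm_commands("TEST1.tif"):
        print(" ".join(cmd))
```

Mode 0 (orientation and script detection only) is skipped by default because it does not produce recognized text.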
My big concern is that merely adding some white space to my source image gives
me very different results.
This confuses me! I assumed that I could remove borders to reduce the image
dimensions.
Original issue reported on code.google.com by stefano....@gmail.com on 3 Mar 2015 at 8:11