Environment
Current Behavior:
I'm using the best version of the English traineddata: https://github.com/tesseract-ocr/tessdata/blob/master/eng.traineddat
I'm getting different results when I run Tesseract on a larger image together with a UZN file than when I manually crop the image to the bounds given in the UZN file and run the same command. I can't see why the recognized text should differ.
Text in image: S21-3002-84B-A0000-0S-2107
UZN version - gives the wrong text: 521-3002-84B-A0000-05-2107 (both the first and the second S become a 5)
tesseract.exe "D:\test\S21uzn.tif" "D:\test\S21uzneng" --psm 4 --oem 1 -l eng hocr
Manually cropped version - gives the correct text: S21-3002-84B-A0000-0S-2107
tesseract.exe "D:\test\S21cropped.tif" "D:\test\S21croppedeng" --psm 4 --oem 1 -l eng hocr
Example files attached: tess_uzn_bug.zip
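To rule out a mismatch between the manual crop and the zone itself, the UZN bounds can be converted to a crop box and compared with the cropped image. This is a minimal sketch, assuming each UZN line has the form `left top width height label`; the function name `parse_uzn` and the sample values in the usage note are hypothetical and are not taken from the attached files.

```python
from typing import List, Tuple

# (left, top, right, bottom, label): right/bottom are exclusive pixel bounds.
Zone = Tuple[int, int, int, int, str]


def parse_uzn(text: str) -> List[Zone]:
    """Turn UZN lines into crop boxes.

    Assumes the layout `left top width height label` per line;
    lines with fewer than four fields are skipped.
    """
    zones: List[Zone] = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 4:
            continue  # blank or incomplete line
        left, top, width, height = (int(p) for p in parts[:4])
        label = parts[4] if len(parts) > 4 else ""
        # Convert width/height into right/bottom coordinates.
        zones.append((left, top, left + width, top + height, label))
    return zones
```

For example, `parse_uzn("10 20 300 40 Text")` yields one zone with the box `(10, 20, 310, 60)`. The first four values use the `(left, upper, right, lower)` order that Pillow's `Image.crop` expects, so the box can be used to reproduce the manual crop and check that its size matches S21cropped.tif.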
Expected Behavior:
The UZN version should output the same recognized text as the manually cropped version.
Suggested Fix:
Unknown