tesseract-ocr / tesseract

Tesseract Open Source OCR Engine (main repository)
https://tesseract-ocr.github.io/
Apache License 2.0

Plate Detection is Empty #1185

Closed: ibr123 closed this issue 6 years ago

ibr123 commented 6 years ago

Hi, I'm using tesseract 4.00.00dev-692-gad5ee18 with leptonica-1.74.4. Detection works fine with this version for English, but when I try to detect the following image the result is empty, although the font and the image are clear, and Tesseract has already detected text in other images: 7

The command for detection is: tesseract 7.jpg result -l eng --tessdata-dir ./tessdata/eng_best --oem 1

Keep in mind that I used the same command for other images and it worked fine, just not for this one. I got the eng.traineddata from here

Thanks

UPDATE: I have changed some properties of the image hoping it would give results. The DPI was below 100, so I changed it several times, resized the image, and in some attempts converted it to grayscale or to black and white. The result was still the same, although in some cases I got a few characters, but they were not correct.

400 DPI: 7new_400dpi

300 DPI: 7new_res_300dpi 7new_res5_300dpi

600 DPI: 7new_res2_600dpi 7new_res4_600dpi 7new_res3_600dpi

Are there particular image specs that give the best detection?

UPDATE: here is the tessinput image for one of the images: tessinput
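On the DPI and resizing experiments above, it may help to separate the DPI tag from the pixel dimensions: rewriting only the DPI metadata leaves the pixels untouched, while resampling changes how many pixels each character occupies. Below is a minimal Python sketch of that arithmetic, assuming the image is resampled so that its physical size stays constant. The function names and the 400x200 example size are hypothetical and not taken from the thread.

```python
def resampled_size(width_px, height_px, src_dpi, dst_dpi):
    """Pixel size after resampling from src_dpi to dst_dpi at constant physical size."""
    if src_dpi <= 0 or dst_dpi <= 0:
        raise ValueError("DPI values must be positive")
    scale = dst_dpi / src_dpi
    return round(width_px * scale), round(height_px * scale)


def physical_size_in(width_px, height_px, dpi):
    """Physical size in inches implied by a pixel size and a DPI tag."""
    if dpi <= 0:
        raise ValueError("DPI must be positive")
    return width_px / dpi, height_px / dpi


if __name__ == "__main__":
    # Hypothetical example: a 400x200 image tagged 100 DPI, resampled to 300 DPI.
    new_w, new_h = resampled_size(400, 200, 100, 300)
    print(new_w, new_h)
    # The physical size implied by the new pixel count and the new DPI tag.
    print(physical_size_in(new_w, new_h, 300))
```

If only the DPI tag changes and the pixel count does not, `physical_size_in` reports a different physical size while the characters keep exactly the same height in pixels, which is one possible reason the retagged images behaved the same.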

fakabbir commented 6 years ago

Thanks for the detailed question. The reason is not clear to me either. It might be the Leptonica library that Tesseract uses, or something else.

This thread discusses the same problem: https://groups.google.com/forum/#!topic/tesseract-ocr/fxDxAQigO98

Did you look at this link? https://www.pyimagesearch.com/2017/07/17/credit-card-ocr-with-opencv-and-python/

ibr123 commented 6 years ago

@fakabbir you're welcome. I changed the image size to 1325x690 at 600 DPI and added --psm 12 or --psm 11 to the command, and the detection was fine. But when I changed the image size to 1219x635, still at 600 DPI and with the same command, the result was not empty but far from correct. I don't know why that happened, since the difference in size was very small and the image had no noise or other imperfections. I think the problem is related to box size, since Tesseract uses boxes to detect characters.

ibr123 commented 6 years ago

UPDATE: I was looking for a way to get the box of each character, because I suspected that was the problem. I found this issue and tried the command tesseract test.jpg out batch.nochop makebox, but it only works on Tesseract 3 and did not work for Tesseract 4.

ibr123 commented 6 years ago

UPDATE: I tried the same license plate image (1341x699, 300 DPI) with tesseract 4.0.0-beta.1-21-gbdf6629 and leptonica-1.75.3, using the trained_best traineddata and the command: tesseract image.jpg results -l eng --tessdata-dir ./tessdata --oem 1 --psm 11

The result was the following:

ol 8

oct’

3HUALT2

It's better with psm 11, yet the word California was not detected at all.

Shreeshrii commented 6 years ago

This may be a case where --oem 0 or 2 and traineddata from the tessdata repo may give you a better result.
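To compare the engine and page segmentation modes mentioned in this thread side by side, one option is to generate a command per combination. This is a minimal Python sketch, not something from the Tesseract project: the helper names `build_tesseract_cmd` and `sweep_cmds` and the output naming scheme are hypothetical, and it only uses the flags already shown above (-l, --tessdata-dir, --oem, --psm). It prints the commands instead of running them.

```python
import itertools
import shlex


def build_tesseract_cmd(image, out_base, lang="eng",
                        tessdata_dir="./tessdata", oem=1, psm=11):
    """Return the argument list for one tesseract CLI run."""
    return [
        "tesseract", image, out_base,
        "-l", lang,
        "--tessdata-dir", tessdata_dir,
        "--oem", str(oem),
        "--psm", str(psm),
    ]


def sweep_cmds(image, oems=(0, 1, 2), psms=(11, 12), tessdata_dir="./tessdata"):
    """Yield one command per (oem, psm) pair, each with its own output base."""
    for oem, psm in itertools.product(oems, psms):
        # Hypothetical naming scheme so runs do not overwrite each other.
        out_base = f"result_oem{oem}_psm{psm}"
        yield build_tesseract_cmd(image, out_base,
                                  tessdata_dir=tessdata_dir, oem=oem, psm=psm)


if __name__ == "__main__":
    # Assumption: --oem 0 and 2 need traineddata that still contains the
    # legacy model (the tessdata repo), not an LSTM-only file.
    for cmd in sweep_cmds("image.jpg"):
        print(shlex.join(cmd))
```

On a machine with tesseract installed, each list can be passed to `subprocess.run(cmd, check=True)` and the resulting .txt files compared by hand.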

You could also try with the 3.05 branch.

ibr123 commented 6 years ago

@Shreeshrii Thanks. I found that the best results came from the following setup: tesseract 4.0.0-beta.1-21-gbdf6629 and leptonica-1.75.3, with the old traineddata from here, and the command: tesseract image.jpg results -l eng --tessdata-dir ./tessdata --oem 2 --psm 11

The results were:

CA 94

OCT\

3HUA17Z

which is the best result so far. I also tried tesseract 3.05.00dev and leptonica-1.73 with the same command and the same traineddata, but the results were not better:

qcfl _ , “94

3HUA17Z

Is there anything else I can do to improve the detection results?