Closed nixuewei closed 1 year ago
OK, after comparing the normal tesseract and the east text recognition, I have the impression that my implementation of the east text recognition is not as accurate as it should be.
So I think I need to improve it. In the meantime, I recommend not using the ocr_engine=east option.
To check the regular text recognition, I used the keyword Get Text From Document to see what the library recognized in the image.
It seems the date text 28 Oct 2022 is unfortunately split up into three objects: 28, Oct, 2022. As a result, the regular expression for ignoring the date does not match, because no single object contains the whole date.
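To illustrate why the split breaks the mask, here is a minimal Python sketch that applies the pattern from pattern_mask.json to word-level and line-level text. The helper name matching_items is hypothetical, and the use of re.match is an assumption; the library may apply the pattern differently.

```python
import re

# Pattern copied from pattern_mask.json in this thread (JSON escaping removed).
DATE_PATTERN = re.compile(r".*[0-9]{2}[\s][a-zA-Z]{3}[\s][0-9]{4}.*")


def matching_items(texts):
    """Return the OCR text items that the date pattern matches.

    Hypothetical helper for illustration only; not part of the library.
    """
    return [t for t in texts if DATE_PATTERN.match(t)]


# Word-level OCR output: the date arrives as three separate objects.
word_level = ["28", "Oct", "2022"]
# Line-level OCR output: the date arrives as one object.
line_level = ["28 Oct 2022"]

print(matching_items(word_level))  # []
print(matching_items(line_level))  # ['28 Oct 2022']
```

None of the single words contains two digits, a space, three letters, a space, and four digits, so the pattern only matches once the words are joined into one line.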
I will look for a quick configuration change for tesseract, so that whole text lines are recognized instead of single words.
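As an alternative to a configuration change, word-level results could be merged into lines afterwards. The sketch below assumes the OCR data is a dictionary shaped like the output of pytesseract's image_to_data with Output.DICT (keys such as text, page_num, block_num, par_num, line_num, left, top, width, height). The function name group_words_into_lines and the sample values are hypothetical, and this is not necessarily how the library does or will do it.

```python
def group_words_into_lines(data):
    """Merge word-level OCR entries into one entry per text line.

    `data` is assumed to be a dict of equal-length lists, shaped like
    pytesseract.image_to_data(..., output_type=Output.DICT).
    """
    lines = {}
    for i, text in enumerate(data["text"]):
        if not text.strip():
            continue  # skip empty entries (page, block, and line records)
        key = (data["page_num"][i], data["block_num"][i],
               data["par_num"][i], data["line_num"][i])
        left, top = data["left"][i], data["top"][i]
        right, bottom = left + data["width"][i], top + data["height"][i]
        if key not in lines:
            lines[key] = {"text": text, "left": left, "top": top,
                          "right": right, "bottom": bottom}
        else:
            line = lines[key]
            line["text"] += " " + text
            # Grow the bounding box so it covers every word on the line.
            line["left"] = min(line["left"], left)
            line["top"] = min(line["top"], top)
            line["right"] = max(line["right"], right)
            line["bottom"] = max(line["bottom"], bottom)
    return [
        {"text": l["text"], "x": l["left"], "y": l["top"],
         "width": l["right"] - l["left"], "height": l["bottom"] - l["top"]}
        for l in lines.values()
    ]


# Hypothetical sample data: one empty line-level record and three words.
sample = {
    "text": ["", "28", "Oct", "2022"],
    "page_num": [1, 1, 1, 1],
    "block_num": [1, 1, 1, 1],
    "par_num": [1, 1, 1, 1],
    "line_num": [1, 1, 1, 1],
    "left": [0, 10, 40, 80],
    "top": [0, 5, 4, 5],
    "width": [200, 25, 35, 40],
    "height": [50, 12, 13, 12],
}
print(group_words_into_lines(sample))
```

Each merged entry carries the joined text and a bounding box covering all words on the line, so a date pattern can match the full line and the mask can cover the whole date.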
@manykarim OK, thanks for your quick reply. Actually, I am working with doctestlibrary to do basic visual regression testing. If it can apply accurate masks, we can start with visual comparison and build on that. So thank you for your great software; I am waiting for the fix in the next release. Have a good day.
BTW, the reason I use the ocr_engine=east option is that when I run the command below directly, it gives the error below. It does not work on my Windows 11 system, but the ocr_engine=east option does.
SCRIPT:
Compare Images testdata/locator3.png testdata/locator4.png placeholder_file=testdata/pattern_mask.json
ERROR:
20220903 09:21:50.143 : INFO : Compare Image Object created in 0.0136 seconds
Re-Render document for OCR at 300 DPI as current resolution is only 72 DPI
20220903 09:21:50.143 : FAIL : ValueError: invalid literal for int() with base 10: '0.000000'
Ending test: Robotframework-Doctestlibrary-Main.Atest.Compare.Compare two Farm images with date pattern
It took a bit longer, but the new 0.9.0 release contains the fix for
ValueError: invalid literal for int() with base 10:
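For context, this error is what Python raises when a string with a decimal point is passed straight to int(). The sketch below reproduces it and shows one common workaround, converting through float first. The helper name parse_int_like is hypothetical, the log does not show which field produced the value, and this is not necessarily the change made in 0.9.0.

```python
def parse_int_like(value):
    """Convert strings such as '0.000000' or '96' to int.

    Hypothetical helper: goes through float first, so the fractional
    part is truncated instead of raising ValueError.
    """
    return int(float(value))


try:
    int("0.000000")  # int() rejects strings containing a decimal point
except ValueError as err:
    print(err)  # invalid literal for int() with base 10: '0.000000'

print(parse_int_like("0.000000"))  # 0
print(parse_int_like("96"))        # 96
```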
When I use the mask below, the date mask does not work.
Compare Images testdata/locator3.png testdata/locator4.png ocr_engine=east placeholder_file=testdata/pattern_mask.json
pattern_mask.json
[
  {
    "page": "all",
    "name": "Date Pattern",
    "type": "pattern",
    "pattern": ".*[0-9]{2}[\\s][a-zA-Z]{3}[\\s][0-9]{4}.*"
  }
]