manykarim / robotframework-doctestlibrary

Robot Framework DocTest library. Simple Automated Visual Document Testing.
Apache License 2.0

Regular Regression is not working #48

Closed: nixuewei closed this issue 1 year ago

nixuewei commented 1 year ago

When I use the mask below, the date is not masked:
Compare Images    testdata/locator3.png    testdata/locator4.png    ocr_engine=east    placeholder_file=testdata/pattern_mask.json

pattern_mask.json:
[ { "page": "all", "name": "Date Pattern", "type": "pattern", "pattern": ".*[0-9]{2}[\\s][a-zA-Z]{3}[\\s][0-9]{4}.*" } ]

manykarim commented 1 year ago

OK, after comparing the normal Tesseract text recognition with the EAST text recognition, I have the impression that my implementation of EAST text recognition is not as accurate as it should be. I need to improve it; in the meantime, I recommend not using the ocr_engine=east option. To check the regular text recognition, I used the keyword Get Text From Document to see what the library recognized in the image.

It seems the date text 28 Oct 2022 is unfortunately split into three objects: 28, Oct and 2022. As a result, the regular expression for ignoring the date does not match.
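A small Python sketch of why the split breaks the mask. It assumes the pattern is tested against each recognized text object separately (with re.search); how the library actually applies the pattern internally is an assumption here. The helper name masked is hypothetical.

```python
import json
import re

# The mask definition from this issue, as JSON text (note the JSON-escaped "\\s").
MASK_JSON = r'''[ { "page": "all", "name": "Date Pattern", "type": "pattern",
  "pattern": ".*[0-9]{2}[\\s][a-zA-Z]{3}[\\s][0-9]{4}.*" } ]'''

pattern = re.compile(json.loads(MASK_JSON)[0]["pattern"])

# Word-level OCR: the date arrives as three separate text objects.
word_tokens = ["28", "Oct", "2022"]
# Line-level OCR: the date arrives as a single text object.
line_tokens = ["28 Oct 2022"]

def masked(tokens):
    """Hypothetical helper: return the tokens the date pattern would match (and ignore)."""
    return [t for t in tokens if pattern.search(t)]

print(masked(word_tokens))  # []
print(masked(line_tokens))  # ['28 Oct 2022']
```

No single word contains the digits, whitespace and month name together, so the pattern never fires on word-level output.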

I will look for a quick configuration change for Tesseract so that whole text lines are recognized instead of single words.
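As an illustration of the goal (not the configuration change the maintainer refers to), word-level results can also be merged into lines after recognition. This sketch assumes word results shaped like the dict returned by pytesseract's image_to_data(..., output_type=Output.DICT), where each word carries block_num, par_num and line_num; the sample data and the helper words_to_lines are hypothetical.

```python
from collections import defaultdict

# Hypothetical word-level OCR result, shaped like pytesseract's
# image_to_data(..., output_type=Output.DICT) output (only a few keys kept).
ocr = {
    "text":      ["Date:", "28", "Oct", "2022", "Total"],
    "block_num": [1, 1, 1, 1, 2],
    "par_num":   [1, 1, 1, 1, 1],
    "line_num":  [1, 1, 1, 1, 1],
}

def words_to_lines(data):
    """Join words that were assigned the same block/paragraph/line numbers."""
    lines = defaultdict(list)
    for text, block, par, line in zip(
        data["text"], data["block_num"], data["par_num"], data["line_num"]
    ):
        if text.strip():  # skip empty entries (emitted for non-word levels)
            lines[(block, par, line)].append(text)
    # Dicts keep insertion order, so lines come out in reading order here.
    return [" ".join(words) for words in lines.values()]

print(words_to_lines(ocr))  # ['Date: 28 Oct 2022', 'Total']
```

With line-level text like 'Date: 28 Oct 2022', the date pattern from pattern_mask.json can match as a whole.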

nixuewei commented 1 year ago

@manykarim OK, thanks for your quick reply. Actually, I am working with DocTest library to do basic visual regression testing. If it supports accurate masks, we can start doing visual comparisons and build on that. Thank you for your great software; I'm looking forward to the fix in the next release. Have a good day.

nixuewei commented 1 year ago

BTW, the reason I use the ocr_engine=east option is that when I run the command below without it, it fails with the error below on my Windows 11 machine. With ocr_engine=east, it works on my system.
SCRIPT:
Compare Images    testdata/locator3.png    testdata/locator4.png    placeholder_file=testdata/pattern_mask.json
ERROR:
20220903 09:21:50.143 : INFO : Compare Image Object created in 0.0136 seconds
Re-Render document for OCR at 300 DPI as current resolution is only 72 DPI
20220903 09:21:50.143 : FAIL : ValueError: invalid literal for int() with base 10: '0.000000'
Ending test: Robotframework-Doctestlibrary-Main.Atest.Compare.Compare two Farm images with date pattern

manykarim commented 1 year ago

Took a bit longer, but the new 0.9.0 release contains the fix for:

ValueError: invalid literal for int() with base 10:
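For context on the error itself: Python's int() rejects strings written in decimal notation such as '0.000000', which is exactly the value in the log above. The actual change in 0.9.0 is not shown in this thread; the helper parse_int_tolerant below is a hypothetical sketch of one way such a value could be parsed tolerantly.

```python
def parse_int_tolerant(value: str) -> int:
    """Hypothetical helper: parse an integer that may be written as a float, e.g. '0.000000'."""
    try:
        return int(value)
    except ValueError:
        # int() does not accept decimal notation, so go through float() first.
        return int(float(value))

try:
    int("0.000000")
except ValueError as exc:
    print(exc)  # invalid literal for int() with base 10: '0.000000'

print(parse_int_tolerant("0.000000"))  # 0
print(parse_int_tolerant("72"))        # 72
```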