manykarim / robotframework-doctestlibrary

Robot Framework DocTest library. Simple Automated Visual Document Testing.
Apache License 2.0

KW Compare Images #32

Open extmme opened 2 years ago

extmme commented 2 years ago

Hi there, I have been exploring your libraries for images and documents. I downloaded two of your PDFs and tried to run the example below:

Compare Images    testdata/sample_1_page.pdf    testdata/sample_1_page_moved.pdf    check_text_content=${true}

And I got this error: TesseractNotFoundError: tesseract is not installed or it's not in your PATH. See README file for more information.

It also happened with my own PDF :)

Can you check it for me?

manykarim commented 2 years ago

Hi and thank you for trying out the library. Two things we can check.

1. Did you install the required dependencies (like Tesseract) as mentioned here?
   https://github.com/manykarim/robotframework-doctestlibrary#installation-instructions
   https://github.com/manykarim/robotframework-doctestlibrary#some-special-instructions-for-windows
   They are required in your case, because the option check_text_content=${true} compares the text content of changed areas via OCR. OCR requires Tesseract, so it needs to be installed and added to your system path.

2. If you don't want to compare the text content (in changed areas) via OCR but want to use the PDF data instead, please also add the argument get_pdf_content=${true}. In that case, Tesseract will not be needed, as the text is retrieved via MuPDF from the PDF file itself (without OCR).

extmme commented 2 years ago

1. Yes, I did.
2. I added get_pdf_content=${true}. I used your example

Compare Images    testdata/sample_1_page.pdf    testdata/sample_1_page_moved.pdf    check_text_content=${true}

and I am still getting the error: TesseractNotFoundError: tesseract is not installed or it's not in your PATH. See README file for more information.

manykarim commented 2 years ago

Interesting, I will investigate a bit more. But just to confirm: when you open your shell/cmd somewhere and type tesseract, does it find the command and list the options? Or do you get a "command not found" error?
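The same PATH check can also be scripted. The snippet below is only a minimal sketch using the Python standard library; the helper names find_tesseract and tesseract_version are made up for illustration and are not part of the DocTest library. It only reports whether an executable named tesseract is visible on the PATH of the Python process that runs it, which is what the TesseractNotFoundError message complains about.

```python
import shutil
import subprocess


def find_tesseract(executable="tesseract"):
    """Return the full path of the executable if it is on PATH, else None.

    Hypothetical helper for illustration; not part of the DocTest library.
    """
    return shutil.which(executable)


def tesseract_version(executable="tesseract"):
    """Return the first line printed by `<executable> --version`, or None."""
    path = find_tesseract(executable)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    # Some builds print the version banner to stderr instead of stdout.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


if __name__ == "__main__":
    location = find_tesseract()
    if location is None:
        print("tesseract was not found on PATH")
    else:
        print(f"tesseract found at {location}: {tesseract_version()}")
```

Running this with the same interpreter and environment that runs Robot Framework matters, since a terminal and an IDE can see different PATH values.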

Are you using the latest version of the library? What version is shown by pip?

pip show robotframework-doctestlibrary

Is it already a 0.2.0 version, like 0.2.0.20211223210536?

If not, can you please try to upgrade to the latest version?

pip install --upgrade robotframework-doctestlibrary
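For a scripted version check, something like the following could work. It is a rough sketch that assumes Python 3.8+ (for importlib.metadata); the helper names installed_version and is_at_least are hypothetical, and the comparison only looks at the leading numeric components of the version string, so it is not a full version parser.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="robotframework-doctestlibrary"):
    """Return the installed version string of the package, or None if absent.

    Hypothetical helper for illustration; not part of the DocTest library.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def is_at_least(installed, minimum=(0, 2, 0)):
    """Compare the leading numeric components of a version string to a minimum."""
    parts = []
    for piece in installed.split("."):
        if not piece.isdigit():
            break  # stop at the first non-numeric component
        parts.append(int(piece))
    # Pad short versions such as "0.2" with zeros before comparing.
    parts += [0] * (len(minimum) - len(parts))
    return tuple(parts[: len(minimum)]) >= tuple(minimum)


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("robotframework-doctestlibrary is not installed")
    else:
        print(f"installed: {found}, at least 0.2.0: {is_at_least(found)}")
```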

extmme commented 2 years ago

Everything works fine now :) You were right about the Tesseract installation. My fault :( Thank you very much for your quick responses and for solving the problem, and I apologize for the complications.

manykarim commented 2 years ago

No worries my friend, I'm happy that you asked and that we could solve the problem. I also need to investigate why exactly Tesseract is needed in your case, as pymupdf should be used to read the text content. So it's still a valid finding.

extmme commented 2 years ago

The other keywords from the libraries are suitable for our project :)