manykarim / robotframework-doctestlibrary

Robot Framework DocTest library. Simple Automated Visual Document Testing.
Apache License 2.0

Question: is it feasible to compare two MS Office documents? #24

Open fengnex opened 2 years ago

fengnex commented 2 years ago

As the title suggests, I wonder whether it is feasible to compare the contents of two MS Office documents, such as Word or PowerPoint files, locate the differences, and then output a comparison image that highlights them.

Any response would be appreciated. Thanks in advance.

manykarim commented 2 years ago

Yes, this should be possible. However, since the library focuses on PDF and image comparisons, the .pptx or .docx files would first need to be converted to PDF (or PNG). I would recommend PDF.
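A minimal sketch of that conversion step, assuming LibreOffice is installed and its `soffice` binary is on PATH. LibreOffice is not mentioned in this thread; it is simply one common converter that runs headless. The helper names `libreoffice_command`, `expected_pdf_path`, and `convert_to_pdf` are hypothetical.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def libreoffice_command(source: Path, outdir: Path, binary: str = "soffice") -> list[str]:
    """Build a headless LibreOffice command that converts `source` to PDF in `outdir`."""
    return [
        binary,
        "--headless",
        "--convert-to",
        "pdf",
        "--outdir",
        str(outdir),
        str(source),
    ]


def expected_pdf_path(source: Path, outdir: Path) -> Path:
    """LibreOffice keeps the base name and swaps the extension to .pdf."""
    return outdir / (source.stem + ".pdf")


def convert_to_pdf(source: Path, outdir: Path) -> Path:
    """Convert a .docx/.pptx file to PDF. Assumes LibreOffice is on PATH."""
    binary = shutil.which("soffice") or shutil.which("libreoffice")
    if binary is None:
        raise RuntimeError("LibreOffice (soffice) was not found on PATH")
    outdir.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if the conversion fails
    subprocess.run(libreoffice_command(source, outdir, binary), check=True)
    pdf = expected_pdf_path(source, outdir)
    if not pdf.exists():
        raise RuntimeError(f"Conversion did not produce {pdf}")
    return pdf
```

Converting both the reference and the candidate file this way yields two PDFs that can then be passed to the library's comparison keyword (for example `Compare Images` from `DocTest.VisualTest` in Robot Framework); check the library documentation for the exact arguments.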

To try it yourself:

fengnex commented 2 years ago

Thank you very much, @manykarim! Although there are probably ways to convert Office documents to PDF automatically, which would let us use the library, it might be helpful, and would extend the library's functionality, if it could compare two Office documents directly.

That said, drawing highlight rectangles onto a Word document could be difficult, since it would change the layout of the content, so PDF may be the better option in that case.

manykarim commented 2 years ago

I could think about it. However, there are already libraries out there that handle the conversion, e.g. from Word to PDF, such as https://rpaframework.org/libraries/word_application/. Maybe it's worth checking those out first; I want to avoid parallel or duplicate development.

manykarim commented 2 years ago

Also, this approach using pure Python looks simple: https://stackoverflow.com/questions/6011115/doc-to-pdf-using-python
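One way to do this from Python is to drive Microsoft Word over COM and save the document as PDF. This is a sketch under stated assumptions: Windows, an installed copy of Word, and the pywin32 package (`win32com.client`). `word_to_pdf` and `pdf_target` are hypothetical names; `17` is Word's `wdFormatPDF` save format.

```python
from pathlib import Path

# Word's WdSaveFormat value for PDF (wdFormatPDF).
WD_FORMAT_PDF = 17


def pdf_target(docx_path: str) -> Path:
    """Hypothetical helper: absolute output path next to the input, with a .pdf suffix."""
    return Path(docx_path).resolve().with_suffix(".pdf")


def word_to_pdf(docx_path: str) -> Path:
    """Convert a Word document to PDF by automating Microsoft Word over COM.

    Assumes Windows, an installed copy of Word, and the pywin32 package.
    """
    import win32com.client  # imported lazily so this module still loads elsewhere

    source = str(Path(docx_path).resolve())  # COM calls expect absolute paths
    target = pdf_target(docx_path)
    word = win32com.client.Dispatch("Word.Application")
    try:
        doc = word.Documents.Open(source)
        try:
            doc.SaveAs(str(target), FileFormat=WD_FORMAT_PDF)
        finally:
            doc.Close()
    finally:
        word.Quit()  # always shut down the Word instance we started
    return target
```

Because this depends on Word itself, it suits Windows test machines better than headless Linux CI runners.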