openpaperwork / pyocr

A Python wrapper for Tesseract and Cuneiform -- Moved to Gnome's Gitlab
https://gitlab.gnome.org/World/OpenPaperwork/pyocr

Test environment to make tests reproducible #82

Open jflesch opened 6 years ago

jflesch commented 6 years ago

Idea: provide a pre-compiled environment (container / chroot?) containing Tesseract and all its dependencies. It could be used to make the test results perfectly reproducible.

ZoranPavlovic commented 6 years ago

I had a quick look at the unit tests. It looks like a lot of them depend on the actual OCR process itself. Since the tesseract-ocr project occasionally changes its trained data and its algorithms, and there is no way to know which version of the trained data each person has installed, you're pretty much guaranteed to get slight mismatches in the OCR results whenever you run the tests on a new machine.
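
To illustrate the kind of mismatch described above, here is a minimal sketch using only Python's standard library. The two strings are invented for illustration (they are not taken from the pyocr test suite), and `similarity` and `exact_match` are hypothetical helper names. It only shows why an exact string comparison fails on a one-character OCR difference even though the two outputs are almost identical.

```python
import difflib

# Hypothetical OCR outputs for the same image from two different Tesseract
# installs. These strings are invented for illustration, not real test data.
EXPECTED = "The quick brown fox jumps over the lazy dog."
ACTUAL = "The quick brown fox jumps over the 1azy dog."  # "l" read as "1"


def exact_match(a: str, b: str) -> bool:
    """Strict comparison: any single-character difference fails."""
    return a == b


def similarity(a: str, b: str) -> float:
    """Return a ratio in [0, 1]; 1.0 means the strings are identical."""
    return difflib.SequenceMatcher(None, a, b).ratio()


if __name__ == "__main__":
    print("exact match:", exact_match(EXPECTED, ACTUAL))
    print("similarity: ", round(similarity(EXPECTED, ACTUAL), 3))
```

A strict assertion on the OCR text breaks here, while the similarity ratio stays close to 1. That is the instability a pinned Tesseract version and pinned trained data would remove.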