Open jflesch opened 7 years ago
I had a quick look at the unit tests. It looks like a lot of them depend on the output of the actual OCR process. The tesseract-ocr project occasionally changes its trained data and its algorithms, and there is no way to know which version of the trained data each person has installed, so you're pretty much guaranteed to get slight mismatches in the OCR results whenever you run the tests on a new machine.
Idea: provide a pre-compiled environment (container / chroot?) containing Tesseract and all its dependencies. It could be used to make the test results perfectly reproducible.
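
To make the idea a bit more concrete, here is a rough Python sketch of a guard the test suite could use to detect whether it is running inside such a pinned environment. Everything named here is an assumption for illustration: `PINNED_VERSION`, the helper names, and the choice of `4.1.1` are hypothetical and not taken from the project. It only checks the Tesseract binary version reported by `tesseract --version`; it says nothing about which trained data files are installed, so it is a partial check at best.

```python
import re
import subprocess

# Hypothetical pin: the Tesseract version the expected test outputs
# were generated with. The value is an example, not a project setting.
PINNED_VERSION = "4.1.1"


def parse_tesseract_version(version_output):
    """Extract the version number from `tesseract --version` output.

    Returns None when no version can be found in the text.
    """
    match = re.search(r"tesseract\s+v?(\d+(?:\.\d+)*)", version_output)
    return match.group(1) if match else None


def installed_tesseract_version():
    """Return the installed Tesseract version, or None if unavailable."""
    try:
        result = subprocess.run(
            ["tesseract", "--version"],
            stdout=subprocess.PIPE,
            # Some older releases print the version on stderr,
            # so merge both streams before parsing.
            stderr=subprocess.STDOUT,
            universal_newlines=True,
            check=False,
        )
    except OSError:
        # The tesseract binary is missing or cannot be executed.
        return None
    return parse_tesseract_version(result.stdout)


def environment_matches_pin(installed, pinned=PINNED_VERSION):
    """True only when a version was detected and equals the pinned one."""
    return installed is not None and installed == pinned
```

OCR-dependent tests could then be skipped (or marked as expected to differ) when `environment_matches_pin(installed_tesseract_version())` is false, while still running everywhere for the tests that do not depend on the OCR output.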