Closed jbarlow83 closed 6 years ago
This is a known problem: there has always been a problem if you use data files from a higher version with Tesseract. At the moment it is the responsibility of the packager or user to install the correct version of the data files. I do not think there will be a fix for 3.x, since development is focused on 4.x (master branch).
I assume 4.00 has the same issue with '--oem 0'.
Environment
Current Behavior:
If supplied with Tesseract 4.x's .traineddata files, Tesseract 3.x will attempt to use them and fail with a variety of error messages. The error messages give no clue as to the problem or solution.
Some of them are short, such as
read_params_file: parameter not found:
In other cases Tesseract will spam the terminal with what appears to be a line-by-line dump of the entire .traineddata file:
Given the alpha status of Tesseract 4.x, it seems some people are downloading Tesseract 4 data files and installing them in the wrong places by hand, or trying Tesseract 4 and then reverting to Tesseract 3.
Steps to Reproduce:
Begin with a clean install of Tesseract 3.05.01
Manually replace deu.traineddata with the Tesseract 4.00.xx version such as https://github.com/tesseract-ocr/tessdata_best/blob/master/deu.traineddata
Run
tesseract -l deu testing/phototest.tif _ pdf
Output is
Expected Behavior:
Tesseract 3.x should refuse to use 4.x .traineddata files with a clear error message that the .traineddata files are incompatible.
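Until something like that exists in Tesseract itself, a calling program could approximate it from the outside. The following is only a rough sketch of that idea, not anything Tesseract provides: the helper names `looks_incompatible` and `run_tesseract` are hypothetical, and the only marker it checks is the short message quoted above, so it is a heuristic that will miss the other failure modes (for example the line-by-line dump).

```python
import subprocess

# Marker text quoted in this report. Other failure modes print different
# output, so this list is an assumption and is certainly incomplete.
KNOWN_MARKERS = ("read_params_file: parameter not found",)


def looks_incompatible(stderr_text):
    """Return True if the text contains a marker associated with this problem."""
    return any(marker in stderr_text for marker in KNOWN_MARKERS)


def run_tesseract(args):
    """Run the tesseract binary and raise a clearer error for the known marker."""
    result = subprocess.run(
        ["tesseract", *args],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    if looks_incompatible(result.stderr):
        raise RuntimeError(
            "Tesseract reported a parameter error while loading language data. "
            "The installed .traineddata files may belong to a different major "
            "version of Tesseract."
        )
    return result
```

With the reproduction above, this would be called as `run_tesseract(["-l", "deu", "testing/phototest.tif", "_", "pdf"])`. Matching on stderr text is fragile, which is why a check inside Tesseract 3.x would still be the better fix.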