Tesseract has introduced a new training system based around `tesstrain.sh`. Ideally, we want to be able to submit a pull request against the `langdata` repo, similar to the one for Ancient Greek: https://github.com/tesseract-ocr/langdata/pull/19

This may also require modifying the `lat` section of the `language-specific.sh` script and opening a pull request for that change.

Right now, work on this is happening in two places:

- the `langdata` branch of this repo
- the `tesstrain` branch of the `tesseract_latinocr_docker` repo

The plan is also to consolidate the `latinocr-lattraining` repo so that a single top-level make rule generates the files that will go into the `langdata` repo and then calls `tesstrain.sh` for the final build (similar to the updated `grctraining` process for Ancient Greek). `tesseract_latinocr_docker` will then merely provide the OS environment and the Tesseract install.
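The planned build has two stages: generate the Latin files that would go into `langdata`, then run `tesstrain.sh` against them. The plan calls for a make rule; the Python sketch below only models the same dependency order so the flow is easy to follow. The function names are hypothetical, the files listed under `lat/` are an illustrative subset of langdata's per-language layout, and the `tesstrain.sh` options follow the Tesseract 4 training wiki example. It assumes `tesstrain.sh` is on `PATH` (pass `tesstrain_sh` otherwise).

```python
from __future__ import annotations

import shlex
import subprocess
from pathlib import Path

LANG = "lat"  # Tesseract language code for Latin


def langdata_targets(langdata_dir: Path) -> list[Path]:
    """Files stage 1 must produce, in langdata's per-language layout.

    Hypothetical, illustrative subset; the real list is whatever the project generates.
    """
    lang_dir = langdata_dir / LANG
    return [lang_dir / f"{LANG}.training_text", lang_dir / f"{LANG}.wordlist"]


def tesstrain_command(langdata_dir: Path, fonts_dir: Path, tessdata_dir: Path,
                      output_dir: Path, tesstrain_sh: str = "tesstrain.sh") -> list[str]:
    """Stage 2: the tesstrain.sh call, pointed at the generated langdata tree."""
    return [
        tesstrain_sh,
        "--fonts_dir", str(fonts_dir),
        "--lang", LANG,
        "--linedata_only",
        "--noextract_font_properties",
        "--langdata_dir", str(langdata_dir),
        "--tessdata_dir", str(tessdata_dir),
        "--output_dir", str(output_dir),
    ]


def build(langdata_dir: Path, fonts_dir: Path, tessdata_dir: Path,
          output_dir: Path, tesstrain_sh: str = "tesstrain.sh",
          dry_run: bool = True) -> list[str]:
    """Run stage 2 only once every stage-1 file exists (like a make prerequisite)."""
    missing = [p for p in langdata_targets(langdata_dir) if not p.exists()]
    if missing:
        raise FileNotFoundError(f"run the langdata generation step first: {missing}")
    cmd = tesstrain_command(langdata_dir, fonts_dir, tessdata_dir, output_dir, tesstrain_sh)
    if dry_run:
        print(shlex.join(cmd))  # only show what would run
    else:
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmd
```

With `dry_run=True` (the default) it only prints the command, which is a cheap way to check paths inside the Docker image before starting a long training run.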