ryanfb / latinocr-lat

'lat' repository, forked from https://github.com/ryanfb/ancientgreekocr-grc. The final training process for lat.traineddata
https://ryanfb.github.io/latinocr/
Apache License 2.0

Move training process into `tesstrain.sh` system #4

Closed by ryanfb 8 years ago

ryanfb commented 8 years ago

Tesseract has introduced a new training system based around tesstrain.sh.

Ideally, we want to be able to submit a pull request against the langdata repo, similar to that for Ancient Greek: https://github.com/tesseract-ocr/langdata/pull/19

This may also require modifying the `lat` section of the `language-specific.sh` script and submitting that change as a pull request.

Right now work on this is happening in two places:

The plan is also to consolidate the latinocr-lattraining repo so that a single top-level make rule builds the files that will go into the langdata repo and then calls `tesstrain.sh` for the final build (similar to the updated grctraining process for Ancient Greek). tesseract_latinocr_docker will then only provide the OS environment and the Tesseract install.
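As a rough illustration of the final step in that plan, here is a minimal sketch of how a top-level build rule might assemble the `tesstrain.sh` call. It only builds the argument list and does not run anything. The helper name `build_tesstrain_command` and the directory names `langdata`, `tessdata`, and `build` are hypothetical placeholders, not paths from this repo. The flags `--lang`, `--langdata_dir`, `--tessdata_dir`, and `--output_dir` are assumed to match the `tesstrain.sh` shipped with the Tesseract version in use.

```python
import shlex


def build_tesstrain_command(langdata_dir, tessdata_dir, output_dir, lang="lat"):
    """Return the argv list for a tesstrain.sh run (nothing is executed here).

    All directory arguments are hypothetical placeholders; the flag names are
    assumed to match the installed tesstrain.sh.
    """
    return [
        "tesstrain.sh",
        "--lang", lang,                   # language code to train
        "--langdata_dir", langdata_dir,   # checkout of the langdata repo
        "--tessdata_dir", tessdata_dir,   # existing tessdata directory
        "--output_dir", output_dir,       # where the built traineddata lands
    ]


if __name__ == "__main__":
    cmd = build_tesstrain_command("langdata", "tessdata", "build")
    # Print the command so it can be inspected or pasted into a make rule.
    print(shlex.join(cmd))
```

A make rule could print this command for inspection or pass the list to `subprocess.run` once the langdata files have been generated.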

ryanfb commented 8 years ago

Forked language-specific.sh now at https://github.com/ryanfb/tesseract/tree/latin-language-specific