Closed by Shreeshrii 5 years ago
The WordStr option creates the box files using tesseract and then replaces the OCRed text with the ground truth using sed and paste. There might be an alternate/better way to handle this.
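To make the replacement step concrete, here is a minimal sketch in Python rather than the sed and paste pipeline used in the makefile. It is not the PR's implementation. It assumes the box file uses the WordStr layout (`WordStr <left> <bottom> <right> <top> <page> #<text>`, with each text line followed by a tab-prefixed end-of-line entry) and that there is exactly one ground-truth line per WordStr line, in the same order. The function name `replace_wordstr_text` is hypothetical.

```python
def replace_wordstr_text(box_text, truth_lines):
    """Swap the OCRed text on each WordStr line for the ground-truth text.

    Assumption: one ground-truth line per WordStr line, in the same order.
    All other lines (e.g. the tab-prefixed end-of-line entries) pass through.
    """
    truth = iter(truth_lines)
    out = []
    for line in box_text.splitlines():
        if line.startswith("WordStr "):
            # Keep everything before the first '#' (the coordinates and page).
            coords, _, _ = line.partition("#")
            try:
                text = next(truth)
            except StopIteration:
                raise ValueError("fewer ground-truth lines than WordStr lines") from None
            out.append(coords + "#" + text)
        else:
            out.append(line)
    if next(truth, None) is not None:
        raise ValueError("more ground-truth lines than WordStr lines")
    return "\n".join(out) + "\n"
```

Raising on a line-count mismatch is a deliberate choice in this sketch: if tesseract segments the image into a different number of lines than the ground truth has, a silent replacement would pair the wrong text with the wrong coordinates.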
@kba @wrznr Thank you both for your feedback. I will make the requested changes.
I have also been testing the makefile with 'script/xxx' traineddata as the base model and have a few more changes to make. I will post an update once those are done too.
New PR posted as https://github.com/tesseract-ocr/tesstrain/pull/87
Examples
For Tamil, add a new font style (Impact)
For Arabic, add new characters (Plus)
For English, from scratch