Closed DerekMaggio closed 5 years ago
From the research I've done, all we need to do is add the corresponding .traineddata file to the $TESSDATA_PREFIX/tessdata folder.
The file can be found at: https://github.com/tesseract-ocr/tessdata/blob/master/fra.traineddata
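A quick way to confirm the file landed where Tesseract will look for it is sketched below. This assumes the `$TESSDATA_PREFIX/tessdata` layout described above; some Tesseract versions expect `TESSDATA_PREFIX` to point at the `tessdata` directory itself, so the path may need adjusting. The helper names `traineddata_path` and `is_language_installed` are hypothetical, not part of any existing code.

```python
import os
from pathlib import Path


def traineddata_path(lang, prefix=None):
    """Return the expected location of <lang>.traineddata.

    Assumes the layout $TESSDATA_PREFIX/tessdata/<lang>.traineddata.
    `prefix` overrides the environment variable (handy for testing).
    """
    if prefix is None:
        prefix = os.environ.get("TESSDATA_PREFIX")
    if not prefix:
        raise RuntimeError("TESSDATA_PREFIX is not set")
    return Path(prefix) / "tessdata" / f"{lang}.traineddata"


def is_language_installed(lang, prefix=None):
    """True if the traineddata file for `lang` exists at the expected path."""
    return traineddata_path(lang, prefix).is_file()
```

For French, `is_language_installed("fra")` should return `True` once `fra.traineddata` has been copied into place.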
Per Quinn: This example page includes some French characters (accented letters). Initial tests indicate these could be whitelisted into the Tesseract character set. This could be very important, and it would be nice to see some examples of the OCR output with and without the whitelist. For the production version, this would mean looking at how we've configured Tesseract for OCR, and whether that configuration could be set dynamically.
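One possible way to set this dynamically is to build the Tesseract command line per request, so the same page can be run with and without a whitelist for comparison. This is only a sketch: `build_tesseract_args` and the `FRENCH_ACCENTS` constant are hypothetical names, and the accent list is illustrative rather than complete. It relies on the `-l` language option and the `tessedit_char_whitelist` config variable passed via `-c`; how strictly the whitelist is honored may vary across Tesseract versions and engines.

```python
import string

# Illustrative (not exhaustive) set of accented French letters.
FRENCH_ACCENTS = "àâçéèêëîïôùûüÀÂÇÉÈÊËÎÏÔÙÛÜ"


def build_tesseract_args(image, output_base, lang="eng", whitelist=None):
    """Build an argument list for the tesseract CLI.

    When `whitelist` is given, recognition is restricted to those
    characters, so anything omitted (e.g. punctuation) will not appear
    in the output.
    """
    args = ["tesseract", image, output_base, "-l", lang]
    if whitelist:
        args += ["-c", f"tessedit_char_whitelist={whitelist}"]
    return args


# Two runs over the same page: one plain, one with accents whitelisted.
plain = build_tesseract_args("page.png", "out_plain", lang="fra")
accented = build_tesseract_args(
    "page.png",
    "out_accented",
    lang="fra",
    whitelist=string.ascii_letters + string.digits + FRENCH_ACCENTS,
)
```

Either list can be passed to `subprocess.run(...)`, and the two output files compared side by side.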