tesseract-ocr / langdata_lstm

Data used for LSTM model training
Apache License 2.0

Improve Yoruba training data quality #10

Closed: Timilehin closed this 5 years ago

Timilehin commented 5 years ago

This PR improves the quality of the Yoruba training data by adding some properly marked text and removing some badly marked text that led to misrecognized characters.

One question I have: how frequently do we plan to update the training data with new data?

@Shreeshrii @zdenop