tesseract-ocr / langdata_lstm

Data used for LSTM model training
Apache License 2.0

Missing some Thai numbers in Thai language (tha) #42

Open crossknight opened 3 years ago

crossknight commented 3 years ago

I found that some Thai digits are missing from the Thai (tha) data: ๔, ๖, ๗, ๘ and ๙. None of them appear in the tha.training_text or tha.unicharset files.
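A minimal sketch of how this can be checked, assuming the files are UTF-8 text and that a plain character scan is good enough (it does not parse the unicharset format, it only looks for the code points U+0E50 to U+0E59 anywhere in the file). The helper names `missing_thai_digits` and `missing_in_file` are made up for illustration and are not part of Tesseract:

```python
import sys
from pathlib import Path

# Thai digits occupy the code points U+0E50 (๐) through U+0E59 (๙).
THAI_DIGITS = [chr(cp) for cp in range(0x0E50, 0x0E5A)]


def missing_thai_digits(text: str) -> list[str]:
    """Return the Thai digits that never occur in `text`, in numeric order."""
    present = set(text)
    return [digit for digit in THAI_DIGITS if digit not in present]


def missing_in_file(path: str) -> list[str]:
    """Read a UTF-8 file and report which Thai digits it lacks (hypothetical helper)."""
    return missing_thai_digits(Path(path).read_text(encoding="utf-8"))


if __name__ == "__main__":
    # Paths are supplied by the caller, e.g. tha.training_text and tha.unicharset.
    for path in sys.argv[1:]:
        print(path, missing_in_file(path))
```

Saved under any name and run with the two file paths as arguments, it prints one list per file; an empty list would mean all ten digits are present.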

I am not sure how to add the missing digits to the model without training it from scratch. When I try to combine the fine-tuned model with the old model, it fails because the number of unicharset entries in the old model does not match the new model. I also tried the --old_traineddata parameter, but it did not work.

Thank you.