Should we update swe.training_text if new characters are added to desired_characters?

tesseract-ocr / langdata_lstm

Data used for LSTM model training

Apache License 2.0


Should we update swe.training_text if new characters are added to desired_characters? #9

Open: aslamy opened this issue 5 years ago

aslamy commented 5 years ago

Recently I made a pull request to update the Swedish desired_characters file with new characters. Now I see that swe.training_text does not contain all of the newly added desired_characters. Do we have to update swe.training_text and add these new desired_characters in order for Tesseract to recognize them?
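To see which of the new characters are actually absent from the training text, a quick coverage check can help. The following is a minimal sketch, not part of Tesseract's or langdata_lstm's tooling: it assumes (hypothetically) that the desired_characters file lists one character or grapheme per line in UTF-8, and all function names here are illustrative. Both sides are NFC-normalized so that a precomposed "å" matches "a" plus a combining ring. It only reports gaps and does not retrain anything.

```python
import unicodedata
from pathlib import Path


def parse_desired(text):
    """Split a desired_characters file into entries, assumed one per line.

    Lines are not stripped, so whitespace characters listed on purpose
    survive; only empty lines are skipped.
    """
    entries = []
    for line in text.split("\n"):
        line = line.rstrip("\r")  # tolerate Windows line endings
        if line:
            entries.append(line)
    return entries


def missing_characters(desired, training_text):
    """Return desired entries (NFC-normalized, deduplicated) absent from the text.

    Both sides are normalized to NFC so that a precomposed "å" (U+00E5)
    matches "a" followed by a combining ring above (U+0061 U+030A).
    """
    text = unicodedata.normalize("NFC", training_text)
    missing = []
    for entry in desired:
        norm = unicodedata.normalize("NFC", entry)
        # Substring test, so multi-code-point graphemes are handled too.
        if norm not in text and norm not in missing:
            missing.append(norm)
    return missing


def describe(entry):
    """Render an entry as its code points, e.g. 'U+00E9'."""
    return " ".join(f"U+{ord(ch):04X}" for ch in entry)


def report_missing(desired_path, training_text_path):
    """Print and return the desired entries missing from a training text file."""
    desired = parse_desired(Path(desired_path).read_text(encoding="utf-8"))
    text = Path(training_text_path).read_text(encoding="utf-8")
    missing = missing_characters(desired, text)
    for entry in missing:
        print(f"{entry}\t{describe(entry)}")
    return missing
```

For example, `report_missing("desired_characters", "swe.training_text")` run from the Swedish language directory would list each missing entry with its code points; the exact file locations in the repository are an assumption, so adjust the paths to your checkout.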

wrznr commented 5 years ago

Even if the source files (like the training text and desired characters) are updated, Tesseract won't be able to recognize the new characters without proper re-training. I am currently trying to find out how the training procedure for the stack models (i.e. those in the tessdata_* repos) works. Maybe the Tesseract maintainers could elaborate on this...