tmbdev / clstm

A small C++ implementation of LSTM networks, focused on OCR.
Apache License 2.0
821 stars 224 forks

Is it possible to train multiple languages in one model file? #136

Closed lomograb closed 7 years ago

amitdo commented 7 years ago

Yes, but each script should be on a separate line (i.e., each training line image should contain a single script).
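In practice this can mean simply mixing per-script training data into one manifest, as long as every line image contains only one script. A minimal sketch (file names are made up for illustration; clstm's trainer reads a text file listing line-image paths, one per line):

```python
# Sketch: merge per-script line-image manifests into one training
# manifest. Paths and file names here are hypothetical examples.
import random

def merge_manifests(manifest_paths, out_path, seed=0):
    """Concatenate several manifests (one image path per line),
    shuffle them so scripts are interleaved, and write the result."""
    entries = []
    for path in manifest_paths:
        with open(path) as f:
            entries.extend(line.strip() for line in f if line.strip())
    random.Random(seed).shuffle(entries)
    with open(out_path, "w") as f:
        f.write("\n".join(entries) + "\n")
    return len(entries)

# Example: two per-script manifests combined into one.
with open("arabic.txt", "w") as f:
    f.write("lines/ara_0001.png\nlines/ara_0002.png\n")
with open("latin.txt", "w") as f:
    f.write("lines/lat_0001.png\nlines/lat_0002.png\n")

n = merge_manifests(["arabic.txt", "latin.txt"], "mixed_train.txt")
print(n)  # -> 4
```

The merged manifest can then be passed to the trainer in place of a single-script one; the model learns one output alphabet covering all scripts.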

amitdo commented 7 years ago

For mixed scripts in the same line see this paper: https://www.researchgate.net/publication/280777013_A_Sequence_Learning_Approach_for_Multiple_Script_Identification

lomograb commented 7 years ago

Does CLSTM support this (mixed scripts in the same line)?

amitdo commented 7 years ago

It's not supported out-of-the-box, but you can implement what's described in that paper with clstm.
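Following the paper's idea, one way to implement it is to train an OCR-style sequence model whose output alphabet is script tags instead of characters. Ground truth for that can be derived from existing transcriptions; a minimal sketch (the coarse script mapping and the tag names are my own assumptions, not part of clstm):

```python
# Sketch: relabel a mixed-script transcription as a sequence of script
# tags, so a line-level LSTM can be trained to emit script identities
# (the paper's sequence-learning formulation of script ID).
import unicodedata

def script_tag(ch):
    # Map a character to a coarse script label via its Unicode name.
    # Tags A/C/G/L are arbitrary labels chosen for this example.
    name = unicodedata.name(ch, "")
    if name.startswith("ARABIC"):
        return "A"
    if name.startswith("CYRILLIC"):
        return "C"
    if name.startswith("GREEK"):
        return "G"
    return "L"  # fallback: Latin/other

def line_to_script_labels(text):
    # Drop whitespace; emit one tag per remaining character.
    return "".join(script_tag(c) for c in text if not c.isspace())

print(line_to_script_labels("Abc Ж α"))  # -> "LLLCG"
```

Training clstm on (line image, script-tag sequence) pairs instead of (line image, transcription) pairs then gives a script identifier; its output can be used to route each segment to a per-script recognizer.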

lomograb commented 7 years ago

Thank you @amitdo for the reply, and for this great project too. Okay, going to close this issue.

mittagessen commented 7 years ago

As a note, there is a model at kraken-models that does script identification exactly as described in the article (arrived at independently). It is able to differentiate between Arabic, Syriac, Cyrillic, Greek, Latin, and Fraktur.