unicode-org / lstm_word_segmentation

Python code for training an LSTM model for word segmentation in Thai, Burmese, and similar languages.

Missing Thai_graphclust_exclusive_model4_heavy #9

Open FrankYFTang opened 3 years ago

FrankYFTang commented 3 years ago

In ICU4X, we have a Thai_graphclust_exclusive_model4_heavy model at https://github.com/unicode-org/icu4x/blob/master/experimental/segmenter_lstm/tests/testdata/Thai_graphclust_exclusive_model4_heavy

but this model is not checked in under https://github.com/unicode-org/lstm_word_segmentation/tree/master/Models

SahandFarhoodi commented 3 years ago

You are right, this is an error. From the weights.json file, I can tell that the model in ICU4X should be named Thai_graphclust_model4_heavy, without "exclusive" in the name, because its dictionary contains grapheme clusters that are not Thai-specific, such as the space character.
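The naming check described above can be sketched in Python. This is a minimal illustration, not code from the repository. It assumes that "exclusive" means every grapheme cluster in the model's dictionary consists only of code points in the Thai block (U+0E00 to U+0E7F). The helper names, the `suggested_name` rule, and the `"dic"` key used to read weights.json are all hypothetical and may not match the actual file layout.

```python
import json

# Thai Unicode block: U+0E00..U+0E7F
THAI_START, THAI_END = 0x0E00, 0x0E7F


def is_thai_cluster(cluster: str) -> bool:
    """Return True if every code point in the grapheme cluster is in the Thai block."""
    return bool(cluster) and all(THAI_START <= ord(ch) <= THAI_END for ch in cluster)


def non_thai_clusters(dictionary) -> list:
    """Return the dictionary's grapheme clusters that contain any non-Thai code point."""
    return sorted(c for c in dictionary if not is_thai_cluster(c))


def suggested_name(dictionary, base="Thai_graphclust", suffix="model4_heavy") -> str:
    # Hypothetical naming rule: use "exclusive" only when every cluster is Thai-only.
    if non_thai_clusters(dictionary):
        return f"{base}_{suffix}"
    return f"{base}_exclusive_{suffix}"


def load_dictionary(path, key="dic"):
    # Assumption: weights.json stores the grapheme-cluster dictionary under `key`.
    with open(path, encoding="utf-8") as f:
        return json.load(f)[key]


if __name__ == "__main__":
    # Small made-up dictionary: three Thai clusters plus a space.
    sample = {"ก": 0, "า": 1, "ที่": 2, " ": 3}
    print(non_thai_clusters(sample))  # [' ']
    print(suggested_name(sample))  # Thai_graphclust_model4_heavy
```

Because the space character falls outside the Thai block, a dictionary like the one in the ICU4X test data would be flagged as non-exclusive under this rule.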