unicode-org / lstm_word_segmentation

Python code for training an LSTM model for word segmentation in Thai, Burmese, and similar languages.

Add converter for ICU4C format #11

Closed FrankYFTang closed 3 years ago

FrankYFTang commented 3 years ago

See the design doc at https://docs.google.com/document/d/1EVK2CwOmUamJwMOMbbdTz7tuaV0IR21rMoH7a3pyFwE/edit#heading=h.qkedw6o6vy20. Examples of converted results are at

https://github.com/unicode-org/icu/blob/dc71e8b6eab06af9808af4153c5a8bdc185e2092/icu4c/source/data/brkitr/lstm/Thai.txt and https://github.com/unicode-org/icu/blob/dc71e8b6eab06af9808af4153c5a8bdc185e2092/icu4c/source/data/brkitr/lstm/Mymr.txt