meanna / ThaiLMCUT

MIT License
15 stars 5 forks

I think this work is the correct way to perform segmentation in the Thai language. Sadly, no one has pursued this direction further. #4

Open perathambkk opened 6 months ago

perathambkk commented 6 months ago

Character-based language model approach. They should have predicted the next syllable/character, at least in some poets. I am quite confident that the equivalent rules/automata/templates extracted from many models (for linguistic verification) are nonsense.

Another promising approach might be (unsupervised) SentencePiece/BPE. I do wonder what Thai word segmentation methods are actually for.
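
To make the BPE idea concrete, here is a rough sketch of how merges can be learned from unsegmented text with no labels. It is a toy in plain Python, not the SentencePiece API; the names `learn_bpe` and `merge_pair` and the two sample strings are made up for illustration.

```python
from collections import Counter


def merge_pair(seq, pair):
    """Replace every adjacent occurrence of `pair` in `seq` with one merged symbol."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(seq[i] + seq[i + 1])
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def learn_bpe(corpus, num_merges):
    """Learn up to `num_merges` BPE merges from raw, unsegmented strings."""
    # Every string starts as a sequence of single code points.
    seqs = [list(text) for text in corpus]
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs over the whole corpus.
        pairs = Counter()
        for seq in seqs:
            pairs.update(zip(seq, seq[1:]))
        if not pairs:
            break
        best, count = pairs.most_common(1)[0]
        if count < 2:
            # Nothing repeats any more, so further merges carry no signal.
            break
        merges.append(best)
        seqs = [merge_pair(seq, best) for seq in seqs]
    return merges, seqs


# Hypothetical sample input: two short unsegmented Thai strings.
corpus = ["ไปโรงเรียน", "โรงเรียนดี"]
merges, segmented = learn_bpe(corpus, num_merges=10)
for pieces in segmented:
    print(" | ".join(pieces))
```

The resulting pieces are frequency-driven subword units, not linguistic words, and they always concatenate back to the original string.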

blackbird-fish commented 4 months ago

Maybe it can be used to get the tone pauses in a TTS task.