Open KoichiYasuoka opened 3 years ago
Hi @KoichiYasuoka
Yes yes, true. Thank you for pointing this out. Will incorporate it.
Thank you Sarves
I've just introduced thamizhi-udp in my diary to my Japanese colleagues. Stanza's original mwt model works with your pos model pretty well, and they can be connected with your parse.sh.
The Tamil tokenizer of Stanza needs the mwt model. For example, the word குதிரையும் is divided into two words:
But thamizhi-udp does not use the mwt model, thus the word disappears:
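To check the first case described above (Stanza's tokenizer together with its mwt model), a minimal sketch along these lines might help. It assumes the `stanza` package is installed and that the default Tamil download includes an mwt model. The helper names `token_word_pairs` and `run_tamil_demo` are made up for illustration and are not part of Stanza or thamizhi-udp.

```python
def token_word_pairs(doc):
    """Return (token text, [word texts]) for every token in a Stanza-style document.

    A token that the mwt processor expanded has more than one word;
    an ordinary token has exactly one.
    """
    pairs = []
    for sentence in doc.sentences:
        for token in sentence.tokens:
            pairs.append((token.text, [word.text for word in token.words]))
    return pairs


def run_tamil_demo(text="குதிரையும்"):
    """Run Stanza's Tamil tokenizer with mwt and print each token with its words."""
    # Imported here so token_word_pairs can be used without Stanza installed.
    import stanza

    # Assumption: the default Tamil package provides both tokenize and mwt models.
    stanza.download("ta")
    nlp = stanza.Pipeline(lang="ta", processors="tokenize,mwt")
    for token_text, words in token_word_pairs(nlp(text)):
        print(token_text, "->", words)
```

Calling `run_tamil_demo()` prints every surface token next to the syntactic words it was expanded into, so a multi-word token shows up as one token with several words. Comparing that list with the word forms in the thamizhi-udp output should show whether a word was dropped.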