sinaahmadi / klpt

The Kurdish Language Processing Toolkit
https://sinaahmadi.github.io/klpt/

Excessive tokenization to be fixed #10

Closed sinaahmadi closed 2 years ago

sinaahmadi commented 3 years ago

Some affixes do not need to be split off as separate tokens. This is particularly true of articles, such as "eke" and "êk", in both Sorani and Kurmanji.
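To make the desired behavior concrete, here is a minimal sketch of a post-processing step that re-attaches article suffixes to the preceding segment. It is not KLPT's actual tokenizer code: the function name `merge_articles`, the set `ATTACHED_ARTICLES`, and the example segmentation of "kureke" are all assumptions made for illustration.

```python
# Hypothetical list of article suffixes that should stay attached to the stem.
# Only the two forms named in this issue are included.
ATTACHED_ARTICLES = {"eke", "êk"}


def merge_articles(segments):
    """Re-attach article suffixes to the segment that precedes them.

    `segments` is a list of strings as an over-eager tokenizer might
    produce them for a single word (an assumed input format).
    """
    merged = []
    for seg in segments:
        if merged and seg in ATTACHED_ARTICLES:
            # Glue the article back onto the previous segment.
            merged[-1] += seg
        else:
            merged.append(seg)
    return merged


print(merge_articles(["kur", "eke"]))  # ['kureke']
print(merge_articles(["kur", "êk"]))   # ['kurêk']
```

A segment that matches an article but has nothing before it is left untouched, so a standalone token is never dropped or merged into a neighbouring word.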