sinaahmadi / klpt

The Kurdish Language Processing Toolkit
https://sinaahmadi.github.io/klpt/

Some words aren't analysed, although they are in Apertium #6

Closed ftyers closed 3 years ago

ftyers commented 3 years ago

Output of Python analyser:

('dixwî', [[]])

Output of Apertium:

$ echo dixwî | apertium -d ~/source/apertium/languages/apertium-kmr/ kmr-morph
^dixwî/xwarin<vblex><tv><pri><p2><sg>$

I will look into this. Feel free to assign it to me.

sinaahmadi commented 3 years ago

Just noticed it, Francis. Sure. I assigned it to you. Please consider adding such cases to test_stem.py. Thanks!

ftyers commented 3 years ago

It seems to be related to sequences of epsilons (printed as 0 in the output below):

fran@ipek:~/source/klpt$ hfst-txt2fst klpt/data/kmr-analyser.att | hfst-fst2strings -e '0' | grep "d0*i0*x0*w0*î"
dixw00000î00:xwarin<vblex><tv><pri><p2><sg>0
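
To illustrate why runs of epsilons matter, here is a minimal sketch of a lookup over an ATT-format transducer. It is not the code in att_analyzer.py. The toy transducer, the function names parse_att and analyse, and the follow_epsilon flag are all hypothetical. The sketch assumes epsilon is written as @0@ (as hfst-fst2txt does), that the start state is 0, and that there are no epsilon-input cycles that emit output.

```python
from collections import defaultdict

EPSILON = "@0@"  # epsilon symbol as written by hfst-fst2txt (assumption for this sketch)

# Hypothetical toy transducer, NOT the real kmr-analyser.att: it only mimics
# the shape of the problem, with epsilon inputs in the middle and at the end.
TOY_ATT = "\n".join("\t".join(row) for row in [
    ("0", "1", "d", "x"),
    ("1", "2", "i", "w"),
    ("2", "3", "x", "a"),
    ("3", "4", "w", "r"),
    ("4", "5", EPSILON, "i"),
    ("5", "6", EPSILON, "n"),
    ("6", "7", "î", "<vblex>"),
    ("7", "8", EPSILON, "<p2>"),
    ("8",),  # a line with only a state number marks a final state
])


def parse_att(text):
    """Parse ATT text into (transitions, finals); weights are ignored."""
    transitions = defaultdict(list)
    finals = set()
    for line in text.strip().splitlines():
        fields = line.split("\t")
        if len(fields) >= 4:
            src, dst, isym, osym = fields[:4]
            transitions[src].append((dst, isym, osym))
        elif fields[0]:
            finals.add(fields[0])
    return transitions, finals


def analyse(word, transitions, finals, start="0", follow_epsilon=True):
    """Return all output strings for word, sorted.

    An epsilon on the input side consumes no input, so the search must be
    able to take any number of such transitions in a row.
    """
    results = set()
    stack = [(start, 0, "")]  # (state, position in word, output so far)
    seen = set()
    while stack:
        config = stack.pop()
        if config in seen:
            continue
        seen.add(config)
        state, pos, out = config
        if pos == len(word) and state in finals:
            results.add(out)
        for dst, isym, osym in transitions.get(state, ()):
            emitted = "" if osym == EPSILON else osym
            if isym == EPSILON:
                if follow_epsilon:
                    stack.append((dst, pos, out + emitted))
            elif word.startswith(isym, pos):
                stack.append((dst, pos + len(isym), out + emitted))
    return sorted(results)


transitions, finals = parse_att(TOY_ATT)
print(analyse("dixwî", transitions, finals))
print(analyse("dixwî", transitions, finals, follow_epsilon=False))
```

With epsilon transitions followed, the toy lookup returns one analysis. With follow_epsilon=False it returns an empty list, which is the kind of empty result shown at the top of this issue.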

sinaahmadi commented 3 years ago

I see. Do you think it has something to do with att_analyzer.py? We can try with this one as you suggested before.

ftyers commented 3 years ago

Yep, it's definitely in there. I'll play around a bit more, and if I can't get it to work we can try the one that Måns wrote :)