reynoldsnlp / udar

UDAR Does Accented Russian: A finite-state morphological analyzer of Russian that handles stressed wordforms.
GNU General Public License v3.0

add alternative output formats #17

Open reynoldsnlp opened 5 years ago

reynoldsnlp commented 5 years ago

This may not be possible in every case, but where possible, add other common output formats:

reynoldsnlp commented 5 years ago

As for the CoNLL-U format, there does not appear to be any way to represent ambiguity, so the conversion would be lossy.
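To make the lossiness concrete, here is a minimal sketch of what such a conversion might look like. It assumes each token arrives as a list of (lemma, tag string) readings; the function name `to_conllu_line` and that input shape are hypothetical and are not part of udar's API. Only the first reading is written to the ten tab-separated CoNLL-U columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), and the raw tag string is passed through as XPOS because no mapping to UPOS/FEATS is defined here.

```python
def to_conllu_line(index, form, readings):
    """Render one token as a CoNLL-U line (hypothetical helper, not udar API).

    readings is a list of (lemma, tag_string) tuples. Only the first
    reading is kept; every other reading is silently dropped.
    """
    lemma, tags = readings[0]
    # ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC
    columns = [str(index), form, lemma, "_", tags, "_", "_", "_", "_", "_"]
    return "\t".join(columns)


# A token with three readings: two of them are lost in the output.
example_readings = [
    ("неделя", "S,жен,неод=вин,мн"),
    ("неделя", "S,жен,неод=род,ед"),
    ("неделя", "S,жен,неод=им,мн"),
]
conllu_line = to_conllu_line(6, "недели", example_readings)
dropped = len(example_readings) - 1
```

Which reading to keep is a design choice this sketch does not address; taking the first one is only a placeholder.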

reynoldsnlp commented 5 years ago

mystem can have ambiguous readings separated by | in its output, even with the -d (disambiguate) flag:

$ echo "Мы уже работаем здесь три недели." | mystem3.1 -ind
Мы{мы=SPRO,мн,1-л=им}
уже{уже=ADV=}
работаем{работать=V,несов,нп=непрош,мн,изъяв,1-л}
здесь{здесь=ADVPRO=}
три{три=NUM=им|три=NUM=вин,неод}
недели{неделя=S,жен,неод=вин,мн|неделя=S,жен,неод=род,ед|неделя=S,жен,неод=им,мн}
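
For reference, output in this shape can be split into a token and its list of readings with a few lines of Python. This is a sketch that assumes every line looks like the ones above (`token{lemma=lexical,tags=inflectional,tags|...}`); other mystem output variants are not handled, and the names `Reading` and `parse_mystem_line` are made up for illustration.

```python
import re
from typing import NamedTuple


class Reading(NamedTuple):
    lemma: str
    lexical: list        # tags before the second '=' (e.g. S, жен, неод)
    inflectional: list   # tags after the second '=' (e.g. вин, мн)


# token, then everything between the outermost braces
LINE_RE = re.compile(r"^(?P<token>[^{]+)\{(?P<analyses>.*)\}$")


def parse_mystem_line(line):
    """Split one 'token{reading|reading}' line into (token, [Reading, ...])."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unexpected line format: {line!r}")
    readings = []
    for raw in match.group("analyses").split("|"):
        lemma, _, rest = raw.partition("=")
        lexical, _, inflectional = rest.partition("=")
        readings.append(
            Reading(
                lemma,
                [tag for tag in lexical.split(",") if tag],
                [tag for tag in inflectional.split(",") if tag],
            )
        )
    return match.group("token"), readings


token, readings = parse_mystem_line("три{три=NUM=им|три=NUM=вин,неод}")
is_ambiguous = len(readings) > 1
```

Keeping the readings as a list preserves the ambiguity, so a downstream converter can decide whether to emit all of them or pick one.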