reynoldsnlp / udar

UDAR Does Accented Russian: A finite-state morphological analyzer of Russian that handles stressed wordforms.
GNU General Public License v3.0

stress on MWEs with multiple stresses #19

Closed: reynoldsnlp closed this issue 4 years ago

reynoldsnlp commented 5 years ago

The lexical underlying form needs a persistent stress mark that survives the two-level rule that reduces all stresses to the right-most one. For example:

то есть
так как
красно-жёлтых

reynoldsnlp commented 5 years ago

так что

reynoldsnlp commented 5 years ago

Search the fst2strings output of a stressed transducer for any words with stress marks on both sides of a space or hyphen. Something like this: egrep ":.*[ё́̀].*(% |-).*[ё́̀]"
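The same filter can be sketched in Python, e.g. to run it over a saved dump from inside a script. This is a minimal sketch, assuming (as the "(% |-)" alternation in the egrep command suggests) one analysis:surface pair per line with spaces escaped as "% ". The function name multi_stressed and the sample lines in the usage note are hypothetical.

```python
import re
from typing import Iterable, Iterator

# Same character class as the egrep pattern: ё, combining acute (U+0301),
# combining grave (U+0300).
STRESS = "[ё\u0301\u0300]"

# After the ':' (surface side): a stress mark, then an escaped space ("% ")
# or a hyphen, then another stress mark.
MULTI_STRESS = re.compile(":.*" + STRESS + ".*(% |-).*" + STRESS)


def multi_stressed(lines: Iterable[str]) -> Iterator[str]:
    """Yield lines whose surface form carries stress on both sides of a separator."""
    for line in lines:
        line = line.rstrip("\n")
        if MULTI_STRESS.search(line):
            yield line
```

For example, with made-up lines such as "так% как+Tag:та́к% ка́к" and "то% есть+Tag:то% е́сть", only the first is returned, because the second has a single stress mark to the right of the space.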

reynoldsnlp commented 5 years ago

See giellatekno revision 181941. Most of these have been fixed by converting the earlier stress marks that need to persist into secondary stress marks ('\u0300').
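A toy Python model shows why the secondary stress mark helps. It is not the analyzer's actual twol rule: it simply assumes the rule deletes every primary stress (combining acute, U+0301) except the right-most one and never touches U+0300, and it ignores ё. The function name keep_rightmost_primary is hypothetical.

```python
ACUTE = "\u0301"  # primary stress (combining acute accent)
GRAVE = "\u0300"  # secondary stress (combining grave accent)


def keep_rightmost_primary(form: str) -> str:
    """Toy model: keep only the right-most primary stress.

    Every acute accent except the last one is deleted; grave accents
    (secondary stress) are left in place, so they survive the reduction.
    """
    last = form.rfind(ACUTE)
    if last == -1:
        return form  # no primary stress to reduce
    return form[:last].replace(ACUTE, "") + form[last:]
```

With this model, "то́ е́сть" comes out as "то е́сть" (the first stress is lost), while "то̀ е́сть" comes out unchanged, because its first stress is a grave accent.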

Those like красно-жёлтых came from the RNC addendum to Zalizniak, so they have no stress marked at all. They already had a TODO comment about working with a native speaker to place the stress. I added a comment that this should be done using secondary stress marks.

reynoldsnlp commented 5 years ago

Still need to fix the RNC addendum entries, but everything from revision 181941 was pushed in commit a93e062f70a7e3db880cf5972b8016c2ecf7204b.

reynoldsnlp commented 4 years ago

Moved to https://github.com/giellalt/lang-rus/issues/2