languagetool-org / languagetool

Style and Grammar Checker for 25+ Languages
https://languagetool.org
GNU Lesser General Public License v2.1

[ru] "ё" letter support #454

Open kostyfisik opened 8 years ago

kostyfisik commented 8 years ago

Words spelled with "е" in place of "ё" are not flagged by the spellchecker (the Soviet-era typographical simplification allows this spelling); however, such words are not recognized by the part-of-speech analysis.

ежик - -
ёжик ёжик NN:Masc:Sin:Nom

kostyfisik commented 8 years ago

This matters even more for rule triggering: "будет найдёт минимум" triggers the rule, while "будет найдет минимум" does not.

yakovru commented 8 years ago

The current dictionary is built around consistent use of the letter "Ё", which lets the program process texts more accurately. But many books do not use "Ё" at all. There was even a version of the dictionary built with paired entries spelled with both "Ё" and "Е": http://myooo.ru/usercontent/extentions/dict-ieyo-LT2.7.tar.gz However, the Russian language has only one locale, "ru-ru", so at the moment there is no way to select a different dictionary.

kostyfisik commented 8 years ago

Does it mean that there is a need to provide a ru_RU@yo locale? (This would correspond to the ISO standard [language[_territory][.codeset][@modifier]] notation.) It would make it possible to ignore "ё" completely in ru_RU and to keep optional "ё" support in ru_RU@yo.
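To make the [language[_territory][.codeset][@modifier]] notation concrete, here is a minimal sketch of how such a name could be split into its parts. The `LocaleName` type and `parse_locale` function are hypothetical names used only for illustration; this is not how LanguageTool itself resolves locales.

```python
import re
from typing import NamedTuple, Optional


class LocaleName(NamedTuple):
    """Parts of a language[_territory][.codeset][@modifier] name (illustrative type)."""
    language: str
    territory: Optional[str]
    codeset: Optional[str]
    modifier: Optional[str]


# Assumption: each part is a simple alphanumeric token; real systems may accept more.
_LOCALE_RE = re.compile(
    r"^(?P<language>[A-Za-z]+)"
    r"(?:_(?P<territory>[A-Za-z0-9]+))?"
    r"(?:\.(?P<codeset>[A-Za-z0-9-]+))?"
    r"(?:@(?P<modifier>[A-Za-z0-9]+))?$"
)


def parse_locale(name: str) -> LocaleName:
    """Split a locale name into its four parts; missing parts become None."""
    match = _LOCALE_RE.match(name)
    if match is None:
        raise ValueError(f"not a valid locale name: {name!r}")
    return LocaleName(**match.groupdict())


# The proposed variant carries the "yo" modifier; plain ru_RU has none.
print(parse_locale("ru_RU@yo"))
print(parse_locale("ru_RU"))
```

Under this reading, a modifier-based variant differs from plain ru_RU only in the last field, so a dictionary selector could fall back to ru_RU whenever the modifier is unknown.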

yakovru commented 8 years ago

"Does it mean that there is a need to provide a ru_RU@yo locale?"

Yes. But this locale is not present in LibreOffice and OpenOffice.

yakovru commented 8 years ago

But the locale should be ru_RU@ie for the "е"-only variant, and ru_RU for the variant with "Ё".

kostyfisik commented 8 years ago

@yakovru Probably the *Office devs do not have enough butthurt over sentences like "Ежик Алеша ел под елкой". Anyway, there is a "no yo" tradition. Is it hard to introduce a ru_RU@noyo locale in LT? (ru_RU@ie is a bad name because of the IE web browser, which was the default on Windows for many years.)

yakovru commented 8 years ago

ru_RU@noyo is a bad idea because some words must be written with "Ё" anyway (family names, given names, surnames, etc.). ru_RU@e is best.

kostyfisik commented 8 years ago

Is it hard to introduce ru_RU@e in LT?

yakovru commented 8 years ago

I think it is possible.

yakovru commented 8 years ago

I'll try to do it.

petrkoshkin commented 7 years ago

This does make sense, as words spelled with "е" in place of "ё" are not flagged by the spellchecker. The current behaviour of the tagger is inconsistent: the spellchecker says that the word "мед" is correct, but the tagger knows nothing about this word.

yakovru commented 7 years ago

The word "мед" is an abbreviation of the word "медицинский", as in "мед. изделия".
The word "мёд" means "honey".

petrkoshkin commented 7 years ago

You are right, it can be "медицинский", as in "мед. изделия", but in that case "мед." will usually have a dot at the end. You can still find books where "мед" means "honey". Native speakers can read words like "ежик", "перепелка", "веселый", and "лед" without any difficulty. I know this is arguable, but many authoritative sources, such as https://ru.wikipedia.org/wiki/%D0%81, contain the following guideline:

Basic truth No. 7. The use of the letter ё is mandatory in texts with consistently marked stress, in books for young children (including textbooks for primary-school pupils), and in textbooks for foreigners. In ordinary printed texts, ё is recommended where a word could otherwise be misread, where the correct pronunciation of a rare word needs to be indicated, or to prevent a speech error. The letter ё should also be written in proper names. In all other cases the use of ё is optional.

petrkoshkin commented 7 years ago

The first comment shows an obvious inconsistency in the current behavior:

"Words spelled with "е" in place of "ё" are not flagged by the spellchecker (the Soviet-era typographical simplification allows this spelling); however, such words are not recognized by the part-of-speech analysis."

kostyfisik commented 6 years ago

Updated screenshot for the first comment (rules not working without ё): [screenshot from 2018-07-25 00-26-37]

yakovru commented 5 years ago

Words spelled with "е" in place of "ё" have now been added to the POS tag dictionary so that they are tagged correctly. https://github.com/languagetool-org/languagetool/commit/9cc4104e6feebcbbbcd4932a6c2cefa87b44399b
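The general idea of adding "е"-spelled forms to a tag dictionary can be sketched as follows. This is only an illustration, not the code from the commit above: the in-memory dictionary layout (surface form mapped to a list of lemma/tag pairs) and the `add_e_variants` helper are assumptions made for the example.

```python
def add_e_variants(tag_dict):
    """Return a copy of tag_dict with an "е"-spelled alias for every "ё" word.

    tag_dict maps a surface form to a list of (lemma, tag) pairs.
    This layout is an assumption for illustration only.
    """
    # Copy the lists so the caller's dictionary is left untouched.
    result = {word: list(entries) for word, entries in tag_dict.items()}
    for word, entries in tag_dict.items():
        if "ё" not in word and "Ё" not in word:
            continue
        alias = word.replace("ё", "е").replace("Ё", "Е")
        # The alias may already exist as a separate word; merge, do not overwrite.
        bucket = result.setdefault(alias, [])
        for entry in entries:
            if entry not in bucket:
                bucket.append(entry)
    return result


tags = {"ёжик": [("ёжик", "NN:Masc:Sin:Nom")]}
extended = add_e_variants(tags)
print(extended["ежик"])
```

The alias keeps the "ё" lemma, so a rule that matches on lemmas or tags would treat "ежик" and "ёжик" alike, while entries that already exist under the "е" spelling are preserved.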

tiff commented 5 years ago

@kostyfisik the user replied back. Here's the example sentence where the issue is apparently still happening:

"Уступить дорогу (не создавать помех)" - требование, означающее, что участник дорожного движения не должен начинать, возобновлять или продолжать движение, осуществлять какой-либо манёвр, если это может вынудить других участников движения, имеющих по отношению к нему преимущество, изменить направление движения или скорость.

[Screenshot: Bildschirmfoto 2019-09-03 um 11 42 38]

kostyfisik commented 5 years ago

@yakovru could you please take a look at this?

yakovru commented 5 years ago

Both forms (манёвр and маневр) are valid according to the printed version of the dictionary, but манёвр is preferred. I'll check the other word forms included in the dictionary.
