openfoodfacts / openfoodfacts-server

Open Food Facts database, API server and web interface - 🐪🦋 Perl, CSS and JS coders welcome 😊 For helping in Python, see Robotoff or taxonomy-editor
http://openfoodfacts.github.io/openfoodfacts-server/
GNU Affero General Public License v3.0

Search in Russian: consider "е" and "ё" interchangeable #455

Open aleksejrs opened 8 years ago

aleksejrs commented 8 years ago

What

Searching for "ё" should also find "е", and searching for "е" should also find "ё". Do not apply this when editing.
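
A minimal sketch of what this could look like, in Python for brevity rather than the server's Perl. The names `search_key` and `search` and the sample product names are hypothetical and not taken from the Open Food Facts code base. The idea is to fold "ё" to "е" only in the comparison key, so stored text is never modified:

```python
import unicodedata

# Hypothetical helper, not part of the Open Food Facts code base.
# Maps "ё"/"Ё" to "е"/"Е" for comparison purposes only.
YO_FOLD = str.maketrans({"ё": "е", "Ё": "Е"})


def search_key(text: str) -> str:
    """Return a comparison key in which "ё" and "е" are equivalent."""
    # NFC first, so that "е" followed by a combining diaeresis becomes "ё".
    composed = unicodedata.normalize("NFC", text)
    return composed.translate(YO_FOLD).casefold()


def search(query: str, names: list[str]) -> list[str]:
    """Return the stored names, unchanged, whose key contains the query key."""
    key = search_key(query)
    return [name for name in names if key in search_key(name)]


# Made-up sample data for illustration.
products = ["Мёд цветочный", "Мед липовый", "Молоко"]
matches_e = search("мед", products)
matches_yo = search("мёд", products)
```

Both queries return the same two product names, each with its original spelling, which is the read-only behaviour asked for here.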

Part of

stephanegigandet commented 8 years ago

Are those the only interchangeable characters?

And can you think of any conflicts where a word/name with "e" means something different than the same word with "ë"?

aleksejrs commented 8 years ago

> Are those the only interchangeable characters?

Yes, for letters. "ё"/"Ё" has usually been written as "е"/"Е", and there are people with strong opinions on that. Some labels mix different styles (logos and details). As for other characters, there is, of course, punctuation like "«»", "„“" etc.

> And can you think of any conflicts where a word/name with "e" means something different than the same word with "ë"?

That is a problem, but probably not one we can do anything about. It will probably mostly affect place names.

teolemon commented 8 years ago

fr:pâtés fr:pates

aleksejrs commented 8 years ago

I am talking only about search (where you enter search keywords and get results). It's read-only.

There has already been a problem that left some labels or categories with "и" instead of "й". Nothing like that must happen.

aleksejrs commented 8 years ago

Categories might be fine, though, if е/ё synonyms are auto-generated; I don't know. A spelling with "ё" carries more information than one with "е", and simple synonyms would not cause data loss before a problem is noticed.
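
As a rough illustration of one-way synonym generation (Python, with a hypothetical function name `yo_synonyms` that does not exist in the project): the "ё" spelling stays the canonical entry and the "е" spelling is only added next to it. The reverse direction is not attempted, because "ё" cannot be reconstructed from "е" automatically.

```python
def yo_synonyms(name: str) -> list[str]:
    """Return the name plus its "е" spelling, if the two differ.

    Hypothetical sketch: the original name always comes first and is
    never replaced, so no information is lost.
    """
    folded = name.replace("ё", "е").replace("Ё", "Е")
    if folded == name:
        return [name]
    return [name, folded]


with_yo = yo_synonyms("Мёд")
without_yo = yo_synonyms("Молоко")
```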

hangy commented 8 years ago

We need to see whether this has to be implemented in a locale-aware way. We have had bug reports where "de:Bürger" was automatically deaccentuated to "de:Burger", even though the two are not interchangeable.
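
One way to picture a locale-aware variant is a per-language folding table, sketched below in Python. The table `FOLDS`, the function `fold_for_search` and the language codes used as keys are assumptions for illustration, not existing OFF code. Only languages that are explicitly listed get any folding, so German text is left alone:

```python
# Hypothetical per-language folding rules; only "ru" is defined here.
FOLDS = {
    "ru": str.maketrans({"ё": "е", "Ё": "Е"}),
}


def fold_for_search(text: str, lang: str) -> str:
    """Apply the folding rules for `lang`, or none if the language has no rules."""
    table = FOLDS.get(lang)
    if table is None:
        return text
    return text.translate(table)


russian = fold_for_search("Бурёнка", "ru")
german = fold_for_search("Bürger", "de")
```

With this layout "de:Bürger" keeps its umlaut, while Russian search terms compare equal regardless of "ё"/"е".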

aleksejrs commented 8 years ago

"Ё" is a separate letter, not a "е" with an umlaut.

hangy commented 8 years ago

True. In German they are just vowels with umlauts, but they cannot be used interchangeably either. In Finnish or Swedish, "Ö" is considered a separate letter too, but OFF deaccentuates it all the same. I am not saying we should not do it; I just want to raise caution. 👍

aleksejrs commented 8 years ago

You could instead enforce the use of "ё" where the label says "е" instead of "ё". ;)