Open teolemon opened 8 years ago
Let's maybe simply remove the unconditional deaccenting of tags when they are canonicalized (ProductOpener::Store::unac_string_perl or ProductOpener::Store::get_fileid)? We had other reports where this yielded similar unintended results in German.
We should probably also use Unicode::Casing to support different languages properly (e.g. the Turkish I problem)?
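To make the Turkish I problem concrete, here is a minimal sketch in Python (not the project's Perl code). The function name `lower_for_lang` and the choice of language codes are assumptions made for illustration only; a generic lowercasing turns "I" into "i", while Turkish expects the dotless "ı".

```python
import unicodedata

def lower_for_lang(text: str, lang: str) -> str:
    """Lowercase text, applying the dotted/dotless I rules for Turkish and Azerbaijani.

    Hypothetical helper for illustration; not part of ProductOpener.
    """
    # Compose "I" + combining dot above into the single character "İ" first
    text = unicodedata.normalize("NFC", text)
    if lang in ("tr", "az"):
        # Handle the two capital I letters explicitly before generic lowercasing
        text = text.replace("İ", "i").replace("I", "ı")
    return text.lower()

print(lower_for_lang("ISIK", "en"))
print(lower_for_lang("ISIK", "tr"))
```

Without a language argument there is no way to pick between the two results, which is the same missing-context problem discussed for accents.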
@hangy probably. @maddingue, are you knowledgeable about this?
For French we should keep the unaccenting, it helps in many cases, a lot of people type "boeuf" (I have no idea how to type the oe char in fact ;-) ). There are a few conflicts where 2 words that deaccent to the same string mean 2 different things, but they are very rare. One example is pâte and pâté.
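A rough sketch of what unaccenting does to these words, written in Python rather than the project's Perl. The `unaccent` helper below is an assumption for illustration and is not how unac_string_perl is implemented. Note that "œ" has no Unicode decomposition, so it has to be expanded by hand.

```python
import unicodedata

def unaccent(text: str) -> str:
    """Strip combining accents; hypothetical helper, not ProductOpener code."""
    # Ligatures are not decomposed by NFD, so expand them explicitly
    text = text.replace("œ", "oe").replace("Œ", "OE")
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks that NFD split off from their base letters
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(unaccent("bœuf"))
print(unaccent("pâte"), unaccent("pâté"))
```

The first call shows the helpful case (typed "boeuf" matches "bœuf"), the second shows the conflict: both words collapse to the same string.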
One problem is that get_fileid does not have a language/country for context. äöü shouldn't be replaced for a de locale, for sure. There's just too much potential for conflict, and no one with a German keyboard layout writes "Doener" instead of "Döner".
Unconditional unaccenting of é to e for languages other than French might still cause conflicts. I honestly don't know enough about all languages to know how, e.g., a native Hungarian speaker would handle that.
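One possible shape for a language-aware canonicalization, sketched in Python rather than Perl. Everything here is hypothetical: the `file_id` function, its `lang` parameter, and the `UNACCENT_LANGS` set are assumptions for illustration, and the real get_fileid does more than this.

```python
import unicodedata

# Hypothetical setting: languages whose tags are unaccented when canonicalized
UNACCENT_LANGS = {"fr"}

def file_id(tag: str, lang: str) -> str:
    """Build a canonical id for a tag, unaccenting only for selected languages."""
    tag = tag.strip().lower()
    if lang in UNACCENT_LANGS:
        decomposed = unicodedata.normalize("NFD", tag)
        tag = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Store any accents that were kept in one consistent (composed) form
    tag = unicodedata.normalize("NFC", tag)
    # Collapse whitespace runs into single dashes
    return "-".join(tag.split())

print(file_id("Pâtés", "fr"))
print(file_id("Döner Kebab", "de"))
```

With this split, French input keeps the forgiving matching, while German umlauts survive untouched.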
We can close this one, right ?
Depends. https://world.openfoodfacts.org/category/fr:p%C3%A2t%C3%A9s and https://world.openfoodfacts.org/category/fr:pates both redirect to https://world.openfoodfacts.org/category/pastas, as unaccenting is intentionally enabled for French: https://github.com/openfoodfacts/openfoodfacts-server/blob/e73668733e0dbb353f4b37fd29f6ded2afc8c55e/lib/ProductOpener/Config_off.pm#L124-L127
What
fr:pâtés and fr:pates are amalgamated when typing the category from world.off
Part of
5538