Open goerlitz opened 6 months ago
https://github.com/openfoodfacts/openfoodfacts-server/issues/455 might be related.
@goerlitz We currently index words differently depending on language. For languages like French and English, users commonly drop accents, and there are very few conflicts (where removing accents gives an existing but different word). So when we index and query in English or French, we match both accented and unaccented.
For German, umlauts are more meaningful and are very rarely removed, so we keep them, in both queries and index. But that means searching in English (in the de-en domain) for accented words will not work well with products in German. If you want to search for German products, with German words in the query, then you should indicate that the query is in German (using lc=de or using the de domain).
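To make the mismatch concrete, here is a minimal sketch of language-dependent normalization as described above. It is not the actual server code: the function names, the `ACCENT_FOLDING_LANGS` set and the choice of Unicode NFD decomposition for stripping accents are all assumptions made for illustration.

```python
import unicodedata

# Hypothetical set of languages whose accents are folded away.
# The real server configuration may differ.
ACCENT_FOLDING_LANGS = {"en", "fr"}


def strip_accents(text):
    """Drop combining marks, e.g. 'pâté' -> 'pate', 'käse' -> 'kase'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize_term(term, lc):
    """Normalize a word for indexing or querying in language `lc`."""
    term = unicodedata.normalize("NFC", term.lower())
    if lc in ACCENT_FOLDING_LANGS:
        return strip_accents(term)
    # Languages such as German keep their umlauts.
    return term


indexed_de = normalize_term("Käse", "de")  # product indexed in German
query_en = normalize_term("Käse", "en")    # query interpreted as English
query_de = normalize_term("Käse", "de")    # query interpreted as German

print(indexed_de == query_en)  # False: 'käse' vs. 'kase'
print(indexed_de == query_de)  # True: both are 'käse'
```

Under these assumptions, the English query is folded to "kase" while the German index entry stays "käse", which is why passing lc=de (or using the de domain) makes the query line up with the index.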
Note that we will have a new search backend soon, and it might behave differently. Check out the #search channel on Slack.
What
When searching German products with search terms containing Umlauts (ä, ö, ü, ß), the number of search results differs depending on whether the language is set to English or German.
Steps to reproduce the behavior
Search term without Umlauts -> same results:
Search term containing Umlaut -> different results:
"Käse" (cheese)
"Knödel" (dumplings)
"Müsli" (muesli)
"Spieß" (skewer) - but in Swiss German "Spiess"
"Soße" (sauce) - "Sauce" is actually valid in German too, but "Soße" is the preferred German name
It seems that the transliteration of ä->ae, ö->oe, ü->ue, ß->ss for indexing and search is handled differently in German vs. English.
Expected behavior
The search should return the same results regardless of the selected language and of whether umlauts are used (like "Pâté" <-> "pate" in French).
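One possible way to get language-independent matching, sketched under the assumption that the ä->ae, ö->oe, ü->ue, ß->ss transliteration is applied identically at index time and at query time. The names `GERMAN_TRANSLITERATION` and `transliterate_german` are hypothetical and not part of the existing code base.

```python
import unicodedata

# Transliteration table taken from the mapping listed in this issue.
GERMAN_TRANSLITERATION = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}


def transliterate_german(term):
    """Lowercase a term and replace umlauts and ß with their ASCII digraphs."""
    # NFC makes sure 'a' + combining diaeresis is seen as the single char 'ä'.
    term = unicodedata.normalize("NFC", term).lower()
    return "".join(GERMAN_TRANSLITERATION.get(ch, ch) for ch in term)


for word in ["Käse", "Knödel", "Müsli", "Spieß", "Spiess", "Soße"]:
    print(word, "->", transliterate_german(word))
```

With this sketch, "Spieß" and the Swiss German "Spiess" share the same key, and "Käse" matches "Kaese". It would not make "Kase" (accent simply dropped) match "Käse", and it can conflate distinct words, so it is only one option to consider alongside the new search backend.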