openfoodfacts / search-a-licious

🍊🔎 A pluggable search service for large collections of objects (like Open Food Facts)
https://search.openfoodfacts.org
GNU Affero General Public License v3.0

boost_phrase should split by sentence of various length if we want to catch synonyms #250

Open alexgarel opened 1 month ago

alexgarel commented 1 month ago

While `search?q=organically grown&boost_phrase=1` finds items with the label en:organic (which has "organically grown" as a synonym), `search?q=organically grown plants&boost_phrase=1` does not.

This is because the PhraseBoostTransformer groups all the words into a single phrase. If we want synonyms to be detected, we would need to group consecutive terms 2 by 2, 3 by 3, 4 by 4, etc., so that the query organically grown plants results in adding "organically grown" OR "grown plants" OR "organically grown plants".
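To make the proposed grouping concrete, here is a minimal Python sketch of generating every run of consecutive words of length 2 and up. The function names `contiguous_word_groups` and `phrase_boost_clause` are hypothetical and are not part of the PhraseBoostTransformer; the sketch only builds the OR-ed string from the example above and does not touch the actual query tree.

```python
def contiguous_word_groups(words, min_size=2):
    """Return every run of consecutive words with at least `min_size` words.

    Hypothetical helper for illustration; shorter groups come first.
    """
    groups = []
    for size in range(min_size, len(words) + 1):
        # Slide a window of `size` words over the list.
        for start in range(len(words) - size + 1):
            groups.append(" ".join(words[start:start + size]))
    return groups


def phrase_boost_clause(query, min_size=2):
    """Join all word groups of a query as quoted phrases separated by OR."""
    groups = contiguous_word_groups(query.split(), min_size)
    return " OR ".join(f'"{group}"' for group in groups)


print(phrase_boost_clause("organically grown plants"))
# "organically grown" OR "grown plants" OR "organically grown plants"
```

A query of n words yields n(n-1)/2 groups, so a real implementation would probably want to cap the maximum group size for long queries.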

See the deactivated test in test_search.py.