mozilla / addons

☂ Umbrella repository for Mozilla Addons ✨

ElasticSearch: Experiment with `classic` similarity setting #5785

Closed EnTeQuAk closed 6 years ago

EnTeQuAk commented 6 years ago

I honestly only just discovered that the similarity setting in ElasticSearch is configurable. When I upgraded to ElasticSearch 5, it seemed that the default was now BM25 and that the previous default TF/IDF algorithm would go away sometime soon.

According to https://www.elastic.co/guide/en/elasticsearch/reference/5.0/index-modules-similarity.html that doesn't look to be the case: the `classic` (TF/IDF) similarity is still available. The same is still true for ES 6.x.

Let's investigate and see if that fixes some of our pain points.
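For reference, a minimal sketch of what an index creation body with the default similarity set to `classic` could look like, built as a Python dict. The `index.similarity.default` layout follows the linked Elasticsearch documentation; the helper name `similarity_body` is hypothetical and not taken from addons-server.

```python
import json


def similarity_body(similarity_type="classic"):
    """Build an index creation body that sets the default similarity.

    `similarity_body` is a hypothetical helper used for illustration only.
    """
    return {
        "settings": {
            "index": {
                "similarity": {
                    # Applies to every field that does not name its own similarity.
                    "default": {"type": similarity_type}
                }
            }
        }
    }


if __name__ == "__main__":
    # The JSON printed here is what would be sent when creating the index.
    print(json.dumps(similarity_body(), indent=2))
```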

Refs mozilla/addons#3097

EnTeQuAk commented 6 years ago

Short status update on this: https://bugzilla.mozilla.org/show_bug.cgi?id=1479922 is tracking ops-related work on this.

The plan is to create a new cluster that gets linked to -dev and contains the current data from -prod.

I'll try to get in a waffle-flag or setting that switches between the new/old similarity indexing strategy so that we can enable it quickly on -prod once tests show a success.

This requires a full reindex nonetheless.
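A rough sketch of how such a switch could work, assuming a hypothetical boolean setting called `USE_CLASSIC_SIMILARITY` (the real flag or setting name may differ) and placeholder base settings. Since the similarity is part of the index settings, the switch is only picked up when the index is built, which is where the full reindex comes in.

```python
import copy

# Hypothetical setting name; the actual flag may be named differently.
USE_CLASSIC_SIMILARITY = True

# Placeholder base settings, for illustration only.
BASE_INDEX_SETTINGS = {"index": {"number_of_shards": 1}}


def build_index_settings(use_classic=USE_CLASSIC_SIMILARITY):
    """Return index settings, optionally forcing the `classic` similarity."""
    settings = copy.deepcopy(BASE_INDEX_SETTINGS)
    if use_classic:
        settings["index"]["similarity"] = {"default": {"type": "classic"}}
    # With the switch off, no similarity is set and Elasticsearch uses its
    # built-in default (BM25 as of ES 5).
    return settings
```

Switching back would then mean turning the setting off and reindexing again with the resulting settings.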

EnTeQuAk commented 6 years ago

This is, again, hard to test, but I enabled it on -stage, so please play around and see if you notice any improvements or if things got worse.

The following screenshots were taken before the actual change on -stage.

Themes:

  1. Pink branch: https://screenshots.firefox.com/z8BFVgIIMGIGsNG1/addons.allizom.org
  2. Happy Spring Daisies-1 (used "Happy Spring"): https://screenshots.firefox.com/D3TIZAW8btP6tkPH/addons.allizom.org
  3. summer ladybug: https://screenshots.firefox.com/iWbmjK4ycBC5TZFH/addons.allizom.org
  4. "summer": https://screenshots.firefox.com/xaeUCrRTzyZbVnMj/addons.allizom.org
  5. Flag of Columbia (used "flag columbia"): https://screenshots.firefox.com/iGnVBPpzvGWfPBAC/addons.allizom.org
  6. my first sentence (used "first sentence"): https://screenshots.firefox.com/EBY3e7dMvZHouBU8/addons.allizom.org
  7. Three Wolf Moon Shirt (used "three wolf"): https://screenshots.firefox.com/KOQNy4ayZCMLDLvw/addons.allizom.org
  8. Fritz-Walter-Stadion: https://screenshots.firefox.com/DYNmMcTmMIYcuWsH/addons.allizom.org

Add-ons:

  1. Tab Mix Plus: https://screenshots.firefox.com/MAsdDiDz4WH4fqqa/addons.allizom.org
  2. NoScript Security Suite: https://screenshots.firefox.com/LbMZiuFXTHT86I9S/addons.allizom.org
  3. Web of Trust - WOT (used "Web of Trust"): https://screenshots.firefox.com/P9jezQhvCXbffUpW/addons.allizom.org
  4. uBlock Origin (used "uBlock"): https://screenshots.firefox.com/iWMJUR8raJPSMGqX/addons.allizom.org
  5. Flagfox: https://screenshots.firefox.com/HoEsc9BHY8dNN8Kj/addons.allizom.org
  6. Ciuvo - Price check in your browser (used "Price check"): https://screenshots.firefox.com/hEBmWMn8m2fO7LVN/addons.allizom.org
  7. Yet Another Smooth Scrolling (used "Smooth Scrolling"): https://screenshots.firefox.com/oVhDDMuj9FFiQc6U/addons.allizom.org
  8. "Stealthy": https://screenshots.firefox.com/JpDiIF07ugOcdjcK/addons.allizom.org
  9. "Facebook Container": https://screenshots.firefox.com/rnF2slt4WNltDol8/addons.allizom.org
  10. "Facebook": https://screenshots.firefox.com/yvulnjE09aA5tTEd/addons.allizom.org
  11. "Youtube Downloader": https://screenshots.firefox.com/HZbiz98jCK0YJxjI/addons.allizom.org
  12. "Adblocker": https://screenshots.firefox.com/ETQuTMEfcJgjQKo6/addons.allizom.org
  13. "Ad blocker": https://screenshots.firefox.com/SbNi9SQVEOYL34L6/addons.allizom.org

Also, some of these scenarios may come in useful: https://github.com/mozilla/addons-server/blob/566a54e4c6234d04e28faf8ddcbf08c7ae08fdd3/src/olympia/search/tests/test_search_ranking.py#L416-L573 - but note that stage may not have sufficient data here.

I'll update this comment once everything is on -stage and enabled so that you can test better.

AlexandraMoga commented 6 years ago

I've checked several search scenarios on stage, including the ones mentioned above. Here comes a long post:

Themes

  1. Pink branch: https://screenshots.firefox.com/yh4ixBS3rxV9AGAu/addons.allizom.org - The only difference observed is that the search query now also looks into author names, not just add-on names. In this case, for example, some results contain neither of the queried terms in the add-on name, but the terms are present in the author name
  2. Happy Spring: https://screenshots.firefox.com/ti4bBeR91Co4tbOy/addons.allizom.org - No major differences
  3. summer ladybug: https://screenshots.firefox.com/swdroQtNQpRTwuK3/addons.allizom.org - Compared to the previous results, this search query brings up more themes that include the “ladybug” term
  4. Summer: https://screenshots.firefox.com/XwUZwhLUV670RJ55/addons.allizom.org - No major differences
  5. Flag columbia: https://screenshots.firefox.com/AJ5N1pfjIA1nL7CV/addons.allizom.org - Search accuracy decreased slightly: ‘Flag of Columbia’ is now displayed much lower in the search results, while previously it was in second position
  6. First sentence: https://screenshots.firefox.com/6WM4eW7h1viDaqk6/addons.allizom.org - No major differences
  7. three wolf: https://screenshots.firefox.com/a94TSG1ukDKrtrFM/addons.allizom.org - Similar to 3: the new search results include more add-ons containing ‘wolf’ and fewer containing ‘three’
  8. Fritz-Walter-Stadion: https://screenshots.firefox.com/zXWTUNyv854xfK4S/addons.allizom.org - Same as before (1 result, an exact match). It is also displayed first in the list when searching for ‘Fritz-Stadion’

Extensions

  1. Tab Mix Plus: https://screenshots.firefox.com/JjnJVeIU7qG7ZZfh/addons.allizom.org - Here is an example of a search query that looks into the add-on summary to find results (the add-on in second position contains “Tab Mix Plus” in its summary)
  2. NoScript Security Suite: https://screenshots.firefox.com/6enBOQEJ4BvXqy3p/addons.allizom.org - No major differences
  3. Web of trust: https://screenshots.firefox.com/xz1kNa7etEWx2Hph/addons.allizom.org - Slightly improved: the search results display the 2 closest matches first in the list
  4. Ublock: https://screenshots.firefox.com/osJse1ojBq9CrMiQ/addons.allizom.org - Improved: displays the closest matches higher in the list
  5. Flagfox: https://screenshots.firefox.com/foeVbMus8NtSZtT6/addons.allizom.org - No major differences
  6. Price check: https://screenshots.firefox.com/AvCr5twH53ifuRAm/addons.allizom.org - Improved: displays the closest matches higher in the list
  7. Smooth scrolling: https://screenshots.firefox.com/d33w1Z1bkYSHFnel/addons.allizom.org - No major differences
  8. Stealthy: https://screenshots.firefox.com/b9b4NyGKDKym8iun/addons.allizom.org
  9. Facebook Container: https://screenshots.firefox.com/uDRErmSis91I4TUB/addons.allizom.org
  10. YouTube Downloader: https://screenshots.firefox.com/SBPlRgdln9oAFDaj/addons.allizom.org - 8, 9, 10: no major differences
  11. adblocker: https://screenshots.firefox.com/IObNmdm5m1eyL9Zx/addons.allizom.org
  12. Ad blocker: https://screenshots.firefox.com/4dJdhOwd4wzdJmEM/addons.allizom.org - 11, 12: slight improvement. More extensions are listed higher in the list than before, when themes used to occupy most of the list
  13. youtube enhancer: https://screenshots.firefox.com/byjw09WKtJWbRPHF/addons.allizom.org - Shows relevant results higher in the list

Other examples that still need improvement:

  1. Ad block plus: https://screenshots.firefox.com/f1hWYXJbdR5DKu2R/addons.allizom.org - It should bring Adblock Plus and its derivatives to the top of the list
  2. Search this on → is not listed among the search suggestions and does not appear in the first pages of search results
  3. https://github.com/mozilla/addons/issues/731 - the issue still reproduces. If I follow the same steps as described in the issue, I would expect to find this extension in the results

@EnTeQuAk Conclusion: search results do not appear to be broken, but I didn't feel they have improved very much either. Overall, however, I was pretty satisfied with the results I got (even though stage has more themes than extensions, which makes search results look disproportionate in some cases). Maybe on prod there are other conditions that affect search results (such as usage, ratings, and popularity) that we can't capture on stage?

EnTeQuAk commented 6 years ago

Nice, thanks a ton for taking the time to check this. Yes, testing this on staging is only a "is it completely broken or could we try it out on prod?"-test tbh.

It's only a setting, so it can easily be switched on and off in production later, and the risk is fairly small.

Given that we both found this isn't terribly broken, I'd go forward and try this out in production. If we find that results are significantly worse than before, we can switch back in minutes.

I have to add that this "fix" only changes the algorithm for how search results are ranked. It doesn't affect which results are returned much, so it's still only an interim solution, but it may improve our current situation.
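To illustrate the ranking-only nature of the change, here is a simplified single-term sketch of the two scoring formulas, loosely following Lucene's classic (TF/IDF) and BM25 similarities. It ignores query normalization, coordination factors, boosts, and the lossy encoding of field lengths, so it is not the exact Lucene computation. `k1=1.2` and `b=0.75` are the usual BM25 defaults; the document statistics in the usage lines are made up.

```python
import math


def classic_score(freq, doc_freq, doc_count, field_length):
    """Simplified TF/IDF: sqrt(tf) * idf^2 * length norm."""
    tf = math.sqrt(freq)
    idf = 1.0 + math.log((doc_count + 1) / (doc_freq + 1))
    norm = 1.0 / math.sqrt(field_length)
    return tf * idf * idf * norm


def bm25_score(freq, doc_freq, doc_count, field_length, avg_field_length,
               k1=1.2, b=0.75):
    """Simplified BM25: idf * saturating term-frequency factor."""
    idf = math.log(1.0 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
    length_ratio = field_length / avg_field_length
    tf = freq * (k1 + 1) / (freq + k1 * (1 - b + b * length_ratio))
    return idf * tf


if __name__ == "__main__":
    # Made-up statistics: the term occurs in 10 of 1000 documents.
    for freq in (1, 2, 8):
        print(freq,
              classic_score(freq, 10, 1000, field_length=4),
              bm25_score(freq, 10, 1000, field_length=4, avg_field_length=4))
```

Both functions return a positive score for any document that contains the term, so the set of matching documents stays the same. What differs is how strongly repeated terms and field length influence the order: the classic score keeps growing with the square root of the term frequency, while the BM25 term-frequency factor levels off.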