Closed by EnTeQuAk 6 years ago
Short status update on this: https://bugzilla.mozilla.org/show_bug.cgi?id=1479922 is tracking ops-related work on this.
The plan is to create a new cluster that gets linked to -dev and contains the current data from -prod.
I'll try to get in a waffle-flag or setting that switches between the new/old similarity indexing strategy so that we can enable it quickly on -prod once tests show a success.
This requires a full reindex nonetheless.
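A minimal sketch of what such a switch could look like, assuming a hypothetical boolean setting named `USE_CLASSIC_SIMILARITY` and a hypothetical helper `build_index_settings` (neither name comes from the actual codebase). It only builds the index settings body; `index.similarity.default.type` with the values `classic` and `BM25` is taken from the Elasticsearch 5.x similarity documentation. Which of the two counts as the "new" strategy is an assumption here.

```python
# Hypothetical flag; in practice this could come from a Django setting
# or a waffle switch. The name is made up for illustration.
USE_CLASSIC_SIMILARITY = True


def build_index_settings(use_classic_similarity: bool) -> dict:
    """Return index settings selecting the default similarity.

    "classic" is the TF/IDF similarity and "BM25" is the default in
    Elasticsearch 5.x/6.x. The similarity is part of the index settings,
    which is why flipping the flag still requires a full reindex.
    """
    similarity_type = "classic" if use_classic_similarity else "BM25"
    return {
        "index": {
            "similarity": {
                "default": {"type": similarity_type},
            },
        },
    }


settings_body = build_index_settings(USE_CLASSIC_SIMILARITY)
print(settings_body)
```

The resulting dictionary would be passed as the `settings` part of the body when the new index is created, so the flag itself can be flipped quickly while the reindex does the actual work.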
This is, again, hard to test, but I enabled it on -stage, so please play around and see whether you notice any improvements or whether things got worse.
The following screenshots were taken before the actual change on -stage.
Themes:
- Pink branch: https://screenshots.firefox.com/z8BFVgIIMGIGsNG1/addons.allizom.org
- Happy Spring Daisies-1 (used "Happy Spring"): https://screenshots.firefox.com/D3TIZAW8btP6tkPH/addons.allizom.org
- summer ladybug: https://screenshots.firefox.com/iWbmjK4ycBC5TZFH/addons.allizom.org
- "summer": https://screenshots.firefox.com/xaeUCrRTzyZbVnMj/addons.allizom.org
- Flag of Columbia (used "flag columbia"): https://screenshots.firefox.com/iGnVBPpzvGWfPBAC/addons.allizom.org
- my first sentence (used "first sentence"): https://screenshots.firefox.com/EBY3e7dMvZHouBU8/addons.allizom.org
- Three Wolf Moon Shirt (used "three wolf"): https://screenshots.firefox.com/KOQNy4ayZCMLDLvw/addons.allizom.org
- Fritz-Walter-Stadion: https://screenshots.firefox.com/DYNmMcTmMIYcuWsH/addons.allizom.org
Add-ons:
- Tab Mix Plus: https://screenshots.firefox.com/MAsdDiDz4WH4fqqa/addons.allizom.org
- NoScript Security Suite: https://screenshots.firefox.com/LbMZiuFXTHT86I9S/addons.allizom.org
- Web of Trust - WOT (used "Web of Trust"): https://screenshots.firefox.com/P9jezQhvCXbffUpW/addons.allizom.org
- uBlock Origin (used "uBlock"): https://screenshots.firefox.com/iWMJUR8raJPSMGqX/addons.allizom.org
- Flagfox: https://screenshots.firefox.com/HoEsc9BHY8dNN8Kj/addons.allizom.org
- Ciuvo - Price check in your browser (used "Price check"): https://screenshots.firefox.com/hEBmWMn8m2fO7LVN/addons.allizom.org
- Yet Another Smooth Scrolling (used "Smooth Scrolling"): https://screenshots.firefox.com/oVhDDMuj9FFiQc6U/addons.allizom.org
- "Stealthy": https://screenshots.firefox.com/JpDiIF07ugOcdjcK/addons.allizom.org
- "Facebook Container": https://screenshots.firefox.com/rnF2slt4WNltDol8/addons.allizom.org
- "Facebook": https://screenshots.firefox.com/yvulnjE09aA5tTEd/addons.allizom.org
- "Youtube Downloader": https://screenshots.firefox.com/HZbiz98jCK0YJxjI/addons.allizom.org
- "Adblocker": https://screenshots.firefox.com/ETQuTMEfcJgjQKo6/addons.allizom.org
- "Ad blocker": https://screenshots.firefox.com/SbNi9SQVEOYL34L6/addons.allizom.org
Also, some of these scenarios may come in useful: https://github.com/mozilla/addons-server/blob/566a54e4c6234d04e28faf8ddcbf08c7ae08fdd3/src/olympia/search/tests/test_search_ranking.py#L416-L573 - but note that stage may not have sufficient data here.
I'll update this comment once everything is on -stage and enabled so that you can test better.
I've checked several search scenarios on stage, including the ones mentioned above. Here comes a long post:
Themes
Extensions
Other examples - still needs improvement:
@EnTeQuAk Conclusion: search results do not appear to be broken, but I didn't feel they have improved very much either. Overall, however, I was pretty satisfied with the results I got (even though stage has more themes than extensions, which makes search results look disproportionate in some cases). Maybe on prod there are other conditions that interfere with search results (e.g. usage, ratings, popularity) that we can't possibly capture on stage?
Nice, thanks a ton for taking the time to check this. Yes, testing this on staging is only a "is it completely broken or could we try it out on prod?"-test tbh.
It's only a setting, so it can easily be switched on and off in production later, which keeps the risk fairly small.
Given that we both found that this isn't terribly broken, I'd go forward and try this out in production. If we find that results are significantly worse than before, we can switch back in minutes.
I have to add that this "fix" only changes the algorithm by which search results are ranked, for now. It doesn't affect the search results themselves much, so it's still only an interim solution, but it may improve our current situation.
I honestly only now discovered that the similarity setting in Elasticsearch is configurable. When I upgraded to Elasticsearch 5 it seemed that the default is now BM25 and that the previous default, the TF/IDF algorithm, would go away sometime soon.
According to https://www.elastic.co/guide/en/elasticsearch/reference/5.0/index-modules-similarity.html, it doesn't look like it will. The same is still valid for ES 6.x.
Let's investigate and see if that fixes some of our pain points.
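To make the difference between the two similarities a bit more concrete, here is a rough sketch comparing only their term-frequency components. These are the simplified textbook forms, not Lucene's full scoring code: BM25 with Elasticsearch's documented defaults `k1 = 1.2` and `b = 0.75`, and classic TF/IDF using the square root of the term frequency. The function names and the sample document lengths are made up for illustration.

```python
import math

K1 = 1.2   # documented BM25 default in Elasticsearch
B = 0.75   # documented BM25 default in Elasticsearch


def bm25_tf(freq, doc_len, avg_doc_len, k1=K1, b=B):
    """Term-frequency part of BM25; it saturates below k1 + 1."""
    length_norm = 1 - b + b * doc_len / avg_doc_len
    return freq * (k1 + 1) / (freq + k1 * length_norm)


def classic_tf(freq):
    """Term-frequency part of classic TF/IDF; it keeps growing as sqrt(freq)."""
    return math.sqrt(freq)


# Compare how both components react to repeated occurrences of a term
# in a field of average length (illustrative numbers only).
for freq in (1, 4, 16, 64):
    print(freq, round(bm25_tf(freq, 10, 10), 3), round(classic_tf(freq), 3))
```

Under these assumptions, repeated terms quickly stop adding to the BM25 component, while the classic component continues to reward them. Whether that matters for short fields such as add-on names is exactly what would need to be checked against real data.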
Refs mozilla/addons#3097