Open jacobthill opened 2 years ago
We need to determine when fuzzy matching does not make sense, so that we don't inadvertently hurt search quality for other users. We also need a technical analysis of how Solr generates "did you mean" terms and what inputs Solr needs for this to work. "Did you mean" behavior would vary across languages.
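As a starting point for that analysis, here is a minimal sketch of a request that asks Solr for spelling suggestions. The `spellcheck`, `spellcheck.q`, `spellcheck.count` and `spellcheck.collate` parameters belong to Solr's SpellCheckComponent; the host, the core name `dlme` and the `/select` handler are assumptions, and the request only returns suggestions if that handler has a spellcheck component configured.

```python
from urllib.parse import urlencode


def spellcheck_url(base_url: str, query: str, count: int = 5) -> str:
    """Build a Solr select URL that also requests spelling suggestions.

    base_url is hypothetical; point it at the real core and handler.
    """
    params = {
        "q": query,
        "spellcheck": "true",          # enable the SpellCheckComponent
        "spellcheck.q": query,         # text to check for suggestions
        "spellcheck.count": count,     # max suggestions per term
        "spellcheck.collate": "true",  # ask for a rewritten full query
        "wt": "json",
    }
    return f"{base_url}?{urlencode(params)}"


# Assumed local Solr instance and core name, for illustration only.
url = spellcheck_url("http://localhost:8983/solr/dlme/select", "Mahdi")
print(url)
```

Inspecting the `spellcheck` section of the JSON response for real user queries would show which dictionary and field the suggestions are drawn from.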
Input for that research can be taken from Google Analytics. I prepared a customized report that shows the search queries of DLME users per day: https://analytics.google.com/analytics/web/#/savedreport/h5BVDr61TQSN7EuLQEmSYA/a136246606w196538874p191551838/_u.date00=20211101&_u.date01=20220106&_.advseg=&_.useg=&_.sectionId=&_r.dsa=1
For days that have more search queries, you can try to deduce how the search queries evolved within a single user session. For example:
This may help identify the kinds of typos people make and check how our system reacts to them. The last column shows "Avg. search depth" and may suggest how useful the results were for different query variants (the higher the better).
There seems to be some fuzzy matching of search terms (probably intended to catch misspelled queries) that leads to unexpected results. For example, “Mahdi” returns documents containing “Mehdi” and “Majdi”. Even when the query is entered in quotation marks, it still seems to return fuzzy matches. Fuzzy matching seems particularly problematic in DLME because many transliterated terms are being treated as misspelled words (another affected term: Meidum). Fuzzy matching should be turned off, and we should ensure that queries in quotation marks return exact matches as expected.
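To illustrate why these terms collide, here is a small sketch that computes the Levenshtein edit distance between the terms above. It assumes the fuzzy behavior comes from edit-distance matching (Lucene-style fuzzy queries such as `term~` allow up to 2 edits); whether DLME's Solr configuration actually works this way has not been confirmed here.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-character insertions,
    deletions, or substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (or keep)
            ))
        prev = curr
    return prev[-1]


query = "mahdi"
for term in ["mehdi", "majdi"]:
    print(query, term, edit_distance(query, term))
```

Both variants are a single substitution away from the query, so any matcher that tolerates one or more edits would treat these distinct transliterated names as the same word.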