lucaong / minisearch

Tiny and powerful JavaScript full-text search engine for browser and Node
https://lucaong.github.io/minisearch/
MIT License

How can I know how relevant a result is to my search input? #25

Closed rAbdelHadi closed 5 years ago

rAbdelHadi commented 5 years ago

First I would like to thank you for this great contribution and amazing work you provided for the open community.

I have an issue where I need to know how relevant a result is to my search input. I know that there is a score, but I can't depend on it to determine the quality of the results, because the score equation depends on many factors.

How can I say, for example, that if the score is more than "x" the result quality is excellent and my clients actually found what they were searching for, and that if the score is less than "x" the client may or may not have found what he searched for?

Thank you so much again.

lucaong commented 5 years ago

Hi @rAbdelHadi, thanks for the kind words!

As you say, each item in the result list contains a score, which is a number indicating how relevant the result is (bigger is better):

let results = miniSearch.search('zen art motorcycle')
// => [
//   {
//     id: 2,
//     score: 2.77258,
//     match: { ... }
//   },
//   {
//     id: 4, 
//     score: 1.38629,
//     match: { ... }
//   }
// ]

The caveat is that the score is a relative measure: it is useful to compare the relevance of different results, but it is difficult to choose an absolute threshold.

In your case, one possibility is to experiment with different thresholds and see what works.

The other possibility, if you want to limit the number of results to only relevant ones, is to configure MiniSearch in a way that improves precision at the expense of recall (gives you higher quality results, at the expense of possibly missing some). Some possibilities are:

Probably the best is to combine both approaches: first optimize the search options to yield more relevant results, then if you still get many results, you could remove the ones with low score using a custom threshold that works for your application.
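As a minimal sketch of that combination: `combineWith`, `fuzzy` and `prefix` are MiniSearch search options, but the values chosen here, the `STRICT_OPTIONS` name, the `searchWithThreshold` helper and the threshold itself are only illustrative assumptions that you would need to tune for your own data.

```javascript
// Illustrative values only: stricter matching trades recall for precision
const STRICT_OPTIONS = {
  combineWith: 'AND', // a document must match every query term
  fuzzy: 0.1,         // max edit distance is a small fraction of the term length
  prefix: false       // do not match terms by prefix
}

// Hypothetical helper: `index` is any object exposing search(query, options),
// such as a MiniSearch instance. Results scoring below `minScore` are dropped.
function searchWithThreshold (index, query, minScore) {
  return index
    .search(query, STRICT_OPTIONS)
    .filter(result => result.score >= minScore)
}
```

Applied to a result list like the one shown earlier, `searchWithThreshold(miniSearch, 'zen art motorcycle', 2)` would keep the result scoring 2.77258 and drop the one scoring 1.38629. Since the score is relative, a threshold found this way only holds for the index and options it was tuned on.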

You can see some of the options in action in the demo application under "advanced options".

rAbdelHadi commented 5 years ago

Thanks for your reply.

Actually I am doing all of the above, but the thing that makes the solution much more complicated is fuzziness.

The result returned to the user may be what he was searching for, but because of a misspelling in his input it is not an exact match.

Or it may return something similar to what he searched for that is not an actual result for him.

For example, the user inputs 'facebok' and the results returned are 'facebook is awesome' and 'I like facebook'.

In the above example the user misspelled his input, but from the results returned you can tell that he found what he was searching for.

Another example: the user writes 'facebook' and the results returned are 'facetook is meaningless word' and 'facetook is incorrect'.

Here you can tell that, because of the fuzziness, the results returned are meaningless to the user.

I understand that this is how our human brain works, but what I am trying to achieve is to understand whether I should count a search input as a missed keyword or not, because eventually I have to produce a report showing the inputs that users couldn't find results for, and decisions will be based on it.

Thank you again, your support is much appreciated.

lucaong commented 5 years ago

If you need to measure if users in general find what they look for, and not evaluate a specific search, you could measure a metric like the click-through rate: the idea is that when a user clicks on a result, they probably found it relevant, whereas if they leave the page without choosing a result, they probably did not find the result useful. On average, this will give you an indication of how good your search results are.
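A rough sketch of how such a metric could be computed per query, assuming your application records one log entry per search with the query text and whether any result was clicked. The log shape and the `clickThroughByQuery` name are hypothetical; MiniSearch itself does not collect this data.

```javascript
// Hypothetical log entry shape: { query: string, clicked: boolean }
// Returns one row per distinct query with its click-through rate.
function clickThroughByQuery (searchLog) {
  const stats = new Map()
  for (const { query, clicked } of searchLog) {
    const entry = stats.get(query) || { searches: 0, clicks: 0 }
    entry.searches += 1
    if (clicked) entry.clicks += 1
    stats.set(query, entry)
  }
  return Array.from(stats, ([query, { searches, clicks }]) => ({
    query,
    searches,
    clickThroughRate: clicks / searches
  }))
}

// Made-up sample data, for illustration only
const sampleLog = [
  { query: 'facebok', clicked: true },
  { query: 'facebok', clicked: true },
  { query: 'facetook', clicked: false },
  { query: 'facetook', clicked: false }
]
const report = clickThroughByQuery(sampleLog)
```

Queries with many searches and a low click-through rate are candidates for a report of inputs that users could not find results for. Rates computed from only a handful of searches are noisy, so a minimum number of searches per query is worth requiring before drawing conclusions.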

Is this what you want to do?

rAbdelHadi commented 5 years ago

Yes indeed, I suggested this as well, but my boss wants me to find it using some kind of algorithm or another solution that doesn't require any user interaction.

Thanks a lot for your help again.

lucaong commented 5 years ago

I see the point, but I am afraid that what your boss wants is not possible: whether a specific search result is relevant or not depends on the individual user, so it cannot be measured "algorithmically" without collecting user interaction or feedback. As an example, if a user searches for "ruby sneakers", there is no way to automatically guess in advance if they want ruby red sport shoes or if they want to know about the Sneakers software library for the Ruby programming language.

If you are interested in learning more about how to evaluate the effectiveness of an information retrieval system, I can recommend this very good book available online for free. It has one chapter about evaluation of user utility. Alternatively, for a quick reference, this Wikipedia page can help.