olivernn / lunr.js

A bit like Solr, but much smaller and not as bright
http://lunrjs.com
MIT License

Benchmark Search Algorithm #405

Open indolering opened 5 years ago

indolering commented 5 years ago

It would be ideal if there were some sort of integration test to help gauge the effect of changes to the search algorithm (such as PageRank-style boosting, boosting text in <h1> tags, etc.).

The only non-trivial, freely available judgement scores I could find are for a medical database (linked to from this article).
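
For illustration, a minimal sketch of what such a harness could look like, using the public lunr builder and search API. The documents, queries, and judgment fixture below are made up, and precision@k is just one possible metric:

```js
const lunr = require('lunr')

// Hypothetical fixtures: the documents to index, plus relevance judgments
// mapping each benchmark query to the refs of documents judged relevant.
const documents = [
  { id: 'a', title: 'Installing Lunr', body: 'How to add lunr to a project' },
  { id: 'b', title: 'Search basics', body: 'Building an index and running queries' }
]
const judgments = {
  'install lunr': ['a'],
  'build an index': ['b']
}

const idx = lunr(function () {
  this.ref('id')
  this.field('title')
  this.field('body')
  documents.forEach(doc => this.add(doc))
})

// Precision@k: the fraction of the returned top-k results judged relevant.
function precisionAtK (query, relevantRefs, k) {
  const results = idx.search(query).slice(0, k)
  if (results.length === 0) return 0
  const hits = results.filter(r => relevantRefs.includes(r.ref)).length
  return hits / results.length
}

Object.entries(judgments).forEach(([query, relevant]) => {
  console.log(query, precisionAtK(query, relevant, 5))
})
```

Tracking a handful of such metrics across versions would at least flag large regressions, even if the absolute numbers only reflect one corpus.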

indolering commented 5 years ago

FWIW, it would be easier to just scrape Google search results for various static sites instead of ranking them manually. I strongly suspect that this will still just become a zombie ticket :P

olivernn commented 5 years ago

In the past I have looked into creating something like this. I initially looked at using the Cranfield dataset, as it is probably closer to the index size that something like Lunr is commonly used with. I think I ran into problems translating the query descriptions into something that Lunr would understand.
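
Something like the following keyword reduction might be a starting point for that translation; lunr ORs the terms by default, and the stopword list here is purely illustrative:

```js
// Cranfield queries are natural-language sentences rather than keyword lists.
// Lowercase, strip punctuation and common function words, then pass what is
// left to lunr's search method.
const stopWords = new Set(['what', 'which', 'must', 'be', 'is', 'are', 'when',
  'of', 'the', 'a', 'an', 'and', 'to', 'in', 'for', 'on', 'by'])

function toLunrQuery (description) {
  return description
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, ' ')                     // drop punctuation
    .split(/\s+/)
    .filter(term => term && !stopWords.has(term))     // drop stopwords
    .join(' ')
}

console.log(toLunrQuery('What similarity laws must be obeyed when constructing aeroelastic models?'))
// => 'similarity laws obeyed constructing aeroelastic models'
// usage: idx.search(toLunrQuery(queryDescription))
```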

The dataset you linked to seems interesting, though perhaps larger than the typical index size for Lunr. The other thing to keep in mind is that search relevancy isn't exact, and these datasets would only give an indication of results for one kind of data set / use case. If I've learnt anything over the years of developing and maintaining Lunr, it's that it is used in many varied ways!

I'm more than happy to help though, if you are interested in taking something like this on. It would also be interesting to see how different search libraries compare.

indolering commented 5 years ago

Spent more time on this: there are datasets from Yahoo and Bing for Learning to Rank competitions. We can use a random subset to train an SVM and go from there. I've decided to turn this into my lunchtime distraction.
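
Those datasets use graded relevance labels rather than binary judgments, so a graded metric such as nDCG is probably the natural fit for comparing rankings. A minimal sketch, assuming the labels have already been mapped onto Lunr document refs:

```js
// Discounted cumulative gain over a ranked list of graded labels (0 = irrelevant).
function dcg (grades) {
  return grades.reduce((sum, grade, i) =>
    sum + (Math.pow(2, grade) - 1) / Math.log2(i + 2), 0)
}

// nDCG@k: DCG of the observed ranking divided by the DCG of the ideal ranking.
function ndcgAtK (rankedRefs, gradeByRef, k) {
  const observed = rankedRefs.slice(0, k).map(ref => gradeByRef[ref] || 0)
  const ideal = Object.values(gradeByRef).sort((a, b) => b - a).slice(0, k)
  const idealDcg = dcg(ideal)
  return idealDcg === 0 ? 0 : dcg(observed) / idealDcg
}

// e.g. const rankedRefs = idx.search(query).map(r => r.ref)
//      ndcgAtK(rankedRefs, { a: 3, b: 0, c: 1 }, 10)
```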

indolering commented 5 years ago

Spent a lot more time on this and found some datasets that are better suited to the task. However, it looks like the scoring is not normalized, and the min/max values depend on the exact mix of fields being used. Is this correct?
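
If that is the case, one workaround would be to normalise the scores per query, e.g. dividing each score by the top-scoring result for that query so results are comparable across queries and field configurations. A small sketch of what I have in mind:

```js
// Lunr scores are not bounded to a fixed range; scale each result relative to
// the best hit for its query so different queries can be compared on [0, 1].
function normalisedResults (idx, query) {
  const results = idx.search(query)
  if (results.length === 0) return results
  const top = results[0].score            // results are sorted by score descending
  return results.map(r => ({ ref: r.ref, score: r.score / top }))
}
```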