well-typed / full-text-search

An in-memory full text search engine library. It lets you run full-text queries on a collection of your documents.

Documentation request: SearchRankParameters #5

Open chris-martin opened 3 years ago

chris-martin commented 3 years ago

I've found this package pretty easy to work with so far, with the exception of SearchRankParameters. In particular, I don't understand what paramK1 and paramB mean. I see them mentioned in The Probabilistic Relevance Framework: BM25 and Beyond, section 3.4.5 ("Document Length"), and section 3.5 ("Uses of BM25") has a few relevant bits:

setting b = 1 will perform full document-length normalisation, while b = 0 will switch normalisation off

A significant number of such experiments have been done, and suggest that in general values such as 0.5 < b < 0.8 and 1.2 < k1 < 2 are reasonably good in many circumstances

But I get from this no sense of what k1 means, and little guidance as to why I would want more or less document-length normalization for a particular field.

I don't think elaborate explanations in the documentation are necessary; a few hints would likely go a very long way.

chris-martin commented 3 years ago

@dcoutts If you might be able to say how you selected these parameter values for the demo program, any recollections would be most welcome. https://github.com/well-typed/full-text-search/blob/a87317c94f326fc7fb83ef998d84d1ccaac1f4ea/demo/PackageSearch.hs#L82-L88
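
For anyone else landing here with the same question, below is a rough sketch of the term-frequency part of the textbook BM25 weight, assuming paramK1 and paramB correspond to k1 and b in the paper quoted above. The function names are hypothetical and are not part of the full-text-search API. The sketch also leaves out the inverse-document-frequency factor and any per-field weighting the library may apply.

```haskell
-- Hypothetical helpers for illustration only; not the library's API.

-- | Length normalisation factor B = (1 - b) + b * (dl / avgdl).
--   b = 0 gives B = 1 for every document (normalisation off);
--   b = 1 gives B = dl / avgdl (full normalisation).
lengthNorm
  :: Double  -- ^ b
  -> Double  -- ^ dl, length of this document (or field)
  -> Double  -- ^ avgdl, average length over the collection
  -> Double
lengthNorm b dl avgdl = (1 - b) + b * (dl / avgdl)

-- | Saturating term-frequency component tf / (k1 * B + tf).
--   It grows with tf but never reaches 1; k1 sets how quickly
--   repeated occurrences of a term stop adding to the score.
tfComponent
  :: Double  -- ^ k1
  -> Double  -- ^ b
  -> Double  -- ^ dl
  -> Double  -- ^ avgdl
  -> Double  -- ^ tf, occurrences of the term in the document
  -> Double
tfComponent k1 b dl avgdl tf = tf / (k1 * lengthNorm b dl avgdl + tf)
```

Read this way, k1 is the term frequency at which a document of average length gets half of the maximum contribution from a term: a small k1 means one occurrence counts almost as much as many, and a large k1 keeps rewarding repetition. A higher b penalises long documents more, which seems reasonable for fields where length mostly reflects verbosity and less so for fields that are short by nature.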