projectEndings / staticSearch

A codebase to support a pure JSON search engine requiring no backend for any XHTML5 document collection
https://endings.uvic.ca/staticSearch/docs/index.html
Mozilla Public License 2.0

Interaction of weighting and phrasal search #290

Open martindholmes opened 4 months ago

martindholmes commented 4 months ago

This is not necessarily a bug or a feature request, more a prompt for us to discuss and decide on a policy regarding the interaction of weighting and phrasal searching.

If you search for a single word, e.g. "wholesale", without quotation marks, and you get a hit in a weighted context, the hit is scored using that weighting (now that the weighting bug has been fixed). However, if you search for the phrase "Wholesale dealers" and get a hit in exactly the same context, the hit's score does not take the context weighting into account. This may be fine: when you do phrasal searching, you're typically seeking something very specific and the number of hit documents is expected to be very low anyway, so weighting is less important.

On the other hand, if you're searching for a phrase which occurs in many places, you might well want to benefit from weighting to bring hits in significant contexts to the top of the list.

On line 1674 of StaticSearch.js, we appear to assign a standard weight of 2 to any phrasal hit. However, we could take the weight directly from the context at that point. Or we could take the greater of the two, to ensure that phrases still score higher than the default weighting. Thoughts?
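
To make the three options concrete, here is a rough sketch of how they compare. This is not the code in StaticSearch.js: the function name `phrasalHitWeight`, the constant `PHRASE_WEIGHT`, and the policy labels are all made up for illustration, and it assumes an unweighted context has a weight of 1.

```javascript
// Hypothetical constant: the standard weight currently given to any phrasal hit.
const PHRASE_WEIGHT = 2;

/**
 * Illustrative only: returns the weight a phrasal hit would receive
 * under each of the three policies discussed above.
 * @param {number} contextWeight Weight of the context containing the hit
 *                               (assumed to be 1 when the context is unweighted).
 * @param {string} policy One of "fixed", "context" or "max" (hypothetical labels).
 * @returns {number} The weight to apply to the phrasal hit.
 */
function phrasalHitWeight(contextWeight, policy) {
  switch (policy) {
    case "fixed":   // current behaviour: ignore the context weighting
      return PHRASE_WEIGHT;
    case "context": // take the weight directly from the context
      return contextWeight;
    case "max":     // take the greater of the two
      return Math.max(PHRASE_WEIGHT, contextWeight);
    default:
      throw new Error(`Unknown policy: ${policy}`);
  }
}

// Compare the policies for an unweighted context (1) and a heavily weighted one (5).
for (const policy of ["fixed", "context", "max"]) {
  console.log(policy, phrasalHitWeight(1, policy), phrasalHitWeight(5, policy));
}
// fixed 2 2
// context 1 5
// max 2 5
```

Under these assumptions, only the "max" option both rewards hits in significant contexts and keeps a phrasal hit in an unweighted context above the default weighting.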

martindholmes commented 3 months ago

Branch weight_context_fix is dealing with this and a related problem with weighting.