mchaput / whoosh

Pure-Python full-text search library

Boosts are not being applied for Compound Prefix Queries #42

Open Ainesh opened 1 year ago


There seems to be a bug in the matcher() implementation of the MultiTerm class. The boost applied to a Prefix query does not propagate to the final score calculation. When the list qs of Term queries is generated from the input Prefix query, the boost is not transferred to the newly generated queries. https://github.com/mchaput/whoosh/blob/d9a3fa2a4905e7326c9623c89e6395713c189161/src/whoosh/query/terms.py#LL210C9-L211C23

Working example:

from whoosh import fields, scoring
from whoosh.analysis import SimpleAnalyzer
from whoosh.filedb.filestore import RamStorage
from whoosh.query import Prefix, Or, Term

schema = fields.Schema(title=fields.TEXT(analyzer=SimpleAnalyzer()))

storage = RamStorage()
ix = storage.create_index(schema)

def get_weighting():
    def my_scorer(searcher, fieldname, text, matcher):
        return 1
    return scoring.FunctionWeighting(my_scorer)

with ix.writer() as writer:
    writer.add_document(title="apple")
    writer.add_document(title="banana")
    writer.add_document(title="orange")
    writer.add_document(title="grape")

with ix.searcher(weighting=get_weighting()) as searcher:
    prefix_queries = [Prefix("title", "app", boost=2.0)]
    # term_queries = [Term("title", "apple", boost=2.5)]

    query = Or(prefix_queries)
    results = searcher.search(query, scored=True, limit=None, terms=True)
    for result in results:
        print(f"Matched document: {result.docnum}; Score: {result.score}")

Here, changing the boost in Prefix("title", "app", boost=2.0) makes no difference to the final score for the matched document: it is always 1. By contrast, the boost on the Term query (the commented-out line) works as expected, and the final score changes with the boost value.
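
To make the suspected mechanism concrete, here is a minimal sketch that does not use Whoosh at all. ToyTerm, ToyPrefix, expand() and the propagate_boost flag are hypothetical names invented for illustration; they are not Whoosh's classes or its actual matcher() code. The sketch only assumes what is described above: a prefix query is rewritten into per-word term queries, and the score stays at the scorer's value unless the boost is copied onto the generated terms.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for illustration only; these are NOT Whoosh classes.
@dataclass(frozen=True)
class ToyTerm:
    fieldname: str
    text: str
    boost: float = 1.0

    def score(self, base_score: float = 1.0) -> float:
        # The term's contribution is the scorer's value scaled by its boost.
        return base_score * self.boost


@dataclass(frozen=True)
class ToyPrefix:
    fieldname: str
    prefix: str
    boost: float = 1.0

    def expand(self, lexicon, propagate_boost):
        # Rewrite the prefix into one term query per matching indexed word.
        # With propagate_boost=False the generated terms keep the default
        # boost of 1.0, which mirrors the behaviour reported in this issue.
        boost = self.boost if propagate_boost else 1.0
        return [
            ToyTerm(self.fieldname, word, boost)
            for word in sorted(lexicon)
            if word.startswith(self.prefix)
        ]


lexicon = {"apple", "banana", "orange", "grape"}
query = ToyPrefix("title", "app", boost=2.0)

dropped = query.expand(lexicon, propagate_boost=False)
carried = query.expand(lexicon, propagate_boost=True)

print([t.score() for t in dropped])  # [1.0]: boost lost during expansion
print([t.score() for t in carried])  # [2.0]: boost copied onto the term
```

If this model matches what happens inside the library, the fix would amount to passing the parent query's boost along when the Term queries are built, though the right place to do that in Whoosh itself would need to be confirmed against the linked source.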