mchaput / whoosh

Pure-Python full-text search library

Random access scoring #22

Open SnowyCoder opened 2 years ago

SnowyCoder commented 2 years ago

Hello, thanks for this library!

In my project I'm building a query aggregator over multiple indexes: take a query, run it on multiple searchers, and aggregate the results. It would be really useful to be able to look up a score from an index by random access (e.g. given a document id, a query, and a searcher, what is the resulting score?). I don't have much experience with the codebase, but this is the solution I came up with:

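from whoosh.matching import IntersectionMatcher, ListMatcher
from whoosh.query import Query
from whoosh.searching import Searcher

def random_access_score(query: Query, searcher: Searcher, docid: int) -> tuple[int, float]:
    """Return (docid, score) for docid under query, or (-1, 0.0) if it does not match."""
    for subsearcher, offset in searcher.leaf_searchers():
        # Assumption: matchers built on a leaf searcher use segment-local ids,
        # so the index-wide docid has to be shifted by the segment offset.
        local_id = docid - offset
        if local_id < 0:
            continue
        m = query.matcher(subsearcher, context=searcher.context())
        # A zero-weight, single-document matcher restricts the query to one docid.
        m = IntersectionMatcher(ListMatcher([local_id], [0]), m)
        if m.is_active():
            return m.id() + offset, m.score()
    # No leaf searcher produced a hit for docid.
    return -1, 0.0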

Is this correct? Is there a more efficient way to do this? I expect this approach to iterate over all of the posting lists, is that right?
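
To make the efficiency concern concrete, here is a plain-Python sketch (not Whoosh code; the posting list, its scores, and both function names are made up for illustration). It contrasts a linear scan over a posting list sorted by document id with a binary-search lookup, which is the kind of random access I am hoping for:

```python
from bisect import bisect_left

# Hypothetical posting list: document ids in ascending order, with made-up scores.
POSTING_IDS = [2, 5, 9, 14, 21]
POSTING_SCORES = [0.4, 1.2, 0.7, 2.0, 0.3]

def score_by_scan(docid):
    """Linear scan: visit postings in order until docid is found or passed."""
    for pid, score in zip(POSTING_IDS, POSTING_SCORES):
        if pid == docid:
            return score
        if pid > docid:
            break  # ids are sorted, so docid cannot appear later
    return None

def score_by_seek(docid):
    """Binary search on the sorted ids: jump to where docid would be."""
    i = bisect_left(POSTING_IDS, docid)
    if i < len(POSTING_IDS) and POSTING_IDS[i] == docid:
        return POSTING_SCORES[i]
    return None
```

Both functions return the same answer; the difference is only in how many postings they touch, and I would like to know whether the matcher API offers something closer to the second one.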

Thank you in advance!