whoosh-community / whoosh

Whoosh is a fast, featureful full-text indexing and searching library implemented in pure Python.

Infinite search #135

Open fortable1999 opened 13 years ago

fortable1999 commented 13 years ago

Original report by rogerb_aviga (Bitbucket: rogerb_aviga, GitHub: Unknown).


I need to show the user a virtually infinite list of results. It is far easier to scan and scroll through results than to get a short list and then have to keep typing to broaden the query to get more.

In my current implementation I wrap search and check whether it returned fewer than limit results. If there is only one hit, I call more_like_this on it and append those documents to the Results. If there is more than one, I get key_terms for the hits and append the results of a search for those terms.
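The branching described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from this report: `padded_search` and the three callables it receives are hypothetical stand-ins for a searcher's search, more_like_this and key_terms operations, and the toy in-memory index exists only to make the sketch runnable.

```python
def padded_search(terms, limit, search, more_like_this, key_terms):
    """Return up to `limit` doc ids, padding a short result list.

    The three callables are hypothetical stand-ins, not Whoosh signatures:
      search(terms, limit)          -> list of doc ids (terms are OR-ed)
      more_like_this(doc_id, limit) -> list of doc ids
      key_terms(doc_ids)            -> list of terms
    """
    hits = list(search(terms, limit))
    if not hits or len(hits) >= limit:
        return hits[:limit]

    if len(hits) == 1:
        # A single hit: pad with documents similar to it.
        extra = more_like_this(hits[0], limit)
    else:
        # Several hits: pad with a search for their key terms.
        extra = search(key_terms(hits), limit)

    # Append only documents that are not already in the list.
    seen = set(hits)
    for doc_id in extra:
        if len(hits) >= limit:
            break
        if doc_id not in seen:
            seen.add(doc_id)
            hits.append(doc_id)
    return hits


# Toy index (doc id -> set of words) so the sketch can be run.
DOCS = {1: {"neil", "young"}, 2: {"young", "harvest"}, 3: {"harvest", "moon"}}

def toy_search(terms, limit):
    wanted = set(terms)
    return [d for d, words in sorted(DOCS.items()) if words & wanted][:limit]

def toy_more_like_this(doc_id, limit):
    return toy_search(DOCS[doc_id], limit)

def toy_key_terms(doc_ids):
    return sorted(set().union(*(DOCS[d] for d in doc_ids)))

print(padded_search(["neil"], 3, toy_search, toy_more_like_this, toy_key_terms))
```

The original hits always stay at the front of the list, so padding never displaces a real match.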

Spelling correction can also be mixed into this. If the search was for 'niel' (a misspelling of 'neil') and one document happens to match, I'd like the documents that follow it to mix in matches for the likely better spelling.

My enhancement request is for a function that does infinite search (always returns limit results) and uses the existing matches plus knowledge of key terms, spelling etc to fill out the remainder of the list.

fortable1999 commented 13 years ago

Original comment by rogerb_aviga (Bitbucket: rogerb_aviga, GitHub: Unknown).


I use dismax so all queries are OR. My final implementation:

(I know the offset and limit passed in so extra work is only done if the results will be looked at.) This whole approach works very well. In the future I'd want to mix in stemming, double metaphone, spelling correction etc.
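The point about offset and limit (extra work is only done when the requested page needs it) can be illustrated with generators. This is not the poster's implementation, only a minimal sketch: `unique_chain`, `page` and the two stage functions are hypothetical names, and each stage stands in for one source of results (the real search, more_like_this, a key-terms search, and so on).

```python
from itertools import islice

def unique_chain(*stages):
    """Yield doc ids from each stage in turn, skipping ids already seen.

    Each stage is a zero-argument callable returning an iterable. A stage is
    only called once every earlier stage has been exhausted.
    """
    seen = set()
    for stage in stages:
        for doc_id in stage():
            if doc_id not in seen:
                seen.add(doc_id)
                yield doc_id

def page(stages, offset, limit):
    """Return the slice [offset, offset + limit) of the combined results."""
    return list(islice(unique_chain(*stages), offset, offset + limit))


# Toy stages that record when they are actually run.
calls = []

def exact_matches():
    calls.append("exact")
    return [1, 2, 3]

def related_matches():
    calls.append("related")
    return [3, 4, 5, 6]

first_page = page([exact_matches, related_matches], 0, 2)
print(first_page, calls)
```

In this toy run the first page is filled entirely by the first stage, so the second stage is never called. Note that every call to `page` restarts from the first stage; caching earlier stages between pages is left out of the sketch.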

fortable1999 commented 13 years ago

Original comment by Matt Chaput (Bitbucket: mchaput, GitHub: mchaput).


This is interesting. I'll try to do something with this (at least as an example, if not as core functionality) when I finish with the better spell checking.

Maybe a good first step to try to reach an arbitrary limit, before MLT or autocorrecting, would be to add on results from rewriting the query to be less restrictive (e.g. convert any AND clauses into ORs).
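As a rough illustration of that rewriting step, here is a sketch over a toy query tree. The tuple format and the `relax` function are assumptions made for the example; they are not Whoosh's query classes, which would need their own traversal.

```python
def relax(node):
    """Return a copy of a query tree with every AND node turned into OR.

    A node is either a term (a plain string) or a tuple
    (operator, child, child, ...). Operators other than "AND" are kept.
    """
    if isinstance(node, str):
        return node
    op, *children = node
    new_op = "OR" if op == "AND" else op
    return (new_op,) + tuple(relax(child) for child in children)


strict = ("AND", "neil", ("OR", "young", ("AND", "harvest", "moon")))
print(relax(strict))
```

Results from the relaxed query would be appended after the results of the strict one, so the strict matches still rank first.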

fortable1999 commented 13 years ago

Original comment by rogerb_aviga (Bitbucket: rogerb_aviga, GitHub: Unknown).


A first approximation is a more_like_these function. On Results it gathers up the docnums of the top results and then calls more_like_these on Searcher. more_like_these on Searcher is virtually identical to more_like_this, except that it takes a list of docnums instead of a single docnum. I then extend the existing results with what it returns.

#!python
    def more_like_these(self, docnums, fieldname, top=10, numterms=5,
                        normalize=False, model=classify.Bo1Model):
        """Return documents similar to the given set of documents."""
        # Adapted from Searcher.more_like_this
        kts = self.key_terms(docnums, fieldname, numterms=numterms,
                             model=model, normalize=normalize)
        # Create an Or query from the key terms
        q = query.Or([query.Term(fieldname, word, boost=weight)
                      for word, weight in kts])

        # Filter the original documents out of the results using a bit vector
        # with every bit set except the ones for those documents
        exclude = set(docnums)  # set membership keeps the loop below linear
        size = self.doc_count_all()
        comb = BitVector(size, [n for n in xrange(size)
                                if n not in exclude])
        return self.search(q, limit=top, filter=comb, optimize=False)