whoosh-community / whoosh

Whoosh is a fast, featureful full-text indexing and searching library implemented in pure Python.

Not possible to boost word adjacency in search results #560

Open felixvor opened 4 years ago

felixvor commented 4 years ago

For example, if I search for "Whoosh is great", results containing that exact phrase should rank higher than texts that merely mention whoosh and use the word "great" somewhere unrelated. It would be great to have doc1 as the top search result in the following example:

import os

from whoosh.index import create_in
from whoosh.qparser import OrGroup
from whoosh.fields import Schema, TEXT

doc1 = "bla bla whoosh is great bla bla"
doc2 = "whoosh bla is bla great bla whoosh"
doc3 = "whoosh bla bla bla whoosh"
doc4 = "bla bla"

schema = Schema(name=TEXT, content=TEXT(stored=True))
# create_in() expects the index directory to exist already
os.makedirs("temp_index", exist_ok=True)
ix = create_in("temp_index", schema)

writer = ix.writer()
writer.add_document(name="doc1", content=doc1)
writer.add_document(name="doc2", content=doc2)
writer.add_document(name="doc3", content=doc3)
writer.add_document(name="doc4", content=doc4)
writer.commit()

from whoosh.qparser import QueryParser
with ix.searcher() as searcher:
    query = QueryParser("content", schema=schema, group=OrGroup).parse("whoosh is great")
    print(query)
    results = searcher.search(query)
    for r in results:
        print(r)

>>>Output:
>>>(content:whoosh OR content:great)
>>><Hit {'content': 'whoosh bla is bla great bla whoosh'}>
>>><Hit {'content': 'bla bla whoosh is great bla bla'}>
>>><Hit {'content': 'whoosh bla bla bla whoosh'}>

The current order is doc2, doc1, doc3, but it should be doc1, doc2, doc3.

It seems like this should be easy to do, but the closest thing I could find in the docs was OrGroup.factory, which sadly didn't help with the issue.

Thanks in advance!