whoosh-community / whoosh

Whoosh is a fast, featureful full-text indexing and searching library implemented in pure Python.

search results with ANDMAYBE affected by limit #122

Closed fortable1999 closed 13 years ago

fortable1999 commented 13 years ago

Original report by Anonymous.


Following up on my earlier email: http://groups.google.com/group/whoosh/t/3064d3888f7a240b

Some of the top-scoring search results disappear when queries are joined with ANDMAYBE and a search limit is set. When limit is None, the results are accurate, but as limit is reduced, the correct results begin to disappear.

The following code demonstrates the issue:

#!python

from whoosh import index
from whoosh.ramindex import RamIndex
from whoosh.qparser import QueryParser
from whoosh.fields import *

ABCD = u'Alpha Bravo Charlie Delta'
EBF  = u'Echo Bravo Foxtrot'
BGH  = u'Bravo Golf Hotel'
BI   = u'Bravo India'
JKB  = u'Juliet Kilo Bravo'
LBM  = u'Lima Bravo Mike'

schema = Schema(id = STORED,
                title = TEXT(stored = True),
                year = NUMERIC)

idx = RamIndex(schema)
w = idx.writer()

for id, title, year in zip(range(6), 
                           [ABCD, EBF, BGH, BI, JKB, LBM],
                           ['2000', '2000', '2002', '2002', '2004', '2004']):
    w.add_document(id = id, title = title, year = year)

s = idx.searcher()
qp = QueryParser('title', idx.schema)

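# Every title contains "Bravo"; only JKB and LBM also match year:2004,
# so those two documents should score highest.
query = u'title:(Bravo) ANDMAYBE year:2004'

# With no limit, the top two hits are correct and this assertion passes.
titles = [r['title'] for r in s.search(qp.parse(query), limit = None)[:2]]
assert (JKB in titles and LBM in titles)

# With limit = 2, the same two hits are expected, but this assertion fails.
titles = [r['title'] for r in s.search(qp.parse(query), limit = 2)]
assert (JKB in titles and LBM in titles)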

This example uses a RamIndex for simplicity, but a regular file-based index fails in the same way.

fortable1999 commented 13 years ago

Original comment by Matt Chaput (Bitbucket: mchaput, GitHub: mchaput).


Fixed bug in IntersectionMatcher.skip_to_quality() where it wasn't checking if the submatcher was still active. Fixes issue #121. Fixed bug in AndMaybeMatcher where it needed to override the quality() method. Fixes issue #122. Thanks Jeremy!
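
For readers wondering how a missing quality() override could make results depend on limit, here is a minimal toy sketch. It is not Whoosh's code: the block layout, the scores, and the names `BLOCKS`, `required_only_quality`, `combined_quality`, and `top_docs` are all hypothetical. It assumes a top-N collector that skips a block of postings when the block's reported quality (an upper bound on its scores) cannot beat the lowest score already collected. If that bound reflects only the required side of an ANDMAYBE, blocks whose documents gain score from the optional side can be skipped by mistake.

```python
import heapq

# Hypothetical postings, grouped in blocks of (docid, required_score, optional_score).
# Docs 4 and 5 are the only ones that also match the optional clause.
BLOCKS = [
    [(0, 0.80, 0.0), (1, 0.90, 0.0)],
    [(2, 0.85, 0.0), (3, 1.00, 0.0)],
    [(4, 0.90, 1.0), (5, 0.90, 1.0)],
]


def required_only_quality(block):
    # Upper bound that ignores the optional clause (the assumed faulty behaviour).
    return max(req for _, req, _ in block)


def combined_quality(block):
    # Upper bound that accounts for both clauses.
    return max(req + opt for _, req, opt in block)


def top_docs(blocks, limit, quality):
    """Return docids ordered by descending score, keeping at most `limit` of them."""
    heap = []  # min-heap of (score, docid); heap[0] is the weakest kept hit
    for block in blocks:
        full = limit is not None and len(heap) == limit
        # Skip the block if its reported bound cannot beat the weakest kept hit.
        if full and quality(block) <= heap[0][0]:
            continue
        for docid, req, opt in block:
            item = (req + opt, docid)
            if limit is None or len(heap) < limit:
                heapq.heappush(heap, item)
            elif item[0] > heap[0][0]:
                heapq.heapreplace(heap, item)
    return [docid for _, docid in sorted(heap, reverse=True)]


if __name__ == "__main__":
    print(top_docs(BLOCKS, None, required_only_quality)[:2])
    print(top_docs(BLOCKS, 2, required_only_quality))
    print(top_docs(BLOCKS, 2, combined_quality))
```

In this sketch, no block is ever skipped when limit is None, so the optional-clause matches always surface. With limit = 2 and the required-only bound, the last block is skipped and docs 4 and 5 are lost, which mirrors the symptom in the report.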