whoosh-community / whoosh

Whoosh is a fast, featureful full-text indexing and searching library implemented in pure Python.

TimeLimitCollector breaks filter of underlying FilterCollector #567

Open richardebeling opened 3 years ago

As far as I understand the documentation, a TimeLimitCollector can wrap any other collector to add a timeout. Apart from stopping the search once the time limit expires, it should not affect the results in any way.
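
To make that expectation concrete, here is a minimal stdlib-only sketch of a time-limit wrapper. It assumes a timer-and-flag design: a threading.Timer sets a flag, and the collection loop checks that flag before each document. Whoosh's TimeLimitCollector appears to work along these lines, but TimeLimited, ListCollector, TimeLimit and slow are hypothetical names, not Whoosh's API. When the timer does not fire, the wrapper only forwards each document to its child, so the results match those of the unwrapped collector.

```python
import threading
import time


class TimeLimit(Exception):
    """Hypothetical exception raised when collection runs past the time limit."""


class ListCollector:
    """Hypothetical inner collector: keeps every document number it is given."""

    def __init__(self):
        self.docs = []

    def collect(self, docnum):
        self.docs.append(docnum)

    def results(self):
        return list(self.docs)


class TimeLimited:
    """Hypothetical wrapper: adds a timeout, otherwise forwards to its child."""

    def __init__(self, child, timelimit):
        self.child = child
        self.timelimit = timelimit
        self.timedout = False

    def _timestop(self):
        # Runs on the timer thread; it only sets a flag that run() checks.
        self.timedout = True

    def run(self, matches):
        timer = threading.Timer(self.timelimit, self._timestop)
        timer.start()
        try:
            for docnum in matches:
                if self.timedout:
                    raise TimeLimit()
                # Which documents are kept is decided by the child alone.
                self.child.collect(docnum)
        finally:
            timer.cancel()
        return self.child.results()


def slow(docs, delay):
    """Simulates a matcher that spends `delay` seconds on each document."""
    for docnum in docs:
        time.sleep(delay)
        yield docnum
```

The wrapper only decides when to stop; it never decides which documents to keep. That is what should make it transparent whenever the timeout does not fire.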

However, in a code base I'm working on, wrapping a FilterCollector in a TimeLimitCollector breaks the filtering: documents that the filter should exclude show up in the results.

This happens on the current PyPI release (2.7.4) as well as the current master (5421f1ab3bb802114105b3181b7ce4f44ad7d0bb).

Minimal example to reproduce:

#!/usr/bin/env python3
import os

from whoosh.fields import Schema, TEXT, ID
from whoosh.filedb.filestore import FileStorage
from whoosh.query import Term
from whoosh.qparser import QueryParser
from whoosh.collectors import TimeLimitCollector

# Create the index directory if it does not exist yet
os.makedirs("test_index", exist_ok=True)

schema = Schema(id=ID(stored=True), text=TEXT(stored=True))
storage = FileStorage("test_index")
ix = storage.create_index(schema)
writer = ix.writer()
writer.add_document(text=u"test", id="1")
writer.commit()

query_parser = QueryParser("text", schema)
query = query_parser.parse("test")
# Filter that matches no document: the only indexed document has id "1"
id_term = Term("id", "0")

with ix.searcher() as searcher:
    collector = searcher.collector(limit=10, filter=id_term)

    # Uncomment the next line to add the time limit; the assertion then fails
    # collector = TimeLimitCollector(collector, timelimit=5.0)

    searcher.search_with_collector(query, collector)

    print(collector.results())
    # The filter excludes every document, so no hits are expected
    assert len(collector.results()) == 0
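
One plausible explanation, sketched as a simplified stdlib-only model rather than Whoosh's actual code: suppose the filter wrapper applies its filter only inside its own collect_matches() loop and inherits a collect() that just forwards to its child. If the time-limit wrapper then drives the loop itself, iterating child.matches() and calling child.collect() on each document, the filter wrapper's collect_matches() never runs and the filter is skipped. TopHits, Wrapping, FilterWrapper, TimeLimitWrapper and CollectFilter are hypothetical names; whether Whoosh's FilterCollector and TimeLimitCollector really behave this way should be checked against whoosh/collectors.py. The model mirrors the repro above: one matching document (docnum 0) and a filter that allows nothing.

```python
class TopHits:
    """Innermost collector: records every document number it is asked to collect."""

    def __init__(self, docs):
        self.docs = list(docs)  # document numbers the query matched
        self.hits = []

    def matches(self):
        return iter(self.docs)

    def collect(self, docnum):
        self.hits.append(docnum)

    def results(self):
        return list(self.hits)


class Wrapping:
    """Base wrapper: forwards matches(), collect() and results() to the child."""

    def __init__(self, child):
        self.child = child

    def matches(self):
        return self.child.matches()

    def collect(self, docnum):
        self.child.collect(docnum)

    def collect_matches(self):
        for docnum in self.matches():
            self.collect(docnum)

    def results(self):
        return self.child.results()


class FilterWrapper(Wrapping):
    """Applies the filter only in collect_matches(); collect() is inherited as-is."""

    def __init__(self, child, allow):
        super().__init__(child)
        self.allow = set(allow)

    def collect_matches(self):
        for docnum in self.child.matches():
            if docnum in self.allow:
                self.child.collect(docnum)


class TimeLimitWrapper(Wrapping):
    """Drives the loop itself and calls child.collect() for every match."""

    def collect_matches(self):
        for docnum in self.child.matches():
            # A real time limit would check its timeout flag here.
            self.child.collect(docnum)


class CollectFilter(FilterWrapper):
    """Variant that also filters in collect(), so an outer loop cannot bypass it."""

    def collect(self, docnum):
        if docnum in self.allow:
            self.child.collect(docnum)


# Mirror the repro: one matching document (docnum 0), a filter that allows none.
alone = FilterWrapper(TopHits([0]), allow=[])
alone.collect_matches()
print(alone.results())    # → []

wrapped = TimeLimitWrapper(FilterWrapper(TopHits([0]), allow=[]))
wrapped.collect_matches()
print(wrapped.results())  # → [0]
```

Under this model, the CollectFilter variant keeps the filter in effect no matter which wrapper drives the loop, because the check sits in collect(). It only illustrates the design choice and is not a proposed upstream fix.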

Did I misunderstand something or is this a bug?