mchaput / whoosh

Pure-Python full-text search library

Long Term Query (OR) issues #25

Closed: nbhoj-cedar closed this issue 10 months ago.

nbhoj-cedar commented 2 years ago

I am generating a long query as follows:

from whoosh.query import Or, Term

terms = []
for i in range(16100):  # ~16,100 is roughly the number of values I'm using
    terms.append(Term("content", "first"))
query = Or(terms)

I found that I can only add about 1023 terms. Is there a way to raise this limit? I have around 16,100 possible values. With anything above 1024 terms, hit.matched_terms() just returns an empty list: []
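One possible workaround, not proposed in this thread, is to split the values into batches below the observed limit, run one OR query per batch, and merge which values each document matched. The sketch below is pure Python and assumes a hypothetical `run_query` callback that wraps the real search; the helper names `batched` and `search_in_batches` are illustrative only. It also drops repeated values first, so the 16,100 identical "first" terms in the loop above would collapse to a single query value.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Set

# Maps a document number to the set of query values it matched.
MatchMap = Dict[int, Set[str]]


def batched(values: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` values."""
    if size < 1:
        raise ValueError("size must be >= 1")
    batch: List[str] = []
    for value in values:
        batch.append(value)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def search_in_batches(
    values: Iterable[str],
    run_query: Callable[[List[str]], MatchMap],
    batch_size: int = 1000,  # assumption: kept below the ~1023 limit reported above
) -> MatchMap:
    """Run one OR query per batch of values and union the per-document matches.

    `run_query` is a hypothetical callback. With Whoosh it might build
    Or([Term("content", v) for v in batch]), search with terms=True and
    limit=None, and collect hit.matched_terms() for each hit.docnum.
    """
    # dict.fromkeys drops repeated values but keeps first-seen order.
    unique = list(dict.fromkeys(values))
    merged: MatchMap = {}
    for batch in batched(unique, batch_size):
        for docnum, matched in run_query(batch).items():
            merged.setdefault(docnum, set()).update(matched)
    return merged


if __name__ == "__main__":
    # Toy stand-in for an index: document number -> words it contains.
    docs = {0: {"alfa", "bravo"}, 2: {"echo", "14800"}, 3: {"golf", "42"}}

    def toy_run_query(batch: List[str]) -> MatchMap:
        wanted = set(batch)
        return {n: words & wanted for n, words in docs.items() if words & wanted}

    print(search_in_batches((str(i) for i in range(16100)), toy_run_query))
```

Merging only unions the matched values; relevance scores from separate batch queries are not combined, so ranking would need separate handling.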

mchaput commented 1 year ago

(Reply by email to nbhoj-cedar's message of Apr 12, 2022, 12:54 PM.)

It works for me:

from whoosh import fields, query
from whoosh.util.testing import TempIndex


def test_many_or_subclauses():
    schema = fields.Schema(body=fields.TEXT)
    with TempIndex(schema) as ix:
        with ix.writer() as w:
            w.add_document(body=u"alfa bravo charlie delta")
            w.add_document(body=u"lima mike november oskar")
            # Search should find this (doc #2):
            w.add_document(body=u"echo foxtrot 14800 hotel")
            w.add_document(body=u"golf india juliet kilo")

        with ix.searcher() as s:
            # One Term subquery per number "0".."14999", OR'd together
            qs = []
            for i in range(15000):
                qs.append(query.Term("body", u"%s" % i))
            q = query.Or(qs)

            r = s.search(q)
            assert len(r) == 1
            assert r[0].docnum == 2
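What the test asserts can be modeled without Whoosh: an OR over many terms is a union of posting lists, and a document's matched terms are the query terms whose postings contain it. This is a toy sketch, not Whoosh's matcher implementation; it assumes a lowercase whitespace tokenizer in place of Whoosh's analyzer, and `build_index` and `or_search` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

TermKey = Tuple[str, str]  # (fieldname, text)


def build_index(docs: List[Dict[str, str]]) -> Dict[TermKey, Set[int]]:
    """Map each (field, word) term to the set of document numbers containing it."""
    postings: Dict[TermKey, Set[int]] = defaultdict(set)
    for docnum, doc in enumerate(docs):
        for field, text in doc.items():
            # Lowercase + whitespace split stands in for a real analyzer.
            for word in text.lower().split():
                postings[(field, word)].add(docnum)
    return postings


def or_search(
    postings: Dict[TermKey, Set[int]], terms: List[TermKey]
) -> Dict[int, Set[TermKey]]:
    """OR query: union the posting sets, remembering which terms hit each doc."""
    matched: Dict[int, Set[TermKey]] = defaultdict(set)
    for term in terms:
        # .get avoids inserting empty entries into the defaultdict.
        for docnum in postings.get(term, ()):
            matched[docnum].add(term)
    return dict(matched)


docs = [
    {"body": "alfa bravo charlie delta"},
    {"body": "lima mike november oskar"},
    {"body": "echo foxtrot 14800 hotel"},
    {"body": "golf india juliet kilo"},
]
postings = build_index(docs)
terms = [("body", str(i)) for i in range(15000)]
result = or_search(postings, terms)
print(result)
```

In this model the number of OR'd terms has no cap. Note that the Whoosh test checks only which document matched; it does not call `matched_terms()`, which is the part of the original report that came back empty.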

That said, the repository may contain fixes that haven't been released yet :(