whoosh-community / whoosh

Whoosh is a fast, featureful full-text indexing and searching library implemented in pure Python.

Infinite loop in UnionMatcher#skip_to_quality #446

Open fortable1999 opened 8 years ago

fortable1999 commented 8 years ago

Original report by Stefan Andersen (Bitbucket: stfn, GitHub: stfn).


I'm hitting an infinite loop, and it seems to be stuck around skip_to_quality.

Stack trace dumped while it was running:


```
  File "./matcher.py", line 45, in <module>
    main()
  File "./matcher.py", line 36, in main
    matcher.process(1)
  File "/home/vagrant/dixie/sherlock/matcher/matcher.py", line 58, in process
    self.search(company_id, transaction)
  File "/home/vagrant/dixie/sherlock/matcher/matcher.py", line 44, in search
    transaction.description
  File "/home/vagrant/dixie/sherlock/indexer/searcher.py", line 41, in search_transaction
    return self.search(company_id, query)
  File "/home/vagrant/dixie/sherlock/indexer/searcher.py", line 17, in search
    results = searcher.search(term_q, filter=allow_q, terms=True)
  File "/home/vagrant/.virtualenvs/sherlock/local/lib/python2.7/site-packages/whoosh/searching.py", line 786, in search
    self.search_with_collector(q, c)
  File "/home/vagrant/.virtualenvs/sherlock/local/lib/python2.7/site-packages/whoosh/searching.py", line 819, in search_with_collector
    collector.run()
  File "/home/vagrant/.virtualenvs/sherlock/local/lib/python2.7/site-packages/whoosh/collectors.py", line 144, in run
    self.collect_matches()
  File "/home/vagrant/.virtualenvs/sherlock/local/lib/python2.7/site-packages/whoosh/collectors.py", line 737, in collect_matches
    for sub_docnum in child.matches():
  File "/home/vagrant/.virtualenvs/sherlock/local/lib/python2.7/site-packages/whoosh/collectors.py", line 409, in matches
    self.skipped_times += matcher.skip_to_quality(minscore)
  File "/home/vagrant/.virtualenvs/sherlock/local/lib/python2.7/site-packages/whoosh/matching/binary.py", line 769, in skip_to_quality
    return a.skip_to_quality(minquality)
  File "/home/vagrant/.virtualenvs/sherlock/local/lib/python2.7/site-packages/whoosh/matching/binary.py", line 291, in skip_to_quality
    skipped += a.skip_to_quality(minquality - bq)
```

I can provide the dataset privately, since it may contain sensitive information.

fortable1999 commented 8 years ago

Original comment by Leiming Xu (Bitbucket: padding, GitHub: padding).


I ran into this issue as well. After inspecting the variable values, I concluded that it is a floating-point precision problem. In the function skip_to_quality in "binary.py", limited floating-point precision means that "aq + bq <= minquality" and "aq > minquality - bq" can both be true at the same time when aq, bq and minquality happen to take specific values. The loop then never makes progress, which causes the infinite loop. @Matt, I hope this is helpful to you.
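For anyone who wants to see the effect in isolation, here is a minimal sketch. It is not Whoosh's actual code: `ToyLeaf` and `union_skip_to_quality` are hypothetical stand-ins that only mimic the two comparisons quoted above, the assumption that a leaf skips blocks while its quality is `<= minquality` is mine, and the values 0.1 and 0.7 are chosen purely for illustration. The `max_iterations` guard exists only so the demo terminates.

```python
class ToyLeaf:
    """Hypothetical leaf matcher: just a list of per-block quality values."""

    def __init__(self, block_qualities):
        self.block_qualities = list(block_qualities)
        self.pos = 0

    def is_active(self):
        return self.pos < len(self.block_qualities)

    def block_quality(self):
        return self.block_qualities[self.pos]

    def skip_to_quality(self, minquality):
        # Assumption: skip blocks whose quality is not strictly above minquality.
        skipped = 0
        while self.is_active() and self.block_quality() <= minquality:
            self.pos += 1
            skipped += 1
        return skipped


def union_skip_to_quality(a, b, minquality, max_iterations=100):
    """Simplified model of the union loop, with a guard against stalling."""
    skipped = 0
    iterations = 0
    aq = a.block_quality()
    bq = b.block_quality()
    while a.is_active() and b.is_active() and aq + bq <= minquality:
        iterations += 1
        if iterations > max_iterations:
            raise RuntimeError("skip_to_quality made no progress")
        if aq < bq:
            # If aq > minquality - bq, the child skips nothing and aq is unchanged.
            skipped += a.skip_to_quality(minquality - bq)
            if a.is_active():
                aq = a.block_quality()
        else:
            skipped += b.skip_to_quality(minquality - aq)
            if b.is_active():
                bq = b.block_quality()
    return skipped


# Illustrative values: the sum rounds down, so subtracting bq back lands below aq.
aq, bq = 0.1, 0.7
minquality = aq + bq
both_true = (aq + bq <= minquality) and (aq > minquality - bq)
print("both conditions hold:", both_true)

try:
    union_skip_to_quality(ToyLeaf([aq, 0.5]), ToyLeaf([bq]), minquality)
    stalled = False
except RuntimeError:
    stalled = True
print("loop stalled:", stalled)
```

With IEEE 754 doubles, both comparisons hold for these values, so the toy loop keeps asking the child to skip to a threshold it already exceeds and only the iteration guard stops it.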

stevennic commented 5 years ago

Commit #519 should resolve this issue.