quickwit-oss / tantivy

Tantivy is a full-text search engine library inspired by Apache Lucene and written in Rust
MIT License

Index out of bounds on conjunction of term and range queries #857

Closed ppodolsky closed 4 years ago

ppodolsky commented 4 years ago

Describe the bug

To Reproduce

I can provide a reproduction on request.

Update: I rebased onto your master, and the issue is still there.

ppodolsky commented 4 years ago

I've just realized that the issue could be on my side, as I'm using my own top_collector :( I will check it. My own top collector is definitely the same as yours, but it modifies the score (multiplies it by a value from a fast field).
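
To make the described tweak concrete, here is a minimal, library-free sketch of a top-k collection that multiplies each score by a per-document value. The function name, the `boost` slice standing in for a fast field, and the tuple layout are all hypothetical and are not tantivy's API. The point it illustrates is that such a tweak only reorders documents that already matched, so it would not be expected to change which doc ids the query iterates over.

```rust
/// Toy stand-in for a score-tweaking top collector (hypothetical, not tantivy's API).
/// `scored` holds (doc_id, original_score); `boost[doc_id]` plays the role of
/// a fast-field value for that document.
fn top_k_tweaked(scored: &[(u32, f32)], boost: &[f32], k: usize) -> Vec<(u32, f32)> {
    // Multiply every original score by the per-document value.
    let mut tweaked: Vec<(u32, f32)> = scored
        .iter()
        .map(|&(doc, score)| (doc, score * boost[doc as usize]))
        .collect();
    // Highest tweaked score first, then keep the first k entries.
    tweaked.sort_by(|a, b| b.1.total_cmp(&a.1));
    tweaked.truncate(k);
    tweaked
}

fn main() {
    let scored = [(0u32, 1.0f32), (1, 2.0), (2, 0.5)];
    let boost = [4.0f32, 1.0, 2.0];
    for (doc, score) in top_k_tweaked(&scored, &boost, 2) {
        println!("doc {} -> {}", doc, score);
    }
}
```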

ppodolsky commented 4 years ago

I've rewritten my top collector using tweak_score_top_collector; the bug is still here.

thread 'thrd-tantivy-search-43' panicked at 'index out of bounds: the len is 938132 but the index is 33554431', nexus/tantivy/src/common/bitset.rs:197:9
stack backtrace:
   0:     0x55cc785363a5 - backtrace::backtrace::libunwind::trace::h396c07d2071b43af
                               at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.46/src/backtrace/libunwind.rs:86
   1:     0x55cc785363a5 - backtrace::backtrace::trace_unsynchronized::h7aa0e4bb23d9c158
                               at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.46/src/backtrace/mod.rs:66
   2:     0x55cc785363a5 - std::sys_common::backtrace::_print_fmt::hd15ac5d4adcd355b
                               at src/libstd/sys_common/backtrace.rs:78
   3:     0x55cc785363a5 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::hec5354be8ccc3ecc
                               at src/libstd/sys_common/backtrace.rs:59
   4:     0x55cc7855f85c - core::fmt::write::h3d34909eeb4f225b
                               at src/libcore/fmt/mod.rs:1076
   5:     0x55cc7852f7d3 - std::io::Write::write_fmt::h1da287b3de55ed16
                               at src/libstd/io/mod.rs:1537
   6:     0x55cc78539060 - std::sys_common::backtrace::_print::h4d206838e1ace354
                               at src/libstd/sys_common/backtrace.rs:62
   7:     0x55cc78539060 - std::sys_common::backtrace::print::h1f778e9940ee5977
                               at src/libstd/sys_common/backtrace.rs:49
   8:     0x55cc78539060 - std::panicking::default_hook::{{closure}}::h704403a56cbf5783
                               at src/libstd/panicking.rs:198
   9:     0x55cc78538dac - std::panicking::default_hook::ha4567a10dec4ef8d
                               at src/libstd/panicking.rs:218
  10:     0x55cc78539697 - std::panicking::rust_panic_with_hook::h88a1f16ec8a7bb20
                               at src/libstd/panicking.rs:486
  11:     0x55cc7853929b - rust_begin_unwind
                               at src/libstd/panicking.rs:388
  12:     0x55cc7855d791 - core::panicking::panic_fmt::hbddb7fe6f399b81a
                               at src/libcore/panicking.rs:101
  13:     0x55cc7855d752 - core::panicking::panic_bounds_check::ha5d508118eb53f4e
                               at src/libcore/panicking.rs:73
  14:     0x55cc77abb638 - tantivy::common::bitset::BitSet::tinyset::hdd87f22745a4601b
  15:     0x55cc77c202d7 - tantivy::query::bitset::BitSetDocSet::go_to_bucket::h95d840c725e31291
  16:     0x55cc77c2077b - <tantivy::query::bitset::BitSetDocSet as tantivy::docset::DocSet>::seek::h8d724c51cd067383
  17:     0x55cc77b4e457 - <tantivy::query::scorer::ConstScorer<TDocSet> as tantivy::docset::DocSet>::seek::h83569c7dad68fa42
  18:     0x55cc77c99b59 - <alloc::boxed::Box<TDocSet> as tantivy::docset::DocSet>::seek::h9a83f6879d49a537
  19:     0x55cc77b1e455 - <tantivy::query::intersection::Intersection<TDocSet,TOtherDocSet> as tantivy::docset::DocSet>::advance::h695e5eace260a9d9
  20:     0x55cc77c99b21 - <alloc::boxed::Box<TDocSet> as tantivy::docset::DocSet>::advance::h322fda5b3ca6c42e
  21:     0x55cc77bcf732 - tantivy::query::union::refill::{{closure}}::hf5a373e45ad7ff93
  22:     0x55cc77bcd8f4 - tantivy::query::union::unordered_drain_filter::h6e7b68f0b22ab1c5
  23:     0x55cc77bcee8b - tantivy::query::union::refill::hea31c9e6bbf7ebe6
  24:     0x55cc77bcf94e - tantivy::query::union::Union<TScorer,TScoreCombiner>::refill::h98a3ae4aac0bd6e7
  25:     0x55cc77bd0798 - <tantivy::query::union::Union<TScorer,TScoreCombiner> as tantivy::docset::DocSet>::advance::h21ef922c6ca3d0c9
  26:     0x55cc77c21b36 - tantivy::query::weight::for_each_scorer::habb6c1ba998c7db9
  27:     0x55cc77b1c65e - <tantivy::query::boolean_query::boolean_weight::BooleanWeight as tantivy::query::weight::Weight>::for_each::h3fd88479d3732c8d
  28:     0x55cc7793698c - tantivy::collector::Collector::collect_segment::hb34a08877b4100b3
  29:     0x55cc778eba05 - tantivy::core::searcher::Searcher::search_with_executor::{{closure}}::hd1d71bd9d0a82cab
  30:     0x55cc779477f4 - tantivy::core::executor::Executor::map::{{closure}}::{{closure}}::h491f8d729ab9503a
  31:     0x55cc778b965a - rayon_core::scope::Scope::spawn::{{closure}}::{{closure}}::hfc071114a04b539e
  32:     0x55cc778f1d21 - <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once::hb92eff675012b026
  33:     0x55cc778bcc07 - std::panicking::try::do_call::h0c530d4775b07e10
  34:     0x55cc778d363d - __rust_try
  35:     0x55cc778bbefc - std::panicking::try::h809350467d037880
  36:     0x55cc778f5443 - std::panic::catch_unwind::h0af775eeffca1588
  37:     0x55cc77938af0 - rayon_core::unwind::halt_unwinding::he5904607d2b275c5
  38:     0x55cc778b9af3 - rayon_core::scope::ScopeBase::execute_job_closure::h0ba2e5b563bb952e
  39:     0x55cc778b99b3 - rayon_core::scope::ScopeBase::execute_job::h7661885e2b0ab21c
  40:     0x55cc778b9550 - rayon_core::scope::Scope::spawn::{{closure}}::h9047f14fb9349417
  41:     0x55cc7788c270 - <rayon_core::job::HeapJob<BODY> as rayon_core::job::Job>::execute::hadee723bd17e74fd
  42:     0x55cc77e936fa - rayon_core::job::JobRef::execute::he05b3a828cf3180c
  43:     0x55cc77e87950 - rayon_core::registry::WorkerThread::execute::h39978e106fa9aee3
  44:     0x55cc77e876e9 - rayon_core::registry::WorkerThread::wait_until_cold::hddc9141b38d23114
  45:     0x55cc77e874d1 - rayon_core::registry::WorkerThread::wait_until::h1f071a720f74886b
  46:     0x55cc77e880ec - rayon_core::registry::main_loop::h0779a594efce4596
  47:     0x55cc77e85660 - rayon_core::registry::ThreadBuilder::run::h0f55c4581c3ef739
  48:     0x55cc77e85b41 - <rayon_core::registry::DefaultSpawn as rayon_core::registry::ThreadSpawn>::spawn::{{closure}}::h260282579e9cdd9a
  49:     0x55cc77e9eda0 - std::sys_common::backtrace::__rust_begin_short_backtrace::hdfa45e30ca774009
  50:     0x55cc77e973c1 - std::thread::Builder::spawn_unchecked::{{closure}}::{{closure}}::hc00c1b2a4bfda637
  51:     0x55cc77e889c3 - <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once::he33d631fa919e4e4
  52:     0x55cc77e8ba6a - std::panicking::try::do_call::ha8f9f82dcfa1aca3
  53:     0x55cc77e8cbad - __rust_try
  54:     0x55cc77e8b8dc - std::panicking::try::hb23236c4c25e74d1
  55:     0x55cc77e894a3 - std::panic::catch_unwind::h6eb96cb948319564
  56:     0x55cc77e971bb - std::thread::Builder::spawn_unchecked::{{closure}}::hb1d8e65d25dc32b5
  57:     0x55cc77e8f337 - core::ops::function::FnOnce::call_once{{vtable.shim}}::h1fa125ec4e37baf4
  58:     0x55cc7853ddea - <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once::hcf205bcf9b46c587
                               at /rustc/5c1f21c3b82297671ad3ae1e8c942d2ca92e84f2/src/liballoc/boxed.rs:1076
  59:     0x55cc7853ddea - <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once::h2d53e2246128f5d8
                               at /rustc/5c1f21c3b82297671ad3ae1e8c942d2ca92e84f2/src/liballoc/boxed.rs:1076
  60:     0x55cc7853ddea - std::sys::unix::thread::Thread::new::thread_start::h3b6d8a0cd87a87c6
                               at src/libstd/sys/unix/thread.rs:87
  61:     0x7f79d7d0b609 - start_thread
  62:     0x7f79d7c17103 - __clone
  63:                0x0 - <unknown>
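
A note on the numbers in the panic message: the failing index 33554431 is exactly i32::MAX divided by 64. Assuming (the thread does not confirm this) that the doc set uses i32::MAX as its end-of-iteration sentinel and that the bitset packs 64 doc ids per word, this suggests that a seek to the sentinel itself reached BitSet::tinyset. The constant names below are hypothetical; the sketch only checks the arithmetic.

```rust
// Assumptions, not stated in the thread: 64 doc ids per bitset word, and
// i32::MAX (as u32) used as the "no more documents" sentinel.
const BITS_PER_BUCKET: u32 = 64;
const ASSUMED_TERMINATED: u32 = i32::MAX as u32;

/// Index of the 64-bit word that would hold `doc`.
fn bucket_of(doc: u32) -> u32 {
    doc / BITS_PER_BUCKET
}

fn main() {
    let panic_index: u32 = 33_554_431; // "the index is 33554431"
    let panic_len: u64 = 938_132; // "the len is 938132"

    // The reported index matches the bucket of the assumed sentinel.
    assert_eq!(bucket_of(ASSUMED_TERMINATED), panic_index);

    // Under the same assumption, the bitset covers at most this many doc ids,
    // far fewer than the sentinel value.
    println!("doc ids covered: {}", panic_len * u64::from(BITS_PER_BUCKET));
}
```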

fulmicoton commented 4 years ago

Do you have code you can share to help me reproduce?

ppodolsky commented 4 years ago

Here it is.

Cargo.toml

[package]
name = "hello_world"
version = "0.0.1"
authors = [ "Your name <you@example.com>" ]

[dependencies]
tantivy = { git = "https://github.com/tantivy-search/tantivy", branch = "master"}
tempdir = "0.3.7"

main.rs

#[macro_use]
extern crate tantivy;
extern crate tempdir;
use tantivy::collector::TopDocs;
use tantivy::query::QueryParser;
use tantivy::schema::*;
use tantivy::Index;
use tantivy::ReloadPolicy;
use tempdir::TempDir;

fn main() -> tantivy::Result<()> {
    let index_path = TempDir::new("tantivy_example_dir")?;
    let mut schema_builder = Schema::builder();
    schema_builder.add_text_field("title", TEXT | STORED);
    schema_builder.add_i64_field("year", FAST | INDEXED | STORED);
    let schema = schema_builder.build();
    let index = Index::create_in_dir(&index_path, schema.clone())?;
    let mut index_writer = index.writer(50_000_000)?;
    let title = schema.get_field("title").unwrap();
    let year = schema.get_field("year").unwrap();

    index_writer.add_document(doc!(
        title => "hemoglobin blood",
        year => 1990i64
    ));

    index_writer.commit()?;

    let reader = index
        .reader_builder()
        .reload_policy(ReloadPolicy::OnCommit)
        .try_into()?;

    let searcher = reader.searcher();
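    let query_parser = QueryParser::for_index(&index, vec![title]);
    // Conjunction of a term query and a range query, the combination named in the issue title.
    let query = query_parser.parse_query("hemoglobin AND year:[1970 TO 1990]")?;
    // On master this search hits the index-out-of-bounds panic; it works on 0.12.0.
    let top_docs = searcher.search(&query, &TopDocs::with_limit(10))?;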
    for (_score, doc_address) in top_docs {
        let retrieved_doc = searcher.doc(doc_address)?;
        println!("{}", schema.to_json(&retrieved_doc));
    }
    Ok(())
}

But everything works with the released tantivy = "0.12.0".
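
The backtrace earlier in the thread runs from Intersection::advance through seek and BitSetDocSet::go_to_bucket into BitSet::tinyset. One plausible reading, which is an inference and not something the thread confirms, is that the term side of the conjunction runs out of documents and its end-of-iteration sentinel is then passed to seek on the bitset-backed range side. The toy model below is not tantivy's code and not the actual fix; all type and function names are hypothetical. It shows the shape of such an intersection and the bounds check that keeps a sentinel seek from indexing past the end.

```rust
// Toy model only. The sentinel value is an assumption about the library.
const TERMINATED: u32 = i32::MAX as u32;

/// Doc ids stored one bit each, 64 per word.
struct ToyBitSetDocs {
    words: Vec<u64>,
}

impl ToyBitSetDocs {
    fn from_docs(max_doc: u32, docs: &[u32]) -> Self {
        let mut words = vec![0u64; (max_doc as usize + 63) / 64];
        for &d in docs {
            words[(d / 64) as usize] |= 1u64 << (d % 64);
        }
        ToyBitSetDocs { words }
    }

    /// First doc id >= `target`, or TERMINATED if there is none.
    fn seek(&self, target: u32) -> u32 {
        let mut bucket = (target / 64) as usize;
        // Guard: without this check, `self.words[bucket]` would panic
        // whenever `target` is the sentinel.
        if bucket >= self.words.len() {
            return TERMINATED;
        }
        // Ignore bits below `target` in the first word.
        let mut word = self.words[bucket] & (!0u64 << (target % 64));
        loop {
            if word != 0 {
                return bucket as u32 * 64 + word.trailing_zeros();
            }
            bucket += 1;
            if bucket >= self.words.len() {
                return TERMINATED;
            }
            word = self.words[bucket];
        }
    }
}

/// Intersect a sorted posting list with the bitset by repeated seeking.
fn intersect(term_docs: &[u32], range_docs: &ToyBitSetDocs) -> Vec<u32> {
    let mut out = Vec::new();
    let mut it = term_docs.iter().copied();
    // An exhausted term list yields the sentinel, which is then passed to seek.
    let mut candidate = it.next().unwrap_or(TERMINATED);
    loop {
        let found = range_docs.seek(candidate);
        if found == TERMINATED {
            return out;
        }
        if found == candidate {
            out.push(found);
            candidate = it.next().unwrap_or(TERMINATED);
        } else {
            // Skip term docs smaller than the doc the bitset landed on.
            candidate = it.find(|&d| d >= found).unwrap_or(TERMINATED);
        }
    }
}

fn main() {
    // A single document (doc id 0) matching both clauses, as in the reproduction.
    let range_docs = ToyBitSetDocs::from_docs(1, &[0]);
    println!("{:?}", intersect(&[0], &range_docs));
}
```

In this model the single-document case ends with a seek to the sentinel right after the only match is emitted, which is the step the guard has to absorb.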

ppodolsky commented 4 years ago

Bisection shows that this behavior was introduced in e25284bafe76622cc075f015f3dd009cbb2bab11.

fulmicoton commented 4 years ago

Thanks! Yes, there have been massive changes in master. Your test will be very useful.

fulmicoton commented 4 years ago

The fix is on the way. It should be merged in master today.

ppodolsky commented 4 years ago

Awesome!

fulmicoton commented 4 years ago

Sorry, it's not delivered yet. I'm waiting for CI to finish.

ppodolsky commented 4 years ago

Yep, I'm just spying on the open PR; don't worry.