quickwit-oss / tantivy

Tantivy is a full-text search engine library inspired by Apache Lucene and written in Rust
MIT License
12.03k stars 670 forks

Use TermSetQuery when building the Query from the user input. #1568

Open fmassot opened 2 years ago

fmassot commented 2 years ago

#1539 introduces TermSetQuery, which is a nice addition to the already supported query types.

In Quickwit, we rely on tantivy's query grammar and then use tantivy's query parser to build a Box<dyn Query>. The query parser does not currently use TermSetQuery, so we can't benefit from it in Quickwit.

I'm not sure how to handle that, either in Quickwit or in tantivy. In tantivy I see 2 possible solutions:

I like the first solution as it would provide a nice way for the user to express this type of query. Any thoughts on this @fulmicoton?

PSeitz commented 2 years ago

We could consider supporting more complex expressions on fields by parsing a subtree.

my_field_name:(a OR b OR c)

Currently this is parsed as the phrase "(a OR b OR c)". The way to express it today is:

my_field_name:a OR my_field_name:b OR my_field_name:c

Both would require post-processing though. We should check that the parser can handle thousands of terms.
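
To make the post-processing idea concrete, here is a minimal sketch of such a rewrite pass. It does not use tantivy's real types: `Ast`, `term`, and `collapse_or` are hypothetical names invented for illustration, and `Ast::TermSet` only stands in for whatever would eventually be turned into a TermSetQuery. The assumption is that the pass runs on a parsed tree and collapses an OR of plain terms on one field into a single term set, leaving every other shape untouched.

```rust
use std::collections::BTreeSet;

/// Hypothetical, simplified query AST (NOT tantivy's actual types).
#[derive(Debug, Clone, PartialEq)]
enum Ast {
    /// `field:value`
    Term { field: String, value: String },
    /// `a OR b OR c`
    Or(Vec<Ast>),
    /// Terms on a single field; a stand-in for a TermSetQuery.
    TermSet { field: String, values: BTreeSet<String> },
}

/// Convenience constructor for a `field:value` clause.
fn term(field: &str, value: &str) -> Ast {
    Ast::Term {
        field: field.to_string(),
        value: value.to_string(),
    }
}

/// Collapse an `Or` whose clauses are all `Term`s on the same field into one
/// `TermSet`. Any other input is returned unchanged.
fn collapse_or(ast: Ast) -> Ast {
    let clauses = match ast {
        Ast::Or(clauses) => clauses,
        other => return other,
    };

    // The field of the first clause is the candidate common field.
    let first_field: Option<String> = match clauses.first() {
        Some(Ast::Term { field, .. }) => Some(field.clone()),
        _ => None,
    };
    let Some(field) = first_field else {
        return Ast::Or(clauses);
    };

    // The rewrite only applies if every clause is a term on that field.
    let all_same = clauses
        .iter()
        .all(|c| matches!(c, Ast::Term { field: f, .. } if *f == field));
    if !all_same {
        return Ast::Or(clauses);
    }

    // Duplicated values are merged by the set.
    let values: BTreeSet<String> = clauses
        .into_iter()
        .filter_map(|c| match c {
            Ast::Term { value, .. } => Some(value),
            _ => None,
        })
        .collect();

    Ast::TermSet { field, values }
}

fn main() {
    // my_field_name:a OR my_field_name:b OR my_field_name:c
    let parsed = Ast::Or(vec![
        term("my_field_name", "a"),
        term("my_field_name", "b"),
        term("my_field_name", "c"),
    ]);
    println!("{:?}", collapse_or(parsed));
}
```

A pass like this is a single scan over the clauses, so it should not change the linear behaviour of parsing a long OR chain. Nested expressions and a minimum-size threshold for the rewrite are left out of the sketch.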

Seems to be fast enough with linear complexity (roughly 1.2 to 1.3 µs per term in the runs below):

field:term1 OR field:term2 OR field:term3 ... 

running 3 tests
test tests::bench_100_000_terms ... bench: 127,676,714 ns/iter (+/- 1,687,208)
test tests::bench_10_000_terms  ... bench:  12,545,798 ns/iter (+/- 232,466)
test tests::bench_1_000_terms   ... bench:   1,202,724 ns/iter (+/- 16,644)
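
The benchmark source is not included in this thread. As a rough sketch, inputs of the shape shown above could be generated with a helper like the following; `build_or_query` is a hypothetical name, and the `term1`, `term2`, ... naming is assumed from the sample query rather than taken from the actual benchmark.

```rust
use std::fmt::Write;

/// Build `field:term1 OR field:term2 OR ... OR field:termN`.
/// Hypothetical helper; the real benchmark may build its input differently.
fn build_or_query(field: &str, num_terms: usize) -> String {
    let mut query = String::new();
    for i in 1..=num_terms {
        if i > 1 {
            query.push_str(" OR ");
        }
        // Writing into a String cannot fail, so the unwrap is safe here.
        write!(query, "{}:term{}", field, i).unwrap();
    }
    query
}

fn main() {
    // Same sizes as the three benchmark cases above.
    for n in [1_000, 10_000, 100_000] {
        let query = build_or_query("field", n);
        println!("{} terms -> {} bytes of query text", n, query.len());
    }
}
```

The generated string would then be handed to the query parser inside the timed loop, so that the measurement covers parsing only and not input construction.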