quickwit-oss / tantivy

Tantivy is a full-text search engine library inspired by Apache Lucene and written in Rust
MIT License

QueryAST #1792

Open fulmicoton opened 1 year ago

fulmicoton commented 1 year ago

Currently, tantivy's queries are based on the following traits:

- trait Query
- trait Weight
- trait Scorer

They do not expose the structure of the query, but they make it easy to extend queries.

It might be interesting, however, to introduce a QueryAST:

enum QueryAST {
    TermQuery(Box<TermQuery>),
    Range(Box<RangeQuery>),
    Boolean(BooleanQueryAST), // or something else
    // ...
    Other(Box<dyn Query>),
}

And the equivalent for weight...

Such an AST could help with debugging and optimization, and could be a natural target for different query DSLs.
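
To make the debugging point concrete, here is a minimal, self-contained sketch of such an enum. It does not use tantivy itself: `Query`, `TermQuery`, `RangeQuery`, `Occur`, and `BooleanQueryAst` below are reduced, hypothetical stand-ins for illustration only, and the printed syntax is an arbitrary choice rather than anything tantivy defines.

```rust
use std::fmt;

/// Stand-in for tantivy's `Query` trait, reduced to `Debug` (assumption).
pub trait Query: fmt::Debug {}

/// Hypothetical, simplified leaf queries (not the real tantivy structs).
#[derive(Debug)]
pub struct TermQuery {
    pub field: String,
    pub term: String,
}

#[derive(Debug)]
pub struct RangeQuery {
    pub field: String,
    pub lower: u64,
    pub upper: u64,
}

/// Reduced set of clause modifiers for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Occur {
    Must,
    Should,
}

/// Hypothetical boolean node: a flat list of (occur, sub-query) clauses.
#[derive(Debug)]
pub struct BooleanQueryAst {
    pub clauses: Vec<(Occur, QueryAst)>,
}

#[derive(Debug)]
pub enum QueryAst {
    TermQuery(Box<TermQuery>),
    Range(Box<RangeQuery>),
    Boolean(BooleanQueryAst),
    /// Escape hatch for queries whose structure stays opaque.
    Other(Box<dyn Query>),
}

/// Because the structure is exposed, a query can be rendered for debugging.
impl fmt::Display for QueryAst {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            QueryAst::TermQuery(q) => write!(f, "{}:{}", q.field, q.term),
            QueryAst::Range(q) => write!(f, "{}:[{} TO {}]", q.field, q.lower, q.upper),
            QueryAst::Boolean(b) => {
                write!(f, "(")?;
                for (i, (occur, clause)) in b.clauses.iter().enumerate() {
                    if i > 0 {
                        write!(f, " ")?;
                    }
                    // Mark mandatory clauses with a leading '+'.
                    let prefix = match occur {
                        Occur::Must => "+",
                        Occur::Should => "",
                    };
                    write!(f, "{}{}", prefix, clause)?;
                }
                write!(f, ")")
            }
            QueryAst::Other(q) => write!(f, "<opaque {:?}>", q),
        }
    }
}
```

The `Other` variant keeps the extensibility of the trait-based design: anything that does not fit a known variant can still be carried through the tree, at the cost of being opaque to inspection.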

fulmicoton commented 1 year ago

@guilload @evanxg852000 I'd like to have your thoughts on this.

ppodolsky commented 1 year ago

Just passing by, but it would be cool. In almost every place where I worked with search engines, there was some kind of query DSL. In Summa, I also have a proto-based query tree: https://github.com/izihawa/summa/blob/master/summa-proto/proto/query.proto#L14

fulmicoton commented 1 year ago

https://github.com/quickwit-oss/quickwit/issues/1655

PSeitz commented 4 months ago

This would make some optimizations easier. For example, for

(Field1:Term1 OR Field1:Term2) AND (Field2:Term1 OR Field2:Term2)

it would be better to use a simple union algorithm that supports fast skips instead of the current one.

For that, we would need to know that there is an intersection above the union that triggers the skips. With the current generic API this is possible, but it is awkward to pass that information down.
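
As a rough illustration of how an AST makes this context available, the sketch below walks a tree and records, for each union, whether an intersection sits above it. All names here (`BoolAst`, `Plan`, `UnionStrategy`, `plan`, `term`) are hypothetical and not part of tantivy, and the strategy is only a label: the actual union algorithms are not implemented.

```rust
/// Hypothetical, simplified boolean AST (not a tantivy type).
#[derive(Debug, Clone, PartialEq)]
pub enum BoolAst {
    Term { field: String, term: String },
    Union(Vec<BoolAst>),
    Intersection(Vec<BoolAst>),
}

/// Which union implementation a later stage should pick (labels only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum UnionStrategy {
    /// The existing general-purpose union.
    Current,
    /// A simpler union that is expected to be skipped often.
    SkipFriendly,
}

/// Annotated tree produced by the pass.
#[derive(Debug, Clone, PartialEq)]
pub enum Plan {
    Term { field: String, term: String },
    Union { strategy: UnionStrategy, children: Vec<Plan> },
    Intersection(Vec<Plan>),
}

/// Convenience constructor for a term leaf.
pub fn term(field: &str, value: &str) -> BoolAst {
    BoolAst::Term { field: field.to_string(), term: value.to_string() }
}

/// Entry point: the root has no intersection above it.
pub fn plan(ast: &BoolAst) -> Plan {
    plan_inner(ast, false)
}

fn plan_inner(ast: &BoolAst, under_intersection: bool) -> Plan {
    match ast {
        BoolAst::Term { field, term } => Plan::Term {
            field: field.clone(),
            term: term.clone(),
        },
        BoolAst::Union(children) => Plan::Union {
            strategy: if under_intersection {
                UnionStrategy::SkipFriendly
            } else {
                UnionStrategy::Current
            },
            // Assumption: a union that gets skipped forwards those skips to
            // its children, so the flag is propagated unchanged.
            children: children
                .iter()
                .map(|c| plan_inner(c, under_intersection))
                .collect(),
        },
        // Everything below an intersection may be driven by skips.
        BoolAst::Intersection(children) => Plan::Intersection(
            children.iter().map(|c| plan_inner(c, true)).collect(),
        ),
    }
}
```

With the query above, both unions end up tagged `SkipFriendly`, while a union at the root keeps `Current`. The parent context is just a function argument during the tree walk, which is the information that is awkward to thread through the trait-based API.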