quickwit-oss / tantivy

Tantivy is a full-text search engine library inspired by Apache Lucene and written in Rust
MIT License

Range queries on JSON fields #1709

Open xrl opened 1 year ago

xrl commented 1 year ago

Is your feature request related to a problem? Please describe.

I cannot perform range queries on JSON fields. For example, examples/json_field.rs contains a search like this:

    {
        let query = query_parser.parse_query("cart.product_id:103")?;
        let count_docs = searcher.search(&*query, &Count)?;
        assert_eq!(count_docs, 1);
    }

But I cannot rewrite the query like this:

    {
        let query = query_parser.parse_query("cart.product_id < 110")?;
        let count_docs = searcher.search(&*query, &Count)?;
        assert_eq!(count_docs, 1);
    }

The count comes back as 0.

Describe the solution you'd like

We should be able to search nested JSON documents with the usual <, >, etc.

adamreichold commented 1 year ago

Isn't the syntax for range queries supported by the built-in QueryParser different, meaning it would be something like

cart.product_id:[0 TO 110}

(assuming that 0 is the smallest possible ID).

PSeitz commented 1 year ago

The parser doesn't handle this currently, but this should work: cart.product_id:<110 or cart.product_id:{* TO 110}

Related quickwit issue: https://github.com/quickwit-oss/quickwit/issues/2431

Nickersoft commented 1 year ago

@PSeitz Attempting either the :< syntax or :{{* TO 110}} (Rust complains about the unescaped {}, hence the doubled braces) returns "Unsupported query: Range query are not supported on json field" for me.
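The brace doubling mentioned above only matters when the query string is built with a formatting macro such as format!. Each {{ or }} pair is an escape that produces a single literal brace, so the parser still receives {* TO 110}. A minimal sketch using only the standard library; the helper name upper_bound_clause is hypothetical and not part of tantivy:

```rust
/// Hypothetical helper (not a tantivy API): builds a range clause with an
/// unbounded lower end and an exclusive upper bound.
fn upper_bound_clause(field_path: &str, upper: u64) -> String {
    // `{{` and `}}` are format-string escapes: each pair emits ONE literal brace.
    // The two plain `{}` placeholders are filled by `field_path` and `upper`.
    format!("{}:{{* TO {}}}", field_path, upper)
}

fn main() {
    let clause = upper_bound_clause("cart.product_id", 110);
    // prints: cart.product_id:{* TO 110}
    println!("{}", clause);
}
```

A plain string literal passed directly to parse_query needs no escaping, so "cart.product_id:{* TO 110}" can be written with single braces there.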

PSeitz commented 1 year ago

Indeed, it's disabled. I don't think there's an inherent reason, other than some missing code to handle it. @fulmicoton?

fulmicoton commented 1 year ago

If people want to contribute?

yollotltamayo commented 1 year ago

Hi @fulmicoton, I would like to contribute. Could you point me to an entry point to start working on this? Thanks.

PSeitz commented 1 year ago

@yollotltamayo

https://github.com/quickwit-oss/tantivy/tree/main/src/query/range_query

The code is sometimes bound to a fixed field in the schema. These would need to be replaced with something that can handle JSON, e.g. "myjson.fielda" https://github.com/quickwit-oss/tantivy/blob/main/src/query/range_query/range_query.rs#L335

A range query can run on the columnar storage and on the inverted index. I would implement it first for the inverted index, as it should be simpler.
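To illustrate the difference between the two execution paths, here is a toy sketch using only the standard library. It is not tantivy's implementation: the BTreeMap stands in for a sorted term dictionary with posting lists, the slice stands in for a fast-field column, and all names are made up for the example.

```rust
use std::collections::{BTreeMap, BTreeSet};
use std::ops::{Bound, RangeBounds};

type DocId = u32;

/// Toy inverted index: each term (a u64 value here) maps to the docs containing it.
/// The range query visits only the terms inside the bounds and unions their postings.
fn range_on_inverted_index(
    index: &BTreeMap<u64, Vec<DocId>>,
    lower: Bound<u64>,
    upper: Bound<u64>,
) -> BTreeSet<DocId> {
    index
        .range((lower, upper))
        .flat_map(|(_term, postings)| postings.iter().copied())
        .collect()
}

/// Toy column: one optional value per document, indexed by doc id.
/// The range query scans every document and tests its value against the bounds.
fn range_on_column(
    column: &[Option<u64>],
    lower: Bound<u64>,
    upper: Bound<u64>,
) -> BTreeSet<DocId> {
    let range = (lower, upper);
    column
        .iter()
        .enumerate()
        .filter_map(|(doc, value)| match value {
            Some(v) if range.contains(v) => Some(doc as DocId),
            _ => None,
        })
        .collect()
}

fn main() {
    // Docs 0, 1 and 3 carry a product_id; doc 2 has none.
    let mut index: BTreeMap<u64, Vec<DocId>> = BTreeMap::new();
    index.insert(103, vec![0]);
    index.insert(110, vec![1]);
    index.insert(133, vec![3]);
    let column = [Some(103), Some(110), None, Some(133)];

    // Equivalent of {* TO 110}: unbounded below, exclusive upper bound.
    let from_index = range_on_inverted_index(&index, Bound::Unbounded, Bound::Excluded(110));
    let from_column = range_on_column(&column, Bound::Unbounded, Bound::Excluded(110));
    println!("{:?} {:?}", from_index, from_column);
}
```

Both functions return the same set of documents; they differ only in whether the work is driven by the sorted terms or by a per-document scan.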

Let me know if you are still interested.

ppodolsky commented 1 year ago

I've implemented a similar ExistsQuery for JSON fields; feel free to port it back: https://github.com/izihawa/summa/blob/master/summa-core/src/components/queries/exists_query.rs#L90

michel-kraemer commented 5 months ago

Here's my custom implementation of a range query for JSON fields for anyone interested: https://github.com/georocket/georocket/blob/bac0325889d43f93389a54327b90338527ef03c2/rust/core/src/index/tantivy/json_range_query.rs

It's basically the same code as that of Tantivy's range query. I just changed the type of field from String to Field.

xiaofan-luan commented 1 week ago

Is there a plan to support range queries on JSON in the next release? What about other complex queries like regex or fuzzy?

PSeitz commented 1 week ago

Range queries on JSON fast fields (columnar storage) are supported now: https://github.com/quickwit-oss/tantivy/pull/2456

Range queries on JSON on the inverted index are not yet supported.

xiaofan-luan commented 1 week ago

Thanks for the update!