milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0
29.42k stars, 2.82k forks

[Feature]: Support index build and search on JSON/Dynamic field #35528

Open xiaofan-luan opened 4 weeks ago

xiaofan-luan commented 4 weeks ago

Is there an existing issue for this?

Is your feature request related to a problem? Please describe.

Filtering on JSON and dynamic fields can be slow. Although we've already integrated simdjson, it is still slow in many use cases, especially when the JSON contains many fields. We need to build a tantivy index on top of it; see the documentation at https://github.com/quickwit-oss/tantivy/blob/main/doc/src/json.md

Describe the solution you'd like.

Elasticsearch (ES) supports two modes for JSON fields: flatten mode https://www.elastic.co/guide/en/elasticsearch/reference/8.8/nested.html and hierarchical mode https://www.elastic.co/guide/en/elasticsearch/reference/8.8/object.html.

Currently tantivy seems to lean toward the flatten mode, so there might be issues here, for example:

  1. The hierarchy is lost, which means some data that should not match will be returned
  2. Range queries are not supported
  3. Data types are missing

see details at https://github.com/quickwit-oss/tantivy/blob/main/doc/src/json.md
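To illustrate the first point, here is a minimal Python sketch of how flattening an array of objects can return unmatched data. It is not tantivy's or Milvus's actual implementation; `flatten`, `flat_match`, `nested_match`, and the `users` sample document are hypothetical names made up for this example.

```python
from collections import defaultdict


def flatten(doc):
    """Collect every leaf value under its dotted path, ignoring array positions."""
    out = defaultdict(set)

    def walk(node, path):
        if isinstance(node, dict):
            for key, value in node.items():
                walk(value, f"{path}.{key}" if path else key)
        elif isinstance(node, list):
            # Array elements share the same path, so their boundaries are lost.
            for item in node:
                walk(item, path)
        else:
            out[path].add(node)

    walk(doc, "")
    return out


def flat_match(doc, conditions):
    """Flatten-mode matching: each condition is checked independently."""
    flat = flatten(doc)
    return all(value in flat.get(path, set()) for path, value in conditions.items())


def nested_match(doc, array_field, conditions):
    """Hierarchy-aware matching: all conditions must hold on the same array element."""
    return any(
        all(elem.get(key) == value for key, value in conditions.items())
        for elem in doc.get(array_field, [])
    )


doc = {
    "users": [
        {"first": "John", "last": "Smith"},
        {"first": "Alice", "last": "White"},
    ]
}

# No single user is "Alice Smith", yet the flattened view still matches.
flat_hit = flat_match(doc, {"users.first": "Alice", "users.last": "Smith"})
nested_hit = nested_match(doc, "users", {"first": "Alice", "last": "Smith"})
```

In this sketch `flat_hit` is a false positive because "Alice" and "Smith" come from different array elements, while `nested_hit` correctly rejects the document.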

Describe an alternate solution.

No response

Anything else? (Additional Context)

No response

xiaofan-luan commented 4 weeks ago

/assign @sunby could you start investigating this?

xiaofan-luan commented 4 weeks ago

@longjiquan can help

xiaofan-luan commented 4 weeks ago

We want to support real nested field indexing


Basically, what we do is split arrays of docs into multiple docs, which brings extra cost. Alternatively, we can build a bitmap index on each array and translate a filter into a JSON match && array index match.
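A rough Python sketch of the two approaches, under my own assumptions about what they could look like (this is not the planned Milvus design). All function names are hypothetical, and a plain Python integer stands in for a real bitmap such as a roaring bitmap.

```python
from collections import defaultdict


def split_into_subdocs(rows, array_field):
    """Approach 1: emit one sub-document per array element, tagged with its parent row id."""
    return [
        (row_id, elem)
        for row_id, row in enumerate(rows)
        for elem in row.get(array_field, [])
    ]


def search_subdocs(subdocs, conditions):
    """Match each sub-document on its own, then map hits back to parent rows."""
    return sorted({
        row_id
        for row_id, elem in subdocs
        if all(elem.get(key) == value for key, value in conditions.items())
    })


def build_position_bitmaps(rows, array_field):
    """Approach 2: for each (key, value), keep a per-row bitmap of matching array positions."""
    index = defaultdict(dict)  # (key, value) -> {row_id: bitmask of positions}
    for row_id, row in enumerate(rows):
        for pos, elem in enumerate(row.get(array_field, [])):
            for key, value in elem.items():
                bitmaps = index[(key, value)]
                bitmaps[row_id] = bitmaps.get(row_id, 0) | (1 << pos)
    return index


def search_bitmaps(index, conditions):
    """JSON match && array index match: AND the position bitmaps row by row."""
    result = None
    for key, value in conditions.items():
        postings = index.get((key, value), {})
        if result is None:
            result = dict(postings)
        else:
            result = {r: m & postings[r] for r, m in result.items() if r in postings}
    # A row matches only if some array position satisfies every condition.
    return sorted(r for r, mask in (result or {}).items() if mask)


rows = [
    {"users": [{"first": "John", "last": "Smith"}, {"first": "Alice", "last": "White"}]},
    {"users": [{"first": "Alice", "last": "Smith"}]},
]
query = {"first": "Alice", "last": "Smith"}

subdoc_hits = search_subdocs(split_into_subdocs(rows, "users"), query)
bitmap_hits = search_bitmaps(build_position_bitmaps(rows, "users"), query)
```

Both approaches reject row 0 here. The first multiplies the number of indexed documents by the array length, while the second keeps one entry per row and pays for a bitmap intersection at query time.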