milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Feature]: Support insert texts into Milvus directly and use BM25 to search #35853

Open zhengbuqian opened 2 months ago

zhengbuqian commented 2 months ago

Is there an existing issue for this?

Is your feature request related to a problem? Please describe.

Currently, to perform BM25-based text relevance search on top of vector ANN search, we have to:

  1. Gather the entire corpus to collect its statistics, including term frequency, inverse document frequency, etc.
  2. Compute the document embeddings and insert them into Milvus as sparse embeddings
  3. Compute the query embeddings and search using the IP (inner product) metric
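For illustration, the three steps above can be sketched in plain Python, outside of Milvus. This is a minimal sketch under several assumptions: whitespace tokenization, the common BM25 defaults `k1 = 1.5` and `b = 0.75`, sparse vectors represented as dicts keyed by term, and IDF weights placed on the query side. All function names are hypothetical and are not part of any Milvus API.

```python
import math
from collections import Counter

K1, B = 1.5, 0.75  # common BM25 defaults (an assumption, not from the issue)


def tokenize(text):
    # Naive whitespace tokenizer, for illustration only.
    return text.lower().split()


def corpus_stats(docs):
    """Step 1: one pass over the whole corpus to get IDF and average length."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}
    return idf, avgdl


def doc_embedding(text, avgdl):
    """Step 2: sparse doc vector with the length-normalized TF part of BM25."""
    toks = tokenize(text)
    norm = K1 * (1 - B + B * len(toks) / avgdl)
    return {t: f * (K1 + 1) / (f + norm) for t, f in Counter(toks).items()}


def query_embedding(text, idf):
    """Step 3: sparse query vector with IDF weights for known terms."""
    return {t: idf[t] for t in set(tokenize(text)) if t in idf}


def inner_product(q, d):
    # The IP of the two sparse vectors equals the BM25 score.
    return sum(w * d.get(t, 0.0) for t, w in q.items())


docs = [
    "milvus is a vector database",
    "bm25 ranks documents by term relevance",
    "sparse vectors can encode bm25 weights",
]
idf, avgdl = corpus_stats(docs)
doc_vecs = [doc_embedding(d, avgdl) for d in docs]
query_vec = query_embedding("vector database", idf)
scores = [inner_product(query_vec, v) for v in doc_vecs]
```

Note that `idf` and `avgdl` are computed once from the full corpus, so every stored document vector depends on statistics that change whenever documents are added or removed.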

This approach is fragile: the statistics are fixed at the time the corpus was gathered, so the embeddings are hard to keep up to date once the corpus has changed substantially.

We propose a new approach: allow users to insert raw text only, and have Milvus maintain the statistics and perform the conversion at runtime.
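To make the idea concrete, here is a toy in-memory sketch of what "maintain the statistics and convert at runtime" could look like. It is not the proposed Milvus design or API (those have not been shared yet); the class `RuntimeBM25Index` and its methods are hypothetical, and whitespace tokenization and the defaults `k1 = 1.5`, `b = 0.75` are assumptions. The point it shows is that only raw term counts are stored per document, while IDF and average length are derived from running counters at query time.

```python
import math
from collections import Counter


class RuntimeBM25Index:
    """Toy index that updates BM25 statistics on every insert (hypothetical)."""

    def __init__(self, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.term_freqs = []       # per-document raw term counts
        self.doc_lens = []         # per-document token counts
        self.doc_freq = Counter()  # number of documents containing each term
        self.total_len = 0         # running sum of document lengths

    def insert(self, text):
        # Accept raw text; statistics are updated incrementally.
        toks = text.lower().split()
        tf = Counter(toks)
        self.term_freqs.append(tf)
        self.doc_lens.append(len(toks))
        self.doc_freq.update(tf.keys())
        self.total_len += len(toks)
        return len(self.term_freqs) - 1  # document id

    def search(self, query, top_k=3):
        n = len(self.term_freqs)
        if n == 0 or self.total_len == 0:
            return []
        avgdl = self.total_len / n
        terms = set(query.lower().split())
        results = []
        for doc_id, tf in enumerate(self.term_freqs):
            norm = self.k1 * (1 - self.b + self.b * self.doc_lens[doc_id] / avgdl)
            score = 0.0
            for term in terms:
                f = tf.get(term, 0)
                if f == 0:
                    continue
                # IDF is computed from the current counters, not a snapshot.
                df = self.doc_freq[term]
                idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
                score += idf * f * (self.k1 + 1) / (f + norm)
            if score > 0:
                results.append((doc_id, score))
        return sorted(results, key=lambda pair: pair[1], reverse=True)[:top_k]


index = RuntimeBM25Index()
index.insert("milvus stores vectors")
index.insert("bm25 scores text")
before = index.search("bm25")
index.insert("bm25 text search")  # no corpus-wide recomputation needed
after = index.search("bm25")
```

A real implementation would score through an index rather than scanning every document; the linear scan here only keeps the sketch short.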

Proposed approaches and APIs will be shared shortly.

This is the umbrella issue for all related follow-up issues and PRs.

Describe the solution you'd like.

No response

Describe an alternate solution.

No response

Anything else? (Additional Context)

No response

xiaofan-luan commented 2 months ago

excited about this!