milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0
29.4k stars 2.82k forks

[Feature]: Join two collections #35500

Open xiaofan-luan opened 4 weeks ago

xiaofan-luan commented 4 weeks ago

Is there an existing issue for this?

Is your feature request related to a problem? Please describe.

In some use cases, users need to search for the top-k nearest results for each entity of another collection.

This can be called a kNN join or a semantic join.
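
To make the operation concrete, here is a minimal brute-force sketch of a kNN join in plain Python. It is only an illustration of the semantics, not a proposed Milvus API: the function names, the dict-based "collections", and the squared-L2 metric are all assumptions.

```python
import heapq
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def l2_squared(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_join(
    left: Dict[str, Vector],
    right: Dict[str, Vector],
    k: int,
) -> Dict[str, List[Tuple[str, float]]]:
    """For every entity in `left`, return its k nearest entities in `right`."""
    result = {}
    for left_id, left_vec in left.items():
        scored = (
            (right_id, l2_squared(left_vec, right_vec))
            for right_id, right_vec in right.items()
        )
        # Keep the k smallest distances for this left-side entity.
        result[left_id] = heapq.nsmallest(k, scored, key=lambda pair: pair[1])
    return result


# Hypothetical toy collections: primary key -> embedding.
products = {"p1": [0.0, 0.0], "p2": [5.0, 5.0]}
reviews = {"r1": [0.1, 0.0], "r2": [4.0, 5.0], "r3": [9.0, 9.0]}

joined = knn_join(products, reviews, k=2)
```

Here `joined["p1"]` holds the two reviews closest to `p1`. The cost is one distance computation per (left, right) pair, which is why the batching ideas discussed in the comments matter at scale.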

Simply listing it here for now, pending more discussion.

Describe the solution you'd like.

No response

Describe an alternate solution.

No response

Anything else? (Additional Context)

No response

chasingegg commented 4 weeks ago

We could have something like batched search in the vector search engine. This is helpful when we use IVF-related indexes: we can group the queries that probe the same posting lists and do the distance computation as a matrix operation to improve QPS.
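
A rough sketch of that grouping idea, assuming a toy IVF layout (a list of centroids plus a dict of posting lists) and squared-L2 distance. All names are hypothetical, and the nested loop only stands in for the matrix computation a real engine would run.

```python
import heapq
from collections import defaultdict


def l2_squared(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def batched_ivf_search(centroids, posting_lists, queries, k, nprobe):
    """Search all queries together, scanning each posting list at most once."""
    # Step 1: route each query to its nprobe closest centroids and group
    # the query indices by the posting list they need to scan.
    queries_by_list = defaultdict(list)
    for q_idx, q in enumerate(queries):
        nearest = heapq.nsmallest(
            nprobe,
            range(len(centroids)),
            key=lambda c: l2_squared(q, centroids[c]),
        )
        for list_id in nearest:
            queries_by_list[list_id].append(q_idx)

    # Step 2: visit each posting list once and score it against every
    # query routed to it. This (queries x entries) block of distances is
    # what a real engine would compute as a single matrix operation.
    candidates = defaultdict(list)
    for list_id, q_indices in queries_by_list.items():
        entries = posting_lists[list_id]
        for q_idx in q_indices:
            for entity_id, vec in entries:
                dist = l2_squared(queries[q_idx], vec)
                candidates[q_idx].append((dist, entity_id))

    # Step 3: keep the k best candidates per query.
    return [heapq.nsmallest(k, candidates[i]) for i in range(len(queries))]


# Hypothetical toy index with two clusters.
centroids = [[0.0, 0.0], [10.0, 10.0]]
posting_lists = {
    0: [("a", [0.0, 1.0]), ("b", [1.0, 1.0])],
    1: [("c", [10.0, 9.0]), ("d", [9.0, 9.0])],
}
queries = [[0.0, 0.0], [1.0, 0.0], [10.0, 10.0]]

results = batched_ivf_search(centroids, posting_lists, queries, k=1, nprobe=1)
```

In this toy run the first two queries share posting list 0, so that list is read once for both of them instead of once per query.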

xiaofan-luan commented 4 weeks ago

That is exactly what I'm thinking. To implement this, we need:

  1. LRU on segments (usually we don't need to load everything into main memory)
  2. Batch search on all segments (typically NQ == 100k)
  3. GPU or other batch optimizations in the index. In this mode, we don't really need to do batch insertion
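
A minimal sketch of how items 1 and 2 could fit together, assuming segments are loaded through a small LRU cache and the whole query batch is run against each segment while it is resident. The `SegmentLRU` class, the `loader` callback, and the in-memory "disk" dict are hypothetical and not part of Milvus.

```python
import heapq
from collections import OrderedDict


def l2_squared(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class SegmentLRU:
    """Keeps at most `capacity` segments in memory (least recently used out)."""

    def __init__(self, capacity, loader):
        self.capacity = capacity
        self.loader = loader      # callable: segment_id -> segment data
        self.cache = OrderedDict()
        self.loads = 0            # number of times the loader was called

    def get(self, segment_id):
        if segment_id in self.cache:
            self.cache.move_to_end(segment_id)  # mark as most recently used
            return self.cache[segment_id]
        segment = self.loader(segment_id)
        self.loads += 1
        self.cache[segment_id] = segment
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)      # evict least recently used
        return segment


def batch_search(cache, segment_ids, queries, k):
    """Run every query against every segment, one segment load per batch."""
    best = [[] for _ in queries]
    for segment_id in segment_ids:        # outer loop: segments
        segment = cache.get(segment_id)
        for q_idx, q in enumerate(queries):   # inner loop: the whole batch
            scored = [(l2_squared(q, vec), eid) for eid, vec in segment]
            best[q_idx] = heapq.nsmallest(k, best[q_idx] + scored)
    return best


# Hypothetical "disk": segment id -> list of (entity id, vector).
disk = {
    "s1": [("a", [0.0]), ("b", [1.0])],
    "s2": [("c", [2.0])],
    "s3": [("d", [3.0])],
}
cache = SegmentLRU(capacity=2, loader=disk.__getitem__)
queries = [[0.1], [2.9]]

hits = batch_search(cache, ["s1", "s2", "s3"], queries, k=1)
```

Because the segment loop is the outer one, each segment is loaded once per batch no matter how large NQ is, and at most `capacity` segments are held in memory at any time.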

xiaofan-luan commented 4 weeks ago

@liliu-z @chasingegg thoughts on it?

liliu-z commented 3 weeks ago

  1. An async/cron job API is needed.
  2. It is a general operation that can apply to any index and cache strategy (segment LRU, all in memory, etc.), though we have some preferred combinations.
  3. It can follow a Map-Reduce pattern: we first run the batch searches and store the results on a cron-job leader node (maybe the delegator), and then run a reduce step over them.
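
The points above could be sketched roughly as follows, using Python's `concurrent.futures` to stand in for both the async job API and the worker nodes. Everything here is an assumption for illustration: `run_join_job`, `map_segment`, and `reduce_partials` are hypothetical names, and a thread pool is only a placeholder for real query nodes and a leader.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def l2_squared(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def map_segment(segment, queries, k):
    """Map step: per-segment top-k for every query in the batch."""
    return [
        heapq.nsmallest(k, ((l2_squared(q, vec), eid) for eid, vec in segment))
        for q in queries
    ]


def reduce_partials(partials, k):
    """Reduce step: merge per-segment top-k lists into a global top-k."""
    nq = len(partials[0])
    return [
        heapq.nsmallest(k, (hit for partial in partials for hit in partial[i]))
        for i in range(nq)
    ]


def run_join_job(segments, queries, k):
    """Runs the map step on a worker pool, then reduces on the caller."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        partials = list(
            pool.map(lambda seg: map_segment(seg, queries, k), segments)
        )
    return reduce_partials(partials, k)


# Hypothetical segments: lists of (entity id, vector).
segments = [
    [("a", [0.0]), ("b", [4.0])],
    [("c", [1.0]), ("d", [5.0])],
]
queries = [[0.0], [5.0]]

# Async-style usage: submit returns a handle immediately, and the caller
# polls or waits on it later, as a job API would allow.
executor = ThreadPoolExecutor(max_workers=1)
job = executor.submit(run_join_job, segments, queries, 2)
result = job.result()
executor.shutdown()
```

Merging per-segment top-k lists gives the exact global top-k, since any entity in the global top-k must also be in the top-k of its own segment. The leader therefore only needs to hold k hits per query per segment.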