milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Feature]: Support vector list #26956

Open xiaofan-luan opened 1 year ago

xiaofan-luan commented 1 year ago

Is there an existing issue for this?

Is your feature request related to a problem? Please describe.

In many use cases, users simply treat each entity as an embedding list. For example, in chatPDF use cases, each PDF is split into chunks, and each chunk is embedded into a vector. The agent scenario is similar: each agent has a limited number of vectors.

In that case, we build an index for each entity rather than for the whole segment.

Data model

Primary Key -> DocID
Partition Key -> UserID
Scalars -> Type, Author, ...
Vector -> Array [Chunk embeddings]
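A minimal sketch of one entity in this data model, written as a plain Python dataclass rather than a Milvus schema. The class name DocEntity and the field names are hypothetical, and the rule that all chunk embeddings in one list share a dimension is an assumption, not something stated in the issue.

```python
from dataclasses import dataclass, field
from typing import List

Vector = List[float]


@dataclass
class DocEntity:
    """Hypothetical in-memory model of one entity in the proposed schema."""

    doc_id: str     # Primary Key
    user_id: str    # Partition Key
    doc_type: str   # scalar "Type"
    author: str     # scalar "Author"
    # Vector field: one embedding per chunk of the document.
    chunk_embeddings: List[Vector] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Assumption: every chunk embedding in one list has the same dimension.
        dims = {len(v) for v in self.chunk_embeddings}
        if len(dims) > 1:
            raise ValueError(f"inconsistent embedding dimensions: {sorted(dims)}")


doc = DocEntity(
    doc_id="doc-1",
    user_id="user-42",
    doc_type="pdf",
    author="alice",
    chunk_embeddings=[[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
)
print(len(doc.chunk_embeddings))  # 3
```

The point of the shape is that the document, not the chunk, is the row: all chunks of one PDF live in a single entity keyed by DocID.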

Search:

  1. Given a docID, find related chunks -> no index, simply brute-force search
  2. Given a userID, find related chunks -> index, filtering
  3. Find the global top 10 related chunks, grouped by docID, userID, or Author
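A minimal sketch of search modes 1 and 2, assuming inner-product similarity and plain Python lists. The entity dicts, field names, and the top_k_chunks helper are hypothetical, not Milvus APIs. Both modes here brute-force the chunks of the matching entities; a real implementation would use an index for mode 2.

```python
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner-product similarity (an assumed metric; the issue does not name one)."""
    return sum(x * y for x, y in zip(a, b))


def top_k_chunks(
    entities: List[dict], query: Vector, k: int
) -> List[Tuple[float, str, int]]:
    """Brute-force scan every chunk of the given entities.

    Returns (score, doc_id, chunk_index) tuples, best first.
    """
    hits = [
        (dot(query, emb), e["doc_id"], i)
        for e in entities
        for i, emb in enumerate(e["chunks"])
    ]
    hits.sort(key=lambda h: h[0], reverse=True)
    return hits[:k]


entities = [
    {"doc_id": "d1", "user_id": "u1", "chunks": [[1.0, 0.0], [0.0, 1.0]]},
    {"doc_id": "d2", "user_id": "u1", "chunks": [[0.9, 0.1]]},
    {"doc_id": "d3", "user_id": "u2", "chunks": [[1.0, 1.0]]},
]
query = [1.0, 0.0]

# Mode 1: given a docID, search only that entity's chunks (no index needed).
by_doc = top_k_chunks([e for e in entities if e["doc_id"] == "d1"], query, k=1)

# Mode 2: given a userID, filter entities first, then search their chunks.
# This sketch brute-forces the filtered set instead of using an index.
by_user = top_k_chunks([e for e in entities if e["user_id"] == "u1"], query, k=2)

print(by_doc)   # [(1.0, 'd1', 0)]
print(by_user)  # [(1.0, 'd1', 0), (0.9, 'd2', 0)]
```

Mode 1 stays cheap without an index because a single embedding list is small, which is the motivation for a per-entity rather than per-segment index.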

Describe the solution you'd like.

  1. Add a new data type -> array of vectors
  2. Support searches such as: select top 10 from embedding list where docID = xxx
  3. Support searches such as: select top 10 from embedding list where userID = xxx groupby docID
  4. Support searches such as: select top 10 from embedding list groupby author
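A minimal sketch of the group-by search in items 3 and 4, under one assumed semantics: each group is scored by its best-matching chunk, and the top-k groups are returned. The metric (inner product), the grouped_top_k helper, and the entity layout are hypothetical; this is not how Milvus implements grouping.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    # Inner-product similarity; the metric is an assumption for this sketch.
    return sum(x * y for x, y in zip(a, b))


def grouped_top_k(
    entities: List[dict], query: Vector, group_field: str, k: int
) -> List[Tuple[str, float, str, int]]:
    """Return the top-k groups, each represented by its best-scoring chunk.

    Each result is (group_value, score, doc_id, chunk_index).
    """
    best: Dict[str, Tuple[float, str, int]] = {}
    for e in entities:
        group = e[group_field]
        for i, emb in enumerate(e["chunks"]):
            score = dot(query, emb)
            # Keep only the highest-scoring chunk seen so far for this group.
            if group not in best or score > best[group][0]:
                best[group] = (score, e["doc_id"], i)
    ranked = sorted(best.items(), key=lambda kv: kv[1][0], reverse=True)
    return [(g, s, d, i) for g, (s, d, i) in ranked[:k]]


entities = [
    {"doc_id": "d1", "author": "alice", "chunks": [[1.0, 0.0], [0.5, 0.5]]},
    {"doc_id": "d2", "author": "alice", "chunks": [[0.8, 0.2]]},
    {"doc_id": "d3", "author": "bob", "chunks": [[0.6, 0.4]]},
    {"doc_id": "d4", "author": "carol", "chunks": [[0.0, 1.0]]},
]

# Similar in spirit to: select top 2 from embedding list groupby author
print(grouped_top_k(entities, [1.0, 0.0], "author", k=2))
# [('alice', 1.0, 'd1', 0), ('bob', 0.6, 'd3', 0)]
```

For item 3 (where userID = xxx groupby docID), the same helper applies if the entity list is filtered by userID first and group_field is "doc_id".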

Describe an alternate solution.

No response

Anything else? (Additional Context)

No response

xiaofan-luan commented 1 year ago

/assign @SimFG
/assign @xiaocai2333
/assign @czs007

Let's discuss it!

xiaofan-luan commented 1 year ago
  1. By the way, each embedding list should be limited to 4096 or 1024 vectors.
  2. We don't support partial updates for now.
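A minimal sketch of how these two constraints could be enforced on write, using a hypothetical in-memory VectorListStore (not a Milvus API). The issue leaves the cap at "4096 or 1024", so the default of 1024 here is an assumption; "no partial update" is modeled as upsert always replacing the whole list.

```python
from typing import Dict, List

Vector = List[float]

# The issue suggests a cap of 4096 or 1024 vectors; 1024 is an assumption here.
MAX_VECTORS_PER_LIST = 1024


class VectorListStore:
    """Hypothetical toy store that enforces the per-list cap and whole-list writes."""

    def __init__(self, max_vectors: int = MAX_VECTORS_PER_LIST) -> None:
        self.max_vectors = max_vectors
        self._rows: Dict[str, List[Vector]] = {}

    def upsert(self, doc_id: str, embeddings: List[Vector]) -> None:
        # Reject embedding lists that exceed the per-entity cap.
        if len(embeddings) > self.max_vectors:
            raise ValueError(
                f"{doc_id}: {len(embeddings)} vectors exceeds limit {self.max_vectors}"
            )
        # No partial update: the whole list is replaced, never patched in place.
        self._rows[doc_id] = [list(v) for v in embeddings]

    def get(self, doc_id: str) -> List[Vector]:
        return self._rows[doc_id]


store = VectorListStore(max_vectors=3)
store.upsert("d1", [[0.1], [0.2]])
store.upsert("d1", [[0.9]])          # replaces both old chunks
print(store.get("d1"))               # [[0.9]]
try:
    store.upsert("d2", [[0.0]] * 4)  # 4 > 3
except ValueError as err:
    print(err)                       # d2: 4 vectors exceeds limit 3
```

Because there is no method to modify a single chunk, changing one chunk of a document means re-sending its full embedding list.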