[Feature]: Support vector list

xiaofan-luan commented 1 year ago

Is there an existing issue for this?

[X] I have searched the existing issues

Is your feature request related to a problem? Please describe.

In many scenarios, users treat each entity as an embedding list. For example, in ChatPDF-style use cases, each PDF is split into chunks, and each chunk is embedded into a vector. Agent scenarios are similar: each agent owns a limited number of vectors.

In that case, we build an index per entity rather than for the whole segment.

Data model

Primary Key -> DocID
Partition Key -> UserID
Scalars -> Type, Author, ...
Vector -> Array [Chunk embeddings]
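
To make the proposed layout concrete, here is a minimal Python sketch of one row. It is only an illustration of the data model above, not a Milvus API: the class name `DocEntity`, the field names, and the `dim` helper are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DocEntity:
    """Hypothetical row layout for the proposed data model (not a Milvus API)."""

    doc_id: str   # primary key
    user_id: str  # partition key
    # Scalar fields such as Type and Author.
    scalars: Dict[str, str] = field(default_factory=dict)
    # The proposed new type: one embedding per chunk of the document.
    chunk_embeddings: List[List[float]] = field(default_factory=list)

    def dim(self) -> int:
        """Return the shared vector dimension, or 0 for an empty list."""
        dims = {len(v) for v in self.chunk_embeddings}
        if len(dims) > 1:
            raise ValueError("all chunk embeddings must share one dimension")
        return dims.pop() if dims else 0


doc = DocEntity(
    doc_id="doc-1",
    user_id="user-7",
    scalars={"Type": "pdf", "Author": "alice"},
    chunk_embeddings=[[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
)
```

Here one entity carries all chunk vectors of a document, so the primary key identifies the document rather than an individual chunk.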

Search:

Given a DocID, find related chunks -> no index, simple brute-force search
Given a UserID, find related chunks -> index, filtering
Find the global top 10 related chunks, grouped by DocID, UserID, or Author
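
The first and third search modes can be sketched in plain Python as follows. This is a rough illustration under several assumptions that the issue does not specify: L2 distance as the metric, "group by" meaning "keep the best chunk per group", and the hypothetical function names `search_doc` and `search_grouped`. The indexed, filtered search by UserID is not covered here.

```python
import heapq
import math
from typing import Dict, List, Tuple

Vector = List[float]


def l2(a: Vector, b: Vector) -> float:
    """Euclidean distance (assumed metric; the issue does not name one)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def search_doc(chunks: List[Vector], query: Vector, k: int) -> List[Tuple[int, float]]:
    """Brute-force top-k inside one document: returns (chunk_index, distance)."""
    scored = ((i, l2(v, query)) for i, v in enumerate(chunks))
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])


def search_grouped(
    docs: Dict[str, List[Vector]], query: Vector, k: int
) -> List[Tuple[str, int, float]]:
    """Global top-k grouped by document: at most one (best) chunk per doc id."""
    best_per_doc = []
    for doc_id, chunks in docs.items():
        hits = search_doc(chunks, query, 1)
        if hits:  # skip documents with an empty embedding list
            idx, dist = hits[0]
            best_per_doc.append((doc_id, idx, dist))
    return heapq.nsmallest(k, best_per_doc, key=lambda t: t[2])


docs = {
    "doc-1": [[0.0, 0.0], [1.0, 1.0]],
    "doc-2": [[5.0, 5.0], [0.9, 1.0]],
    "doc-3": [[9.0, 9.0]],
}
query = [1.0, 1.0]
top_in_doc = search_doc(docs["doc-1"], query, 1)
top_grouped = search_grouped(docs, query, 2)
```

Grouping by UserID or Author would follow the same pattern with a different grouping key.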

Describe the solution you'd like.

Add a new data type -> array of vectors
Support searches like: select top 10 from embedding list where docID = xxx
Support searches like: select top 10 from embedding list where userID = xxx group by docID
Support searches like: select top 10 from embedding list group by author

Describe an alternate solution.

No response

Anything else? (Additional Context)

No response

xiaofan-luan commented 1 year ago

/assign @SimFG
/assign @xiaocai2333
/assign @czs007

Let's discuss it!

xiaofan-luan commented 1 year ago

By the way, each embedding list should be limited to 4096 or 1024 vectors.
We don't support partial updates for now.
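
As a rough sketch of what these two constraints could mean in practice, the toy in-memory store below rejects lists over the limit and only allows replacing a whole embedding list. The class `EmbeddingListStore`, its methods, and the default of 1024 are assumptions for illustration, not Milvus behavior.

```python
from typing import Dict, List

# Assumption: the comment suggests either 4096 or 1024; 1024 is picked arbitrarily.
MAX_VECTORS_PER_LIST = 1024


class EmbeddingListStore:
    """Hypothetical in-memory store showing the size limit and whole-list writes."""

    def __init__(self, max_vectors: int = MAX_VECTORS_PER_LIST) -> None:
        self.max_vectors = max_vectors
        self._lists: Dict[str, List[List[float]]] = {}

    def upsert(self, doc_id: str, vectors: List[List[float]]) -> None:
        """Replace the entire embedding list; there is no per-vector update."""
        if len(vectors) > self.max_vectors:
            raise ValueError(
                f"embedding list has {len(vectors)} vectors, limit is {self.max_vectors}"
            )
        # Copy the vectors so later caller-side mutation does not leak in.
        self._lists[doc_id] = [list(v) for v in vectors]

    def get(self, doc_id: str) -> List[List[float]]:
        """Return the stored embedding list for a document."""
        return self._lists[doc_id]
```

Changing a single chunk then means re-sending the full list for that document.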

milvus-io / milvus