milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Feature]: Support primary key dedup and vector dedup when insert. #31552

Open xiaofan-luan opened 4 months ago

xiaofan-luan commented 4 months ago

Is there an existing issue for this?

Is your feature request related to a problem? Please describe.

When a user inserts data, a duplicate primary key or duplicate vector (detected, for example, with a bloom filter) should either return an error or be handled idempotently.

This means:

  1. This is a read after write.
  2. The read checks whether there is an exact match on the PK or the vector.
  3. We might need a bloom filter on vectors as well.
  4. This is the only case where we need to query by vector, namely the dedup check. We may need a hash on top of the query vector for fast lookup.
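The checks listed above can be sketched roughly as follows. This is only an illustration of the idea, not Milvus code: `DedupIndex` and `vector_fingerprint` are hypothetical names, the index lives in memory, and "exact match on vector" is assumed to mean byte-identical float32 values.

```python
import hashlib
import struct


def vector_fingerprint(vec):
    """Hash the exact little-endian float32 bytes of a vector (assumed encoding)."""
    payload = struct.pack(f"<{len(vec)}f", *vec)
    return hashlib.sha256(payload).digest()


class DedupIndex:
    """Hypothetical in-memory index tracking seen primary keys and vector hashes."""

    def __init__(self):
        self.pks = set()
        self.vector_hashes = {}  # fingerprint -> primary key that first used it

    def insert(self, pk, vec, idempotent=True):
        """Return True if the row is new, False if it was a skipped duplicate.

        With idempotent=False a duplicate raises ValueError instead.
        """
        fp = vector_fingerprint(vec)
        if pk in self.pks or fp in self.vector_hashes:
            if idempotent:
                return False  # duplicate silently ignored
            raise ValueError(f"duplicate primary key or vector for pk={pk!r}")
        self.pks.add(pk)
        self.vector_hashes[fp] = pk
        return True
```

For example, after `idx = DedupIndex()` and `idx.insert(1, [0.1, 0.2])`, a second insert with the same key or the same vector is either ignored or rejected, depending on `idempotent`. A real implementation would probably also compare the stored vector on a hash hit rather than trusting the hash alone.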

Describe the solution you'd like.

No response

Describe an alternate solution.

No response

Anything else? (Additional Context)

No response

yiwen92 commented 3 months ago

Why not consider read before write to ensure that the primary key or vector is unique?

xiaofan-luan commented 3 months ago

Why not consider read before write to ensure that the primary key or vector is unique?

That is exactly what we are trying to deliver. The overall goal is to make retrieval by PK and retrieval by vector fast. We already have a PK index. A bloom filter (BF) will probably help by pruning the candidate segments.
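To illustrate how a per-segment bloom filter could prune candidate segments before the exact lookup, here is a minimal sketch. It is not how Milvus implements it: `BloomFilter`, `candidate_segments`, the bit-array size, and the number of hash functions are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Small bloom filter using double hashing over a SHA-256 digest."""

    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes):
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1  # keep the step odd
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key: bytes):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: bytes) -> bool:
        # False means definitely absent; True means "possibly present".
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


def candidate_segments(segment_filters, key: bytes):
    """Return the IDs of segments whose filter cannot rule the key out."""
    return [seg for seg, bf in segment_filters.items() if bf.might_contain(key)]
```

Only the segments returned by `candidate_segments` need the exact PK or vector-hash lookup. A segment that is filtered out is guaranteed not to contain the key, while a returned segment may still be a false positive.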

xuwenqiang224 commented 1 month ago

Why not consider read before write to ensure that the primary key or vector is unique?

That is exactly what we are trying to deliver. The overall goal is to make retrieval by PK and retrieval by vector fast. We already have a PK index. A bloom filter (BF) will probably help by pruning the candidate segments.

When will this be done, and what function can we call to drop the duplicate content?

xiaofan-luan commented 2 weeks ago

/assign @czs007