milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Feature]: Support colbert embeddings #31920

Open xiaofan-luan opened 6 months ago

xiaofan-luan commented 6 months ago

Is there an existing issue for this?

Is your feature request related to a problem? Please describe.


ColBERT is known for its high search quality due to late interaction (and it is, of course, faster than cross-encoders).

The problem with ColBERT is that it takes too much memory and computation, so it is not well suited to production environments.
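
To make the cost concrete, here is a back-of-the-envelope comparison (assuming 1536-dim embeddings and a 512-token chunk; these numbers are illustrative assumptions, not Milvus defaults):

```python
# Storage per chunk, assuming 1536-dim embeddings and 512 tokens (illustrative).
DIM = 1536
TOKENS = 512

single_vec_f32 = DIM * 4         # one pooled float32 vector per chunk
colbert_f32 = TOKENS * DIM * 4   # one float32 vector per token (ColBERT style)
colbert_bin = TOKENS * DIM // 8  # the same, binarized to 1 bit per dimension

print(single_vec_f32, colbert_f32, colbert_bin)
# Float32 ColBERT is 512x larger than a single pooled vector;
# binarization recovers a 32x reduction over float32.
```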

Describe the solution you'd like.

  1. Support embedding arrays in Milvus.
  2. Within the embedding arrays, use binary embeddings to reduce the memory footprint (for a 512-token chunk with 1536-dim vectors, the array is 512 × 1536 ≈ 786K bits, about 96 KB).
  3. At retrieval time, compute the distance between each query token embedding (q1, q2, ..., qn) and its top-K nearest neighbors, collect the resulting N*K candidate documents, and deduplicate.
  4. For those documents, compute MAXSIM(query, corpus) again and use it as the final result.
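
Steps 3–4 can be sketched in numpy. This is only an illustration of the late-interaction rerank, not a Milvus implementation; the `maxsim` function, candidate set, and dimensions are all hypothetical:

```python
import numpy as np

def maxsim(query_tokens: np.ndarray, doc_tokens: np.ndarray) -> float:
    """ColBERT late-interaction score: for each query token, take its best
    match among the document's tokens, then sum over query tokens.
    query_tokens: (n_q, dim), doc_tokens: (n_d, dim), both L2-normalized."""
    sim = query_tokens @ doc_tokens.T    # (n_q, n_d) cosine similarities
    return float(sim.max(axis=1).sum())  # best doc token per query token

# Toy rerank of the deduplicated candidates from the ANN stage (step 3):
rng = np.random.default_rng(0)
def l2norm(x): return x / np.linalg.norm(x, axis=1, keepdims=True)

q = l2norm(rng.normal(size=(4, 8)))  # 4 query tokens, dim 8
candidates = {doc_id: l2norm(rng.normal(size=(6, 8))) for doc_id in ["a", "b", "c"]}
ranked = sorted(candidates, key=lambda d: maxsim(q, candidates[d]), reverse=True)
print(ranked)
```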

We need better support for binary embeddings before we can support ColBERT, but let's keep this open for discussion.

Describe an alternate solution.

No response

Anything else? (Additional Context)

No response

yiwen92 commented 5 months ago

Is there any dependency between binary embeddings and ColBERT? Or is it just due to bandwidth limitations?

xiaofan-luan commented 5 months ago

No, there isn't. However, storing ColBERT embeddings in binary format can reduce the cost.
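
For reference, the binary variant could look like the sketch below: sign-binarize the token embeddings, then score with MaxSim over Hamming similarity. This is a minimal assumption-laden sketch (sign binarization, 1 bit per dimension), not how Milvus's binary vector support actually works:

```python
import numpy as np

def binarize(vecs: np.ndarray) -> np.ndarray:
    """Sign binarization: 1 bit per dimension, packed into uint8."""
    return np.packbits(vecs > 0, axis=1)  # (n, dim) -> (n, dim // 8)

def hamming_maxsim(q_bits: np.ndarray, d_bits: np.ndarray, dim: int) -> int:
    """MaxSim over binary token embeddings, with similarity = dim - Hamming."""
    xor = q_bits[:, None, :] ^ d_bits[None, :, :]  # (n_q, n_d, dim // 8)
    ham = np.unpackbits(xor, axis=2).sum(axis=2)   # pairwise Hamming distances
    return int((dim - ham).max(axis=1).sum())      # best doc token per query token
```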