Is your feature request related to a problem? Please describe.
ColBERT is known for its high search quality thanks to late interaction (and it is of course faster than cross-encoders).
The problem with ColBERT is that it takes too much memory and computation, so it is not a great fit for production environments.
Describe the solution you'd like.
Support embedding arrays in Milvus.
Within the embedding arrays, use binary embeddings to reduce the memory footprint (for a 512-token chunk with 1536-dim vectors, this array is 512 × 1536 = 786,432 bits, roughly 96 KB).
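A minimal NumPy sketch of the memory math above, assuming 1536-dim per-token embeddings binarized by sign (the dimension and binarization scheme are assumptions, not a Milvus API):

```python
import numpy as np

tokens, dim = 512, 1536
emb = np.random.randn(tokens, dim).astype(np.float32)  # full-precision: ~3 MB

binary = (emb > 0).astype(np.uint8)   # sign binarization, 1 bit per dimension
packed = np.packbits(binary, axis=1)  # pack 8 dims per byte

print(emb.nbytes)     # 3145728 bytes (float32)
print(packed.nbytes)  # 98304 bytes = 512 * 1536 bits, ~96 KB per chunk
```

So the binary form is a 32x reduction over float32 for the same token array.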
At retrieval time, simply compute the distance between the query token embeddings (q1, q2, ..., qn) and their top-K nearest neighbors, get N*K candidate documents, and deduplicate them.
For those documents, compute MaxSim(query, corpus) again and use the result as the final ranking.
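The two-stage flow above can be sketched in plain NumPy (not Milvus APIs; sizes, the 128-dim embeddings, and sign binarization are all illustrative assumptions): Hamming search over binary token embeddings produces candidates, then an exact MaxSim rerank picks the winner.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_docs, doc_len, q_len, topk = 128, 50, 32, 8, 5

docs = rng.standard_normal((n_docs, doc_len, dim)).astype(np.float32)
query = rng.standard_normal((q_len, dim)).astype(np.float32)

def pack(x):
    # Sign-binarize and pack to 1 bit per dimension.
    return np.packbits((x > 0).astype(np.uint8), axis=-1)

docs_b = pack(docs)    # (n_docs, doc_len, dim // 8)
query_b = pack(query)  # (q_len, dim // 8)

# Stage 1: Hamming distance of every query token to every doc token,
# via XOR + popcount; shape (q_len, n_docs, doc_len).
hamming = np.unpackbits(docs_b[None] ^ query_b[:, None, None], axis=-1).sum(-1)
per_doc = hamming.min(axis=2)  # best token match per (query token, doc)

# top-K docs per query token -> up to N*K hits, then deduplicate.
candidates = np.unique(np.argsort(per_doc, axis=1)[:, :topk])

# Stage 2: exact MaxSim over full-precision embeddings of the candidates only.
sims = np.einsum('qd,ctd->qct', query, docs[candidates])  # token dot products
maxsim = sims.max(axis=2).sum(axis=0)  # MaxSim score per candidate doc
best = candidates[int(np.argmax(maxsim))]
```

The candidate set stays small (at most q_len * topk docs), so the expensive MaxSim pass touches only a fraction of the corpus.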
We need to better support binary embeddings before we can support ColBERT, but let's keep this open for discussion.
Is there an existing issue for this?
Describe an alternate solution.
No response
Anything else? (Additional Context)
No response