Open 302746420 opened 7 months ago
We did some research on ['Colbert']; it seems to need much more memory and special storage support, so we haven't decided yet whether to support it or not. Do you have any solid background on why you are using BGE-M3['Colbert']?
We run a RAG service for our users. In our experiments, we found that combining the ['colbert'] mode with ['sparse'] and ['dense'] brings a large improvement in recall. So we want to know whether there is any way to store the BGE-M3['Colbert'] vectors.
Quick question: is it colbert for retrieval or colbert for ranking? I think this is an interesting design decision. From our perspective we didn't see much improvement in search quality with colbert, and colbert takes a large amount of storage, so we'd be interested in what kind of experiments you have done.
Feel free to contact me at xiaofan.luan@zilliz.com; I'd be happy to have a quick talk with you.
The answer is retrieval. We use bge-m3[dense] for coarse recall and bge-m3[dense, sparse, colbert] for fine recall. The hit rate increased by 2% compared to using only dense and sparse.
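For readers who want to try this kind of setup, a minimal sketch of such a two-stage pipeline is below, assuming FlagEmbedding's BGEM3FlagModel API; the fusion weights, candidate count, and toy documents are illustrative only, not the poster's actual configuration.

import numpy as np
from FlagEmbedding import BGEM3FlagModel

model = BGEM3FlagModel("BAAI/bge-m3", use_fp16=True)

docs = ["Joe is a pilot.", "Milvus is a vector database.", "ColBERT uses token-level embeddings."]
doc_out = model.encode(docs, return_dense=True, return_sparse=True, return_colbert_vecs=True)

query = "What does Milvus store?"
q_out = model.encode([query], return_dense=True, return_sparse=True, return_colbert_vecs=True)

# Stage 1: coarse recall with dense vectors only (brute force here; Milvus/ANN in practice).
dense_scores = doc_out["dense_vecs"] @ q_out["dense_vecs"][0]
candidates = np.argsort(-dense_scores)[:2]

# Stage 2: fine recall, re-scoring the candidates with dense + sparse + colbert.
def fine_score(i):
    dense = float(dense_scores[i])
    sparse = model.compute_lexical_matching_score(
        q_out["lexical_weights"][0], doc_out["lexical_weights"][i]
    )
    colbert = float(model.colbert_score(q_out["colbert_vecs"][0], doc_out["colbert_vecs"][i]))
    return 0.4 * dense + 0.2 * sparse + 0.4 * colbert  # example weights

reranked = sorted(candidates, key=fine_score, reverse=True)
print([docs[i] for i in reranked])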
What ANN library/system are you currently using?
Vespa? Storing all the token embeddings might consume a lot of memory.
FLAT, we do not use ANN. But we tested Milvus's HNSW; it's great.
Memory really is a problem that we should consider. We will test it and balance cost against efficiency.
Thanks for your reply.
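For anyone moving from brute-force FLAT search to HNSW as mentioned above, a minimal sketch with the pymilvus MilvusClient API is below; the collection name, vector field name, and M/efConstruction values are placeholders.

from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")

# Build an HNSW index on the dense vector field of an existing collection.
index_params = client.prepare_index_params()
index_params.add_index(
    field_name="vector",                      # dense BGE-M3 vectors
    index_type="HNSW",
    metric_type="IP",                         # inner product for normalized embeddings
    params={"M": 16, "efConstruction": 200},  # example build parameters
)
client.create_index("bge_m3_docs", index_params)
client.load_collection("bge_m3_docs")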
Quick question: does colbert still take up a lot of memory at retrieval time? Could we do the embedding first and the retrieval later?
Colbert is token-level embedding, which means you need one embedding for each token. That's why it takes a lot of memory.
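To make the gap concrete, here is a rough back-of-envelope sketch, assuming 1024-dimensional fp32 vectors (as BGE-M3 produces) and about 256 tokens per chunk; the corpus size is made up for illustration.

DIM = 1024
BYTES_PER_FLOAT = 4
TOKENS_PER_CHUNK = 256
NUM_CHUNKS = 1_000_000

dense_bytes = NUM_CHUNKS * DIM * BYTES_PER_FLOAT                       # one vector per chunk
colbert_bytes = NUM_CHUNKS * TOKENS_PER_CHUNK * DIM * BYTES_PER_FLOAT  # one vector per token

print(f"dense:   {dense_bytes / 2**30:.1f} GiB")    # ~3.8 GiB
print(f"colbert: {colbert_bytes / 2**30:.1f} GiB")  # ~976.6 GiB, i.e. TOKENS_PER_CHUNK times larger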
We do not have a very large dataset, so we want to use BGE-M3 hybrid retrieval with [sparse, dense, colbert] vectors. If Colbert does not take too much time and memory during retrieval, we think we could use it for our RAG.
Doesn't PLAID address this problem?
What is PLAID? Any more details?
Sorry, this is the paper: https://arxiv.org/abs/2205.09707 I think there is an implementation here too: https://github.com/bclavie/RAGatouille
Basically, it doesn't store every single embedding naively, but creates k centroids and then stores just the quantized residuals with respect to these centroids (my understanding, I am not a researcher etc.).
Also, someone was recently able to index 500 million documents with an extended approach that works in a streaming fashion, so not all documents are known beforehand but are indexed as they arrive. As you may guess, the early centroid selection can influence performance, but they managed to solve this problem (again, my understanding...):
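To illustrate the centroid-plus-residual idea described above, here is a toy numpy sketch (in the spirit of ColBERTv2/PLAID compression, not their actual implementation); the embedding dimension, centroid count, and random data are made up.

import numpy as np

rng = np.random.default_rng(0)
token_embs = rng.normal(size=(10_000, 128)).astype(np.float32)  # fake token embeddings

# Pick k centroids (real systems run k-means; a random subset keeps the sketch short).
k = 256
centroids = token_embs[rng.choice(len(token_embs), size=k, replace=False)]

# Assign each token embedding to its nearest centroid and keep only the residual.
d2 = (token_embs ** 2).sum(1, keepdims=True) + (centroids ** 2).sum(1) - 2 * token_embs @ centroids.T
assign = np.argmin(d2, axis=1)
residuals = token_embs - centroids[assign]

# Quantize residuals to int8: roughly 4x smaller than fp32, plus one centroid id per token.
scale = np.abs(residuals).max()
residuals_q = np.round(residuals / scale * 127).astype(np.int8)

# Approximate reconstruction at search time: centroid + de-quantized residual.
recon = centroids[assign] + residuals_q.astype(np.float32) / 127 * scale
print("mean reconstruction error:", float(np.abs(recon - token_embs).mean()))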
In our tests, colbert was only slightly better than dense embeddings; correct me if there is a dataset where colbert outperforms dense embeddings by a large margin. That's why we are doubting whether this is really helpful. Storing a quantized version does help to reduce the storage cost, but isn't it still dramatically larger than the dense embedding size?
I have never compared the size of a dense solution vs colbert. However, I found a presentation (https://web.stanford.edu/class/cs224v/lectures/ColBERT-Stanford-224V-talk-Nov2023.pdf) where they go from 150 GB to 25 GB or less, depending on the compression chosen. This is for the MS MARCO Passage Ranking dataset. The same dataset is said to be around 26 GB with HNSW: https://arxiv.org/html/2312.01556v1 (if I am reading this correctly).
I agree. I also read that with heavy quantization colbert can end up with performance similar to dense embeddings.
@wxywb has some perf results that can be shared. If anyone did a successful POC on colbert embeddings, please share your results.
I have tested the original colbert-v2. It shows that when it is used for reranking, it is relatively fast compared with a cross-encoder, and shows a slight improvement compared with an embedding model. But it is indeed more expensive in vector storage and distance calculation.
@xiaofan-luan what about this? https://superlinked.com/vectorhub/articles/evaluation-rag-retrieval-chunking-methods
They claim Colbert is better on all the datasets they tested, by a 10% margin. This is consistent with my tests, but I have nothing particularly robust to share at the moment.
Also, as a further benefit, consider that Colbert is less of a black box than other solutions and could perhaps, from a UX perspective, even offer a highlighting feature (like we used to have with lexical solutions).
What is the actual status here? Any updates or showcases for both retrieval and re-ranking?
Colbert and reranking are on our plan for 3.0.
@liliu-z is actually working on colbert support and integration. Yes, colbert is usually good for accuracy, but on the other hand it takes more memory, which generally means higher cost.
We are still working on ways to reduce the cost of colbert.
/assign
/reopen
@liliu-z: Reopened this issue.
Hi guys,
that's a highly desired feature. Do you have any prediction for the release date? Let me volunteer to test it for you using real production data; if you could share a pre-release, I would test it for you.
Would it be possible to obtain a dev branch for testing?
Can I ask how you are using Colbert for now and how well it works in your scenarios? This can help us better listen to the community and reprioritize the work.
We are still in the investigation stage for now, since merging it with the current API is a big challenge. However, we can do something outside of Milvus (on the client side) to mimic the behavior of Colbert. We will share results once we have any, and any input about Colbert is more than welcome.
Hey @liliu-z !
The biggest feature of Colbert (in my opinion) is that the tokens are context-aware. Now, with the Jina AI model, we can late-chunk, which somewhat solves the cross-chunk reference problem (e.g., Chunk 1: "Joe is a pilot.", Chunk 2: "He is the best." -> we will never know that Joe is the best pilot). Colbert would be my choice of embedding method in a case where accuracy is more important than speed (e.g. any scientific application).
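For illustration, a rough sketch of the late-chunking idea is below (not Jina's actual implementation): embed the full text once so every token sees the whole context, then pool the token embeddings per chunk. The model name and chunk boundaries are placeholders.

import torch
from transformers import AutoModel, AutoTokenizer

model_name = "jinaai/jina-embeddings-v2-base-en"  # any long-context embedding model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, trust_remote_code=True)

text = "Joe is a pilot. He is the best."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    token_embs = model(**inputs).last_hidden_state[0]  # (num_tokens, dim), all context-aware

# Chunk boundaries expressed as token index ranges (normally derived from the chunker).
chunk_spans = [(0, 7), (7, token_embs.shape[0])]
chunk_embs = [token_embs[start:end].mean(dim=0) for start, end in chunk_spans]
# The second chunk's embedding now carries information about Joe, because its
# tokens attended to the first sentence before pooling.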
I am a colleague of [sskserk]. We use ColPali (a vision analog of Colbert) for vision RAG. We want to skip table and image extraction and build a RAG that retrieves images and sends them to vLLM. This approach significantly improved metrics compared to classic text RAG. We can use Milvus with external functions, but this approach is not very effective, so it would be great to have it inside Milvus.
@liliu-z here is what we are chasing.
Hi @sskserk @Despiko @gabor-one
Thanks for this information. We have been investigating this for some time and have put it on our roadmap. Updates will keep being shared. Stay tuned!
@liliu-z & @xiaofan-luan,
..as you are on the way, @Despiko and I are ready to test the development version of the feature. Please rely on us ... a big company with 50K+ employees is behind us ;-)
Thank you for the confirmation!
Hi @sskserk @Despiko @gabor-one, thank you so much for the interest in Milvus! While we are working on supporting ColBERT/ColPali natively, we have just published a reference implementation for doing ColBERT-style retrieval on the client side with Milvus: https://milvus.io/docs/use_ColPali_with_milvus.md
Basically, it stores token/sequence vectors as individual rows in Milvus:
self.client.insert(
    self.collection_name,
    [
        {
            "vector": colbert_vecs[i],
            "seq_id": seq_ids[i],
            "doc_id": doc_ids[i],
            "doc": docs[i],
        }
        for i in range(seq_length)
    ],
)
and then does a heuristic search over the individual vectors to find potentially related docs, fetches all vectors of those docs, and calculates MaxSim over them to re-rank:
import numpy as np

def rerank_single_doc(doc_id, data, client, collection_name):
    # Rerank a single document by retrieving its embeddings and calculating the similarity with the query.
    doc_colbert_vecs = client.query(
        collection_name=collection_name,
        filter=f"doc_id in [{doc_id}, {doc_id + 1}]",
        output_fields=["seq_id", "vector", "doc"],
        limit=1000,
    )
    doc_vecs = np.vstack(
        [doc_colbert_vecs[i]["vector"] for i in range(len(doc_colbert_vecs))]
    )
    score = np.dot(data, doc_vecs.T).max(1).sum()
    return (score, doc_id)
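For context, the candidate-search step that precedes rerank_single_doc might look roughly like the sketch below (modeled on the linked doc rather than copied verbatim; the function name, per-token limit, and metric are assumptions): each query token vector is searched individually and the doc_ids of the hits are collected for re-ranking.

def get_candidate_doc_ids(client, collection_name, query_vecs, topk_per_token=50):
    # Search each query token vector against the stored token vectors.
    results = client.search(
        collection_name,
        query_vecs,
        limit=topk_per_token,
        output_fields=["doc_id"],
        search_params={"metric_type": "IP"},
    )
    # Collect the documents that any query token matched; these are the
    # candidates that rerank_single_doc then scores with MaxSim.
    doc_ids = set()
    for per_token_hits in results:
        for hit in per_token_hits:
            doc_ids.add(hit["entity"]["doc_id"])
    return doc_ids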
Of course the performance won't be as good as a native implementation, but it should work for small-scale workloads like prototypes or experiments. If you have a large-scale production workload using colbert-style retrieval, we'd love to learn more; we can set up a chat to talk about that.
Is your feature request related to a problem? Please describe.
In the 2.4.0 release, BGE-M3 is supported, but only in dense and sparse mode. Will you support BGE-M3['Colbert'] storage and search? Or is there any existing way in Milvus to insert a matrix like the colbert type? Thanks!