Hello there,
First of all, thanks for the amazing work you've put out here. I have been playing around with ColBERT for a while and came across a problem.
I'm building a reranking pipeline where both the passages (max 15) and the input query can be in multiple languages. I'm already using OpenAI embeddings for the first round of retrieval and would like to use ColBERT to rerank those results.
Is there a recommended way to use ColBERT for this purpose? Should I use a multilingual model for encoding, or should I fine-tune an existing checkpoint on my domain-specific data (I have 10,000 instances with a 1:10 ratio of positive to negative passages per query)?
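To make sure I'm understanding the scoring correctly, here is a toy sketch of the late-interaction (MaxSim) step I'd be reranking with. The function names and the tiny embeddings are my own placeholders, not real ColBERT API calls; in practice the token embeddings would come from a ColBERT checkpoint:

```python
import numpy as np

def maxsim_score(query_emb, passage_emb):
    # ColBERT-style late interaction: for each query token embedding,
    # take the maximum similarity over all passage token embeddings,
    # then sum over query tokens. Assumes rows are L2-normalized,
    # so the dot product is cosine similarity.
    sim = query_emb @ passage_emb.T          # (q_tokens, p_tokens)
    return float(sim.max(axis=1).sum())

def rerank(query_emb, passages):
    # passages: list of (passage_id, token_embedding_matrix) pairs,
    # e.g. the ~15 candidates coming out of the first retrieval round.
    scored = [(pid, maxsim_score(query_emb, emb)) for pid, emb in passages]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Toy example with 2-dimensional, already-normalized token embeddings.
query = np.eye(2)                            # two query tokens
p_good = np.eye(2)                           # matches both query tokens
p_bad = np.array([[0.0, 1.0], [0.0, 1.0]])   # matches only one
print(rerank(query, [("bad", p_bad), ("good", p_good)]))
# → [('good', 2.0), ('bad', 1.0)]
```

My question is essentially about where the `token_embedding_matrix` inputs above should come from in a multilingual setting.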
Currently focusing on Japanese, English, Chinese, Korean and German languages.
Thanks, any advice is much appreciated.