ColBERT for handling multilingual passages and queries

Hello there,

First of all, thanks for the amazing work you guys have put out here. I have been playing around with ColBERT for a while and ran into a problem.

I'm building a reranking pipeline where both the passages (max 15) and the input query can be in multiple languages. I'm already using OpenAI embeddings for the first round of retrieval and would like to use ColBERT to rerank the results.
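To make the reranking step concrete, here is a minimal sketch of the late-interaction (MaxSim) scoring I have in mind. It assumes per-token embeddings for the query and each passage have already been produced by some encoder; the function names `maxsim_score` and `rerank` are my own for illustration and are not part of the ColBERT API.

```python
import numpy as np

def maxsim_score(query_emb: np.ndarray, passage_emb: np.ndarray) -> float:
    """Late-interaction score between one query and one passage.

    query_emb:   (num_query_tokens, dim) token embeddings
    passage_emb: (num_passage_tokens, dim) token embeddings
    """
    # L2-normalize rows so the dot product is cosine similarity.
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    p = passage_emb / np.linalg.norm(passage_emb, axis=1, keepdims=True)
    sim = q @ p.T  # (num_query_tokens, num_passage_tokens)
    # For each query token keep its best-matching passage token, then sum.
    return float(sim.max(axis=1).sum())

def rerank(query_emb: np.ndarray, passage_embs: list) -> list:
    """Return (passage_index, score) pairs sorted by descending score."""
    scores = [maxsim_score(query_emb, p) for p in passage_embs]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order]
```

With at most 15 candidates per query, scoring every passage this way is cheap, so the open question is really which encoder produces good token embeddings across languages.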

Is there a recommended way to use ColBERT for this purpose? Should I use a multilingual model for encoding, or should I fine-tune an existing checkpoint on my domain-specific data (I have 10,000 instances with a 1:10 ratio of positive to negative passages per query)?
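For the fine-tuning option, my understanding is that ColBERT-style training consumes (query, positive, negative) triples, so I would expand my data roughly as below. This is only a sketch: the field names `query`, `positives`, and `negatives` are hypothetical placeholders for my own data layout, not a format that ColBERT prescribes.

```python
from typing import Dict, List, Tuple

def build_triples(examples: List[Dict]) -> List[Tuple[str, str, str]]:
    """Expand each example into (query, positive, negative) triples.

    Each example is assumed to look like:
    {"query": str, "positives": [str, ...], "negatives": [str, ...]}
    """
    triples = []
    for ex in examples:
        for pos in ex["positives"]:
            for neg in ex["negatives"]:
                # One triple per positive/negative pairing for this query.
                triples.append((ex["query"], pos, neg))
    return triples
```

An instance with one positive and ten negatives yields ten triples under this scheme.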

I'm currently focusing on Japanese, English, Chinese, Korean, and German.

Thanks, any advice is much appreciated.

stanford-futuredata / ColBERT, issue #252