cramraj8 opened 4 months ago
Hi @vjeronymo2 @lhbonifacio, do you maybe have a hint about what is going on here?
Hi @cramraj8 From the languages in your results I guess you are using Mr. TyDi, right? I would say the jump from 580M (mT5-base) to 13B, combined with the multilingual setting, is the main issue here. As a hint, we observed similar results when trying to finetune mT5 models for 10k steps (the number of steps that gave the best results for the English monoT5 version). However, finetuning for 10k steps in a multilingual scenario was simply not enough for the model to learn the reranking task. That is why you will not find any multilingual model finetuned for only 10k steps in our Hugging Face hub. You are scaling up the number of parameters without scaling up the training data accordingly, so I would say that is the reason.
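For context on the reranking task the model has to learn: monoT5-style rerankers feed the model a prompt like "Query: q Document: d Relevant:" and rank passages by the probability assigned to a positive target token versus a negative one (for the English monoT5 these are "true"/"false"; the exact target tokens for the multilingual checkpoints may differ). A minimal sketch of just that final scoring step, assuming the two target-token logits have already been extracted from the model output:

```python
import math

def relevance_score(logit_pos: float, logit_neg: float) -> float:
    """Softmax over the two target-token logits; returns P(relevant).

    logit_pos / logit_neg are assumed to be the logits of the positive
    and negative target tokens at the first decoding step.
    """
    m = max(logit_pos, logit_neg)  # subtract max for numerical stability
    e_pos = math.exp(logit_pos - m)
    e_neg = math.exp(logit_neg - m)
    return e_pos / (e_pos + e_neg)

# Passages are then sorted by this score, highest first.
```

Equal logits give a score of 0.5; a strongly positive margin pushes the score toward 1.0.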
Hi @lhbonifacio , yes, I am evaluating on Mr. TyDi. I am a bit confused here.
If I interpret your reply correctly, the English monoT5 version reaches its best performance with only 10k training steps, but mT5-base does not, so you had to train to 100k steps to see improvements. However, mT5-13B trained for 100k steps is still not optimal, because the model size has grown from base to 13B and it therefore needs even more training. Is that accurate?
In summary, in the context of multilingual reranking, when the model size increases (580M --> 13B), should we also increase the number of training iterations or the training sample size?
I tried to evaluate both unicamp-dl/mt5-base-en-msmarco and unicamp-dl/mt5-13b-mmarco-100k, but the performance of the 13B model is lower than that of the base model. Here is a simple comparison of reranking results on BM25 top-100 candidates, measured in nDCG@10. Did you observe a similar trend, or could there be any underlying reasons? @rodrigonogueira4
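For anyone reproducing this comparison, here is a minimal sketch of the nDCG@10 computation over a reranked list (standard log2-discounted DCG normalized by the ideal ordering; in practice trec_eval or pytrec_eval would be used, and their tie-handling can differ slightly):

```python
import math

def dcg_at_k(gains, k: int) -> float:
    """DCG with gain / log2(rank + 1), ranks 1-based."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))

def ndcg_at_k(ranked_gains, k: int = 10) -> float:
    """nDCG@k for relevance labels listed in reranked order.

    ranked_gains: graded relevance labels of the candidates, in the
    order the reranker returned them (e.g. BM25 top-100 reranked).
    """
    ideal = dcg_at_k(sorted(ranked_gains, reverse=True), k)
    return dcg_at_k(ranked_gains, k) / ideal if ideal > 0 else 0.0
```

Averaging this per-query value over all queries in a Mr. TyDi language split gives the numbers being compared above.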