texttron / tevatron

Tevatron - A flexible toolkit for neural retrieval research and development.
http://tevatron.ai
Apache License 2.0

Can Tevatron be used for Contriever fine-tuning? #99

cramraj8 closed this issue 9 months ago

MXueguang commented 10 months ago

Yes, but it needs some modification:

  1. In https://github.com/texttron/tevatron/blob/2e5d00ee21d5a7db0bd2ea1463c9150a572106d4/src/tevatron/modeling/dense.py#L33, the CLS token representation needs to be changed to an average-pooling representation.

  2. When computing the loss, a temperature needs to be applied to the scores (see the sketch after this list).
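
A minimal sketch of both changes, assuming a standard Hugging Face encoder output and in-batch negatives; the function names are illustrative, not Tevatron's actual API, and 0.05 is the temperature reported for Contriever, used here only as a starting point:

```python
import torch
import torch.nn.functional as F


def mean_pooling(hidden_states: torch.Tensor,
                 attention_mask: torch.Tensor) -> torch.Tensor:
    # Average token embeddings while ignoring padding (Contriever-style
    # pooling, replacing the hidden_states[:, 0] CLS slice in dense.py).
    mask = attention_mask.unsqueeze(-1).float()    # (batch, seq, 1)
    summed = (hidden_states * mask).sum(dim=1)     # (batch, hidden)
    counts = mask.sum(dim=1).clamp(min=1e-9)       # guard against empty rows
    return summed / counts


def contrastive_loss(q_reps: torch.Tensor, p_reps: torch.Tensor,
                     temperature: float = 0.05) -> torch.Tensor:
    # InfoNCE over in-batch negatives, one positive per query aligned on the
    # diagonal. Dividing scores by a temperature < 1 sharpens the softmax.
    scores = q_reps @ p_reps.T / temperature       # (num_q, num_p)
    target = torch.arange(scores.size(0), device=scores.device)
    return F.cross_entropy(scores, target)
```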

cramraj8 commented 10 months ago

@MXueguang Thank you!

cramraj8 commented 3 months ago

@MXueguang does the current main branch incorporate both mean pooling and temperature in the loss?

MXueguang commented 3 months ago

--pooling mean \
--temperature xxx

Yes, you can set these two arguments during training to match the Contriever setup.
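
For reference, a sketch of how these flags might slot into a Tevatron training invocation. Apart from --pooling and --temperature (confirmed above), the module path and the other arguments are assumptions based on typical Tevatron/Hugging Face usage, and the temperature value is illustrative; check the repo's README for the current invocation:

```
python -m tevatron.retriever.driver.train \
  --output_dir retriever_out \
  --model_name_or_path bert-base-uncased \
  --dataset_name Tevatron/msmarco-passage \
  --per_device_train_batch_size 128 \
  --pooling mean \
  --temperature 0.05 \
  --fp16
```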

cramraj8 commented 3 months ago

Sounds good. It's already working great for me! I found from another thread that temperature is a sensitive parameter during Contriever training, and in my own runs different batch-size and temperature values make performance substantially improve or drop. Any idea why, and are there any optimal values you found? I am experimenting with multilingual datasets, TyDi and MIRACL.
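
One intuition for the sensitivity (an illustration, not tied to Tevatron's code): the temperature rescales the similarity gaps before the softmax, so small changes in it swing the loss between treating all in-batch negatives roughly equally and focusing almost entirely on the hardest one. Batch size interacts with this because it sets how many in-batch negatives compete in that softmax.

```python
import torch
import torch.nn.functional as F

# One positive (first entry) and two in-batch negatives, as raw similarities.
scores = torch.tensor([0.90, 0.85, 0.40])
for tau in (1.0, 0.1, 0.02):
    probs = F.softmax(scores / tau, dim=0)
    print(f"tau={tau}: {probs.tolist()}")
# tau=1.0 yields a nearly flat distribution; tau=0.02 is close to one-hot,
# so gradients concentrate on the hardest negative.
```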

MXueguang commented 3 months ago

@crystina-z can comment more here.

cramraj8 commented 3 months ago

Is there any T5-encoder-based, Contriever-style pre-trained or fine-tuned model available on Hugging Face? I would like to see the performance boost.

crystina-z commented 3 months ago

Hi @cramraj8, we actually didn't actively tune batch size and temperature; in the final config we use batch=128 and the default temperature. You'll likely find a setting that outperforms this; please open a PR if you do!