texttron / tevatron

Tevatron - A flexible toolkit for neural retrieval research and development.
http://tevatron.ai
Apache License 2.0
435 stars 87 forks

Training log of RepLLaMA #104

Closed kyriemao closed 5 months ago

kyriemao commented 5 months ago

Hi Xueguang,

Great work! I am training my own RepLLaMA now and find that the training loss starts at 90+ and quickly drops below 0.1 within around 30 steps (as shown below). Is this normal, or could you please share your training log of RepLLaMA?

[screenshot: training loss curve]

Thanks!
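As a sanity check on the numbers above: for a contrastive (InfoNCE-style) loss with one positive among N candidate passages, a model that scores roughly at random should start near ln(N), so a starting loss of 90+ already hints at a preprocessing bug. A minimal sketch of this back-of-the-envelope check, with a hypothetical batch size of 128 and train group size of 8:

```python
import math

def expected_initial_loss(num_candidates: int) -> float:
    # Softmax cross-entropy over a uniform distribution of
    # num_candidates options is exactly ln(num_candidates).
    return math.log(num_candidates)

# Hypothetical settings: 128 queries per batch, 8 passages per query,
# with in-batch negatives shared across the batch.
candidates = 128 * 8
print(round(expected_initial_loss(candidates), 2))  # ~6.93
```

Any reasonable batch/group setting gives a starting loss in the single digits, so values like 90 point at the inputs, not the hyperparameters.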

MXueguang commented 5 months ago

this looks a bit weird. what is your batch size/ training group size setting?

kyriemao commented 5 months ago

> this looks a bit weird. what is your batch size/ training group size setting?

The param settings are:

[screenshot: parameter settings]

I use 6 A100 40G GPUs for training.

kyriemao commented 5 months ago

Solved. It was caused by a bug in my own code for processing the EOS token. Thanks!
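For readers hitting the same issue: RepLLaMA pools the sequence embedding from the hidden state of the appended EOS token, so if the EOS token is truncated away or never appended during tokenization, the pooled representation is wrong and training degenerates. A minimal sketch in plain Python (token ids are hypothetical; the real id comes from your tokenizer) of the invariant to check:

```python
# Assumed EOS id for illustration; with a real LLaMA tokenizer use
# tokenizer.eos_token_id instead.
EOS_ID = 2

def append_eos(token_ids: list[int], max_len: int) -> list[int]:
    # Truncate BEFORE appending, so the EOS token is never cut off
    # by max-length truncation.
    ids = token_ids[: max_len - 1]
    return ids + [EOS_ID]

def last_token_index(token_ids: list[int]) -> int:
    # Position whose hidden state a last-token pooler would use
    # as the sequence embedding.
    return len(token_ids) - 1

ids = append_eos(list(range(10, 20)), max_len=8)
print(ids)                    # [10, 11, 12, 13, 14, 15, 16, 2]
print(last_token_index(ids))  # 7
```

The common bug is the reverse order (append EOS, then truncate), which silently drops the EOS token on long inputs.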

guankaisi commented 5 months ago

Hello, I met the same problem. Could you please tell me how you solved it? Thanks a lot!