texttron / tevatron

Tevatron - A flexible toolkit for neural retrieval research and development.
http://tevatron.ai
Apache License 2.0
435 stars 87 forks

Training log of RepLLaMA #104

Closed kyriemao closed 5 months ago

kyriemao commented 5 months ago

Hi Xueguang,

Great work! I am training my own RepLLaMA now and find that the training loss starts at 90+ and quickly drops below 0.1 within around 30 steps (as shown below). Is this normal, or could you please share your training log of RepLLaMA?

[screenshot: training loss curve]

Thanks!
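As a sanity check on the numbers above: for a contrastive (InfoNCE-style) loss with one positive among N candidate passages, a model that scores roughly at random should start near ln(N), so a starting loss of 90+ already hints at a preprocessing bug. A minimal sketch of this back-of-the-envelope check, with a hypothetical batch size of 128 and train group size of 8:

```python
import math

def expected_initial_loss(num_candidates: int) -> float:
    # Softmax cross-entropy over a uniform distribution of
    # num_candidates options is exactly ln(num_candidates).
    return math.log(num_candidates)

# Hypothetical settings: 128 queries per batch, 8 passages per query,
# with in-batch negatives shared across the batch.
candidates = 128 * 8
print(round(expected_initial_loss(candidates), 2))  # ~6.93
```

Any reasonable batch/group setting gives a starting loss in the single digits, so values like 90 point at the inputs, not the hyperparameters.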

MXueguang commented 5 months ago

this looks a bit weird. what is your batch size/ training group size setting?

kyriemao commented 5 months ago

> this looks a bit weird. what is your batch size/ training group size setting?

The param settings are:

[screenshot: parameter settings]

I use 6 A100 40G GPUs for training.

kyriemao commented 5 months ago

Solved. It was caused by a bug in my own code for processing the EOS token. Thanks!
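For readers hitting the same issue: RepLLaMA pools the sequence embedding from the hidden state of the appended EOS token, so if the EOS token is truncated away or never appended during tokenization, the pooled representation is wrong and training degenerates. A minimal sketch in plain Python (token ids are hypothetical; the real id comes from your tokenizer) of the invariant to check:

```python
# Assumed EOS id for illustration; with a real LLaMA tokenizer use
# tokenizer.eos_token_id instead.
EOS_ID = 2

def append_eos(token_ids: list[int], max_len: int) -> list[int]:
    # Truncate BEFORE appending, so the EOS token is never cut off
    # by max-length truncation.
    ids = token_ids[: max_len - 1]
    return ids + [EOS_ID]

def last_token_index(token_ids: list[int]) -> int:
    # Position whose hidden state a last-token pooler would use
    # as the sequence embedding.
    return len(token_ids) - 1

ids = append_eos(list(range(10, 20)), max_len=8)
print(ids)                    # [10, 11, 12, 13, 14, 15, 16, 2]
print(last_token_index(ids))  # 7
```

The common bug is the reverse order (append EOS, then truncate), which silently drops the EOS token on long inputs.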

guankaisi commented 5 months ago

Hello, I met the same problem. Could you please tell me how you solved it? Thanks a lot!