ruihan0495 opened this issue 1 year ago
I have the same situation. I trained a 13B model following the training steps in the docs, but it can't respond as well as the docs describe; for example, it can't output anything useful.
Also, it always outputs repeated words. I wonder if it's related to the padding length?
The same problem. This is my conversation with the 13B model:

Human: Hi, can I ask you a question?
Assistant: Sure, what's up?
Human: Do you know Microsoft?
Assistant: Yes, I do. What's up?
Human: Can you explain it to a 6-year-old child? I wonder how I should describe it.
Assistant: Sure, I can try. Microsoft is a company that makes software and hardware. They make software that helps people do things like write emails, make phone calls, and use the internet. They also make hardware that helps people do things like use computers, and use software. They sell their software and hardware to people all over the world. They also make software and hardware that helps people do things like make video games, and use the internet. They sell their software and hardware to people all over the world. They also make software and hardware that helps people do things like make video games, and use the interne
The problem may be that the inference parameters are not configured in chatbot.py: beam search, repetition penalty, or something else...
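For reference, here is a minimal sketch of the kind of decoding settings that usually tame repetition, assuming chatbot.py ultimately calls transformers' `generate()` on a causal LM. The checkpoint path and all parameter values below are illustrative starting points, not tuned settings:

```python
# Minimal sketch: decoding settings that usually reduce repetition, assuming
# the chatbot calls transformers' generate(). Path and values are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/step3/actor"  # hypothetical local checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)

prompt = "Human: Do you know Microsoft?\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    num_beams=4,             # beam search instead of greedy decoding
    repetition_penalty=1.2,  # down-weight tokens that already appeared
    no_repeat_ngram_size=3,  # hard-block repeated 3-grams
    early_stopping=True,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```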
Makes sense :D @s-isaev Do you have any suggestions for inference parameters?
Hey there, although this does not directly solve the issue, I have trained the 1.3B model from scratch and the performance was quite decent, particularly with the EMA actor model. I'd assume the 13B model is significantly more complex to train, but I have uploaded the 1.3B model to HuggingFace if you would like to test it out.
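If you want a quick way to poke at it, here is a minimal sketch for loading the checkpoint from the Hub, assuming it is an OPT-style causal LM; the repo id below is illustrative, so grab the actual one from my HuggingFace page:

```python
# Minimal sketch for trying the uploaded 1.3B actor; the repo id is
# an assumption -- use the actual id from the HuggingFace page.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "AdamG012/chat-opt-1.3b-rlhf-actor-ema-deepspeed"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

prompt = "Human: Do you know Microsoft?\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128, repetition_penalty=1.2)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```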
Here is an example conversational output log from the EMA model:
@AdamG012 Your results are really impressive! Could you please share the configs you used for training? And what GPUs, and how many of them, did you use to train your model?
Hey @s-isaev no problem, this was trained on the base configurations for the 1.3B model provided in the GitHub repo. These are:
training/step1_supervised_finetuning/training_scripts/single_node/run_1.3b.sh
training/step2_reward_model_finetuning/training_scripts/single_node/run_350m.sh
training/step3_rlhf_finetuning/training_scripts/single_node/run_1.3b.sh
The hardware configuration consisted of a single node of 8 A100s. You can find the models and training logs from all the steps on my HuggingFace.
I hope this helps!
Could you share the commit IDs of DeepSpeed and DeepSpeedExamples? Thank you so much!
Hey there @feiliya333, for this model my commit ID for DeepSpeedExamples for that run was a5d4dc12828514ea7f427cd513665ba711dcd670.
EDIT: It seems I updated the repo on that machine in the meantime, and my HEAD didn't match the commit I ran on. Looking back, I think the commit I used was 3a77097a7b6ce67b78ea7c05129b90915cfebbb0, so hopefully this works.
My DeepSpeed version was also 0.9.0.
Hope this helps
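If you want to double-check you are on the same version, DeepSpeed exposes it at runtime:

```python
# Quick environment check: confirm the installed DeepSpeed version
# matches the 0.9.0 used for this run.
import deepspeed
print(deepspeed.__version__)  # expect "0.9.0"
```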
Dear all,
We are trying to reproduce the results; however, following the training steps, our chatbot keeps repeating nonsense. We suspected our RLHF step was bad, so we simply loaded the pretrained model, and the result was also very bad. Has anyone else run into this? If you have successfully trained a decent chatbot, do you have any bitter lessons you could share with the community? Thanks!
Kind regards, Jade