songmzhang / DSKD

Repo for Paper "Dual-Space Knowledge Distillation for Large Language Models".

Reproduction of results #15

Open · mathamateur opened this issue 3 weeks ago

mathamateur commented 3 weeks ago

Dear authors! I have tried to reproduce your results on the dolly dataset with Qwen1.5 as the teacher and gpt-2 as the student. Unfortunately, my results differ from yours. dolly_exp

Could you clarify:

  1. What precision do you use to obtain your results? Unfortunately, I don't have a device that supports bf16, so I ran my experiments in fp16. As you can see, in the case of DSKD I obtain an infinite loss, probably due to the limited precision (see the sketch after this list). I also tried running my experiments in fp32, but the results differ as well. For the DSKD-CMA setup I obtain a mean ROUGE-L of 22.32.
  2. Do you also use a teacher model fine-tuned on the same dataset? The config implies a path to the fine-tuned teacher, so I used one in my experiments.

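For illustration, here is a minimal sketch (in PyTorch, not the repo's actual code; shapes and values are made up) of how fp16 overflow can turn a distillation-style loss infinite, and the usual mitigation of upcasting the logits to fp32:

```python
import torch
import torch.nn.functional as F

# fp16 tops out at 65504, so large intermediate values overflow to inf,
# and a single inf poisons the whole loss
print(torch.tensor([70000.0]).half())  # tensor([inf], dtype=torch.float16)

# common mitigation: compute the softmax/KL part of the KD loss in fp32
student_logits = torch.randn(4, 50257, dtype=torch.float16)  # made-up shapes
teacher_logits = torch.randn(4, 50257, dtype=torch.float16)
kd_loss = F.kl_div(
    F.log_softmax(student_logits.float(), dim=-1),  # upcast before softmax
    F.softmax(teacher_logits.float(), dim=-1),
    reduction="batchmean",
)
print(kd_loss)  # finite, since everything after the upcast runs in fp32
```
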
All other settings for each experiment were kept the same as in the provided scripts.

I'd be very glad to get your reply.

songmzhang commented 3 weeks ago

Hi,

  1. We use bf16 to train and evaluate all our models. Could you provide your training logs for DSKD and DSKD-CMA, so that I can better figure out what happened during your training?
  2. Yes, the teacher model is first fine-tuned on the same training set and then distills its knowledge into the student model.
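
A minimal sketch of this two-stage setup (the checkpoint path below is hypothetical; the actual paths come from the repo's configs and scripts):

```python
from transformers import AutoModelForCausalLM

# stage 1 produces a fine-tuned teacher checkpoint (hypothetical path);
# stage 2 loads it, freezes it, and distills into the student
teacher = AutoModelForCausalLM.from_pretrained("outputs/qwen1.5-dolly-sft")
teacher.eval()
for p in teacher.parameters():
    p.requires_grad_(False)  # the teacher only provides targets during distillation

student = AutoModelForCausalLM.from_pretrained("gpt2")  # student starts from the base model
```
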
mathamateur commented 1 week ago

Hello! I have prepared a training log for the SFT of GPT-2, since my SFT results differ as well. Please have a look. I have also checked my environment and noticed that I have lower versions of deepspeed, torch, and transformers than you recommend. Could that be a problem? train.log requirements.txt results

songmzhang commented 1 week ago

Hi, I've checked your log and I think this is because you didn't load the correct DeepSpeed config file for fp32, i.e., ./configs/deepspeed/ds_config_fp32.json. However, it was our mistake to omit this config from the training scripts. The new scripts have now been updated in the repo, so you can retry them and see whether they work as you expect.
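
For reference, here is a minimal sketch of what an fp32 DeepSpeed setup amounts to (placeholder values, not the repo's actual file): DeepSpeed falls back to fp32 whenever both fp16 and bf16 are disabled, and deepspeed.initialize accepts the config as a dict as well as a file path.

```python
import torch
import deepspeed

model = torch.nn.Linear(8, 8)  # stand-in model just for the sketch

ds_config_fp32 = {
    "train_micro_batch_size_per_gpu": 4,   # placeholder value
    "gradient_accumulation_steps": 1,      # placeholder value
    "optimizer": {"type": "Adam", "params": {"lr": 1e-5}},
    "fp16": {"enabled": False},            # both mixed-precision modes off
    "bf16": {"enabled": False},            # => training runs in fp32
}

engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config_fp32,
)
```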

mathamateur commented 1 week ago

Thanks for your clarification! However, I had noticed this problem in the training script myself earlier and fixed it the same way you did. You can check in my train.log file that the deepspeed config is correct. So I believe I actually ran the experiments in fp32. I guess the problem is somewhere else...

songmzhang commented 1 week ago

Hi, in your provided log, the printed arguments show deepspeed_config=None, which means that the corresponding DeepSpeed config for fp32 was not loaded successfully. Moreover, the loss scaler only appears under fp16, so I think the model was actually trained in fp16. So I suggest checking your scripts again and ensuring that the correct config has been loaded (maybe you can print the model_dtype before training).
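
One quick way to do that check (a sketch; the model here is a stand-in for wherever the student is built in the training code):

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")  # stand-in for the actual student

# print once before the training loop starts
print(next(model.parameters()).dtype)  # expect torch.float32 for a genuine fp32 run

# note: DeepSpeed only instantiates a loss scaler for fp16, so "loss scaler"
# messages in the log are themselves a sign of fp16 training
```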

mathamateur commented 1 week ago

I have tried to run the GPT-2 SFT in fp16 with your new script and noticed that deepspeed_config=None in this case as well :((( gpt2_sft_fp16_train.log

songmzhang commented 1 week ago

Could you pull our latest code and check the deepspeed_config in the arguments again? Or may I have a look at your training script for fp32?

mathamateur commented 1 week ago

Hello! I have pulled the latest version of your repo and tried running the SFT of GPT-2 in fp32 again. Indeed, this time deepspeed_config was logged as ds_config_fp32.json. However, the loss scaler was still activated for this configuration, so I am afraid the model is still trained in fp16. Here is my training log: sft_gpt2_fp32_new_train.log

songmzhang commented 1 week ago

Hi, I think the problem comes from --model_dtype, which is set to fp16 by default. You can try passing this parameter in your training script, e.g., --model_dtype fp32.
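
For illustration, a hedged sketch of how such a flag is typically wired to the weight dtype (the mapping below is hypothetical; the repo's actual handling may differ):

```python
import torch
from transformers import AutoModelForCausalLM

# hypothetical mapping from the CLI flag to a torch dtype
DTYPE_MAP = {"fp16": torch.float16, "bf16": torch.bfloat16, "fp32": torch.float32}

model = AutoModelForCausalLM.from_pretrained(
    "gpt2",
    torch_dtype=DTYPE_MAP["fp32"],  # load the weights in full precision
)
print(next(model.parameters()).dtype)  # torch.float32
```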