alvarobartt closed this issue 8 months ago
Thanks for pointing this out, we are looking into it.
OK, we have had a good look. We use RMSprop for DPO; it appears that the paper and the model cards for the Zephyr models are incorrect. We will correct them as soon as we can.
Thanks for pointing out this inconsistency.
This issue has not been resolved: the optim value in the config is still adamw_torch.
See https://github.com/huggingface/alignment-handbook/blob/main/recipes/zephyr-7b-beta/dpo/config_full.yaml#L32
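For anyone who wants to check which optimizer a recipe file actually sets, here is a minimal sketch. It assumes the config is a flat YAML file with a top-level optim key, as in the file linked above. The helper name read_optim and the sample text are hypothetical and are not part of the alignment-handbook; the sample is not copied from the repository.

```python
import re
from typing import Optional


def read_optim(yaml_text: str) -> Optional[str]:
    """Return the value of a top-level `optim:` key, or None if it is absent.

    Hypothetical helper: a line-based scan, not a full YAML parser.
    """
    for line in yaml_text.splitlines():
        # Match only an unindented `optim:` key; stop the value at whitespace or '#'.
        match = re.match(r"^optim:\s*([^\s#]+)", line)
        if match:
            # Drop optional surrounding quotes.
            return match.group(1).strip("'\"")
    return None


# Hypothetical excerpt for illustration only.
sample_config = """\
num_train_epochs: 1
optim: adamw_torch  # optimizer name
"""

print(read_optim(sample_config))
```

To check a local copy of a recipe, read the file's text and pass it to read_optim. A real YAML parser would be more robust for nested or multi-document configs; the line scan is used here only to keep the sketch dependency-free.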
Hi to whoever is reading this 🤗
Question
After reading the Zephyr preprint (https://arxiv.org/pdf/2310.16944.pdf) and going through the configuration files here, I noticed a mismatch between the optimizer set in https://github.com/huggingface/alignment-handbook/blob/main/recipes/zephyr-7b-beta/dpo/config_full.yaml and the one reported in the paper, AdamW.
So the question is, did you use RMSprop to run the full DPO fine-tuning or AdamW with no weight decay as stated in the paper?
Thanks in advance!