huggingface / alignment-handbook

Robust recipes to align language models with human and AI preferences
https://huggingface.co/HuggingFaceH4
Apache License 2.0

Did you use RMSprop or AdamW as the optimizer? #43

Closed: alvarobartt closed this 8 months ago

alvarobartt commented 8 months ago

Hi to whoever is reading this 🤗

Question

After reading the Zephyr preprint (https://arxiv.org/pdf/2310.16944.pdf) and going through the configuration files here, I noticed a mismatch between the optimizer set in https://github.com/huggingface/alignment-handbook/blob/main/recipes/zephyr-7b-beta/dpo/config_full.yaml and the one reported in the paper, AdamW.

So the question is, did you use RMSprop to run the full DPO fine-tuning or AdamW with no weight decay as stated in the paper?

Thanks in advance!

edbeeching commented 8 months ago

Thanks for pointing this out, we are looking into it.

edbeeching commented 8 months ago

Ok, we have had a good look. We use RMSprop for DPO; it appears that the paper and the model cards for the Zephyr models are incorrect. We will correct them as soon as we can.

Thanks for pointing out this inconsistency.
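For reference, since the recipe's `optim` key is passed through to the Hugging Face `transformers` `TrainingArguments`, making the config match the RMSprop run described above would amount to a one-line change. A sketch, assuming the standard `transformers` optimizer names (the exact surrounding keys in `config_full.yaml` are illustrative, not quoted from the file):

```yaml
# recipes/zephyr-7b-beta/dpo/config_full.yaml (sketch)
learning_rate: 5.0e-7
optim: rmsprop  # instead of adamw_torch; matches the RMSprop run described above
```

`rmsprop` is one of the accepted values of the `optim` training argument in recent `transformers` releases, so no code change should be needed beyond the config.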

robinsongh381 commented 4 months ago

This issue has not been resolved: the `optim` key in the config is still `adamw_torch`: https://github.com/huggingface/alignment-handbook/blob/main/recipes/zephyr-7b-beta/dpo/config_full.yaml#L32