Cannot apply "run_dpo.py" on a trained Axolotl model

huggingface / alignment-handbook



Cannot apply "run_dpo.py" on a trained Axolotl model #105

Open MatanVetzler opened 5 months ago

MatanVetzler commented 5 months ago

After using Axolotl to run SFT on my Mistral 7B model, I tried to align it with DPO using run_dpo.py. At some point during the DPOTrainer initialization, the code freezes and then stops once the timeout is reached. Running the same script on the base model (https://huggingface.co/TokenBender/pic_7B_mistral_Full_v0.2) works well. Attaching a screenshot of the part where it freezes.
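
For anyone trying to narrow down a freeze like this, one generic option is to have Python dump every thread's stack trace if a step runs too long. The sketch below is not part of the original report or of alignment-handbook. It uses only the standard-library `faulthandler` module, and the helper name `run_with_hang_dump`, the timeout, and the log path are hypothetical choices for illustration.

```python
import faulthandler


def run_with_hang_dump(step, timeout_s, log_path):
    """Run step(); if it is still running after timeout_s seconds,
    write the stack traces of all threads to log_path.

    Hypothetical helper: the name and arguments are illustrative only.
    """
    with open(log_path, "w") as log:
        # Arm a watchdog that dumps tracebacks once the timeout elapses.
        # exit=False lets the process keep running after the dump.
        faulthandler.dump_traceback_later(timeout_s, exit=False, file=log)
        try:
            return step()
        finally:
            # Disarm the watchdog before the log file is closed.
            faulthandler.cancel_dump_traceback_later()
```

Assuming the trainer is constructed inside a function, wrapping that call (for example `run_with_hang_dump(build_trainer, 300, "hang.log")`, where `build_trainer` is a placeholder for whatever creates the DPOTrainer) would leave a traceback in `hang.log` showing the line the process was stuck on when the timeout fired.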