Open ChenDRAG opened 8 months ago
I found the same issue here
In general, we observe better performance with the full finetune, although we did not perform a full hyperparameter scan on the lora configs, so I am sure improvements can be made there.
As for the misalignment, I am not sure what you are referring to. The dpo-lora config fine-tunes on top of the sft-lora model. Can you provide some more detail?
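For reference, a minimal sketch of how a dpo-lora recipe can start from the sft-lora model (the paths and hyperparameters here are illustrative assumptions, not the exact shipped recipe values):

```yaml
# Illustrative dpo-lora config sketch (assumed values, not the actual recipe):
# Model arguments
model_name_or_path: alignment-handbook/zephyr-7b-sft-lora  # DPO stage starts from the SFT LoRA model
torch_dtype: bfloat16
# LoRA arguments
use_peft: true
lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
```

One plausible source of the confusion below: a PEFT adapter's config records the *original* base model (e.g. mistralai/Mistral-7B-v0.1), and the Hub model card displays that field rather than the intermediate SFT checkpoint.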
There is a misalignment between zephyr-7b-dpo-lora and zephyr-7b-dpo-full: the former is reported as fine-tuned from mistralai/Mistral-7B-v0.1, while the latter is fine-tuned from zephyr-7b-sft-full.
I wonder what causes this misalignment?
Also, have you benchmarked the performance improvement of the lora fine-tuning script? In my experiments, lora fine-tuning does not seem to provide any performance improvement over the base model on MT-Bench. I think some parameters may be incorrect.