Open Feynman27 opened 9 months ago
I think this has something to do with the tokenizer. I trained an SFT model and am providing the local path to that model for DPO. If I use the default path from the Hub, `alignment-handbook/zephyr-7b-sft-full`, I don't get the error and DPO training starts fine.
It appears the `tokenizer_config.json` written to the output model directory during the SFT stage needs to be replaced when loading the SFT model from that same local directory for the DPO phase. I swapped out the `tokenizer_config.json` from the SFT phase for the one from the model card, and DPO training works now. All other configs appear identical between SFT and DPO (e.g. `tokenizer.json`).
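For reference, the swap described above can be sketched roughly as follows. This is a minimal sketch, not the handbook's own tooling: the helper name and paths are hypothetical, and it assumes you have already fetched the model card's `tokenizer_config.json` locally (e.g. via `huggingface-cli download`).

```python
import shutil
from pathlib import Path

def swap_tokenizer_config(sft_dir: str, reference_config: str) -> None:
    """Replace the tokenizer_config.json written during SFT with the one
    from the hub model card, keeping a backup of the SFT-written version.

    sft_dir          -- local SFT output directory passed to the DPO script
    reference_config -- path to the tokenizer_config.json from the model card
    """
    target = Path(sft_dir) / "tokenizer_config.json"
    backup = target.with_name(target.name + ".bak")
    shutil.copy(target, backup)            # back up the SFT-written config
    shutil.copy(reference_config, target)  # install the model-card config
```

After the swap, pointing the DPO config at the same local directory should pick up the replaced tokenizer config.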
This was not obvious at all. Can we add a note to the README or make this more foolproof?
When running the DPO script, when calling
I'm getting the error: