Hello Remek, I appreciate your interest in ORPO!
Can you share the error log while saving the model?
We've noticed that the model is not saved correctly with FSDP, even on the main branch, so we are working on a fix for FSDP saving.
Hello,
we have resolved the conflict between torch.compile in TrainingArguments and FSDP model saving.
We are not sure what the actual error on your side is, but on our side the problem was that the model was saved with _orig_mod. prepended to the keys in the weight map:
{
  "metadata": {
    "total_size": 5559367680
  },
  "weight_map": {
    "_orig_mod.lm_head.bias": "model-00002-of-00002.safetensors",
    "_orig_mod.lm_head.weight": "model-00002-of-00002.safetensors",
    "_orig_mod.model.embed_tokens.weight": "model-00001-of-00002.safetensors",
    "_orig_mod.model.final_layernorm.bias": "model-00002-of-00002.safetensors",
    "_orig_mod.model.final_layernorm.weight": "model-00002-of-00002.safetensors",
    ...
  }
}
On our setups (an A6000 cluster and an A100 cluster), we verified that with the latest commit the model is saved correctly and loads with AutoModelForCausalLM.from_pretrained.
Could you check if this fix resolves your issue by trying the latest version?
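The _orig_mod. prefix comes from torch.compile, which wraps the original module in an OptimizedModule and stores it as the attribute _orig_mod; saving the wrapper therefore prefixes every parameter name. Unwrapping via model._orig_mod before saving avoids the prefix. For a checkpoint that was already written this way, a minimal sketch of renaming the keys could look like the following. The helper names strip_compile_prefix and fix_index_file are hypothetical, not part of the ORPO code or of transformers, and the sketch only rewrites key names in a mapping (a state dict or the weight_map of model.safetensors.index.json); the tensor names stored inside the .safetensors shards would also need the same renaming, which this sketch does not do.

```python
from typing import Dict, TypeVar

T = TypeVar("T")
# Prefix that torch.compile's wrapper adds to parameter names.
PREFIX = "_orig_mod."


def strip_compile_prefix(state: Dict[str, T], prefix: str = PREFIX) -> Dict[str, T]:
    """Return a copy of `state` with a leading `prefix` removed from every key.

    Hypothetical helper for illustration; keys without the prefix are kept as-is.
    """
    cleaned: Dict[str, T] = {}
    for key, value in state.items():
        new_key = key[len(prefix):] if key.startswith(prefix) else key
        # Guard against two keys collapsing into the same name.
        if new_key in cleaned:
            raise KeyError(f"duplicate key after stripping prefix: {new_key}")
        cleaned[new_key] = value
    return cleaned


def fix_index_file(index: dict, prefix: str = PREFIX) -> dict:
    """Return a copy of a sharded-checkpoint index with a cleaned weight_map.

    Hypothetical helper; the input dict is not modified.
    """
    fixed = dict(index)
    fixed["weight_map"] = strip_compile_prefix(index["weight_map"], prefix)
    return fixed
```

Applied to the index excerpt above, "_orig_mod.lm_head.bias" becomes "lm_head.bias" while the shard file names and the metadata stay unchanged.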
Hi, thank you for the answer. It turned out that removing the generation_config.json file from the model directory (I had to download the OpenChat3.5 model locally) solved the problem. Now the model saves correctly. The original error was:
ValueError: The generation config instance is invalid -- `.validate()` throws warnings and/or exceptions. Fix these issues to save the configuration.
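The workaround above deletes generation_config.json from the local model directory. A slightly safer variant, sketched here as an assumption rather than as the thread's exact procedure, renames the file so it can be restored or inspected later. The helper name set_aside_generation_config and the .bak suffix are hypothetical. Note that without this file the model falls back to default generation settings, so any sampling parameters it contained are no longer applied.

```python
from pathlib import Path
from typing import Optional


def set_aside_generation_config(model_dir: str) -> Optional[Path]:
    """Rename generation_config.json in `model_dir` to generation_config.json.bak.

    Hypothetical helper. Returns the backup path, or None if there was no file.
    """
    src = Path(model_dir) / "generation_config.json"
    if not src.is_file():
        return None
    backup = src.with_name(src.name + ".bak")
    # Path.replace overwrites an existing backup atomically on the same filesystem.
    src.replace(backup)
    return backup
```

Calling it a second time on the same directory returns None, since the original file has already been moved.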
BTW: I applied ORPO to OpenChat3.5-0106, which is already SFT-tuned and probably DPO-tuned as well. The ORPO-trained model scored higher on MT-Bench (the first-turn score is much higher).
Thank you for sharing the nice result! Did you train the model with the code above?
Yes, exactly the same code; the only changes were in the config files and saving the tokenizer at the final stage.
Glad to see that ORPO also gives promising results on already fine-tuned chat models😀 Closing the issue since the model saving problem is resolved. Thank you again for sharing the nice result!
Hi, thank you for providing ORPO. I ran a quick training run, but when it finished the model was not saved; it crashed.
My setup:
I used the original fsdp.yaml configuration (no changes).
Best regards Remek