Open kazuar opened 3 months ago
First, sorry, I didn't realize the repo had no issues tab. I've opened it now.
Did your error occur during full fine-tuning? I'm not getting the error at the moment, but it might be caused by the accelerator.
Can you add
from accelerate import Accelerator
a = Accelerator()
a.save_model(trainer.model, output_dir)
trainer.model.config.save_pretrained(output_dir)
in safe_save_model_for_hf_trainer instead of trainer.save_model(output_dir)?
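For reference, here is a minimal sketch of what the function might look like with that change applied. It is not the repo's actual code: the optional `accelerator` argument is a hypothetical addition that lets the function be exercised without a distributed setup, and the `wait_for_everyone()` call is an extra precaution. It has not been verified against the full fine-tuning run.

```python
def safe_save_model_for_hf_trainer(trainer, output_dir, accelerator=None):
    """Save the model through Accelerate instead of trainer.save_model()."""
    if accelerator is None:
        # Imported lazily so this module still loads where accelerate is absent.
        from accelerate import Accelerator
        accelerator = Accelerator()

    # Let every process finish its work before anything is written to disk.
    accelerator.wait_for_everyone()

    # Accelerator.save_model writes the model weights into output_dir.
    accelerator.save_model(trainer.model, output_dir)

    # save_model writes weights only, so store config.json separately.
    trainer.model.config.save_pretrained(output_dir)
```

Called as `safe_save_model_for_hf_trainer(trainer, output_dir)` it builds its own `Accelerator`, which matches the snippet above.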
I'll keep working on this issue.
Did your error occur during full fine-tuning?
Yes, I'm running the finetune.sh script with my own data (I only changed one parameter, num_train_epochs, to 4).
I'm not getting the error at the moment, but it might be caused by the accelerator.
I'll restart the environment tomorrow and try again. Seems like a good idea!
@2U1 thanks for all the help!
Also, another option is to comment out the deepspeed part of the safe_save_model_for_hf_trainer function, like this.
I couldn't run the full fine-tuning because of a GPU issue, so I can't confirm a fix right now. Let me know if either of these approaches solves the issue.
# if trainer.deepspeed:
#     from accelerate import Accelerator
#     accelerator = Accelerator()
#     accelerator.wait_for_everyone()
#     torch.cuda.synchronize()
#     # trainer.save_model(output_dir)
#     accelerator.save(trainer.model, output_dir, max_shard_size = '5GB')
#     trainer.model.config.save_pretrained(output_dir)
#     trainer.processor.save_pretrained(output_dir)
#     return
state_dict = trainer.model.state_dict()
if trainer.args.should_save:
    cpu_state_dict = {
        key: value.cpu()
        for key, value in state_dict.items()
    }
    del state_dict
    trainer._save(output_dir, state_dict=cpu_state_dict)  # noqa
    trainer.model.config.save_pretrained(output_dir)
When running the finetune.sh script with my own dataset, I encountered the following error during checkpointing / saving the model:
Setting safe_serialization=False resulted in a model that couldn't be loaded. @2U1 did you encounter this error? (I opened it here because https://github.com/2U1/Phi3-Vision-ft doesn't have an issues tab.)
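When a saved model refuses to load, it can help to first check what actually landed in the output directory. The helper below is a hypothetical diagnostic, not part of the repo: `inspect_checkpoint` is a made-up name, and it only looks for the standard Hugging Face file names for config and weights. It does not try to load anything.

```python
from pathlib import Path

# Standard Hugging Face file names for weights and their shard indexes.
WEIGHT_FILES = (
    "model.safetensors",
    "model.safetensors.index.json",
    "pytorch_model.bin",
    "pytorch_model.bin.index.json",
)


def inspect_checkpoint(output_dir):
    """Summarize which expected checkpoint files exist in output_dir."""
    root = Path(output_dir)
    # Only regular files directly inside the directory are considered.
    present = sorted(p.name for p in root.iterdir() if p.is_file())
    return {
        "has_config": "config.json" in present,
        "weight_files": [name for name in present if name in WEIGHT_FILES],
        "files": present,
    }
```

An empty `weight_files` list or a missing config would suggest the save step itself failed. Individual shard files are not matched by name, so a sharded checkpoint shows up here through its index file.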