yangtian6781 opened this issue 1 month ago
System Info
Information
Tasks
`no_trainer` script in the `examples` folder of the `transformers` repo (such as `run_no_trainer_glue.py`)
Reproduction
Expected behavior
In my code, the file './data.bin' contains simple numbers from 1 to 150. I set DeepSpeed ZeRO stage 3 and zero3_save_16bit_model=True, because I only want to save the model's state_dict. Although this code successfully saves the model's state_dict into a pytorch_model.bin file, an error occurs:
Does this mean Accelerate is not fully compatible with DeepSpeed ZeRO-3?
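For comparison, the saving pattern shown in the Accelerate DeepSpeed guide calls `accelerator.get_state_dict(model)` on every process. Under ZeRO-3 with `zero3_save_16bit_model=True` it gathers the sharded parameters through a collective operation, and only the main process then writes the file. The helper below is a minimal sketch of that pattern, not a diagnosis of the error above. The name `save_full_state_dict` is hypothetical, and the sketch assumes `accelerator` is an `accelerate.Accelerator` and `model` is the object returned by `accelerator.prepare(...)`.

```python
def save_full_state_dict(accelerator, model, path):
    """Hypothetical helper: save a consolidated state_dict of `model` to `path`.

    Assumes `accelerator` is an accelerate.Accelerator and `model` was
    returned by accelerator.prepare(...). Must be called on ALL processes.
    """
    # Under ZeRO-3 the parameters are sharded across ranks. get_state_dict()
    # gathers them (this requires zero3_save_16bit_model=True), and the gather
    # is collective, so every rank has to call it, not only the main one.
    state_dict = accelerator.get_state_dict(model)
    # Write the file from the main process only.
    if accelerator.is_main_process:
        accelerator.save(state_dict, path)
    # Keep all ranks in step until the file has been written.
    accelerator.wait_for_everyone()
```

Usage after training would be `save_full_state_dict(accelerator, model, "pytorch_model.bin")`, executed on all ranks. If `accelerator.get_state_dict(model)` is instead placed inside an `if accelerator.is_main_process:` block, the other ranks never join the gather and the job can hang.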