Closed jchauhan closed 6 months ago
I would recommend checking #26724 and trying the solution there. It might be that, or, if saving still does not work, a concurrency issue. The code was recently changed. cc @muellerzr 🤗
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
System Info
Env
Libraries installed
Command
Who can help?
text models: @ArthurZucker and @younesbelkada trainer: @muellerzr and @pacman100
Information
Tasks
examples folder (such as GLUE/SQuAD, ...)
Reproduction
Expected behavior
The trained model and checkpoint should be saved within a reasonable time, around 15 minutes. Training itself takes 5 minutes; however, checkpointing and saving the model never complete.