Closed: dorost1234 closed this issue 3 years ago
Similar issue to https://github.com/huggingface/transformers/issues/11294
Environment info

- transformers version: 4.4.5

Who can help
@sgugger @patrickvonplaten, @patil-suraj
Information
Hi, I am training a t5-base model on the MNLI dataset with batch size = 128. Training works fine, but the moment I try to resume from a checkpoint I run into a memory issue: memory usage is much larger when resuming than it was during the original training run.
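One possible place to look (this is an assumption on my part, not a confirmed diagnosis): when `trainer.train(resume_from_checkpoint=...)` restores the saved optimizer state, tensors that were serialized from GPU memory can be deserialized back onto the GPU on top of the already-allocated model, producing a temporary spike. A minimal torch-only sketch of the `map_location` idea, independent of the Trainer internals:

```python
import io
import torch

# Hypothetical sketch (not the actual Trainer code): a checkpoint's
# optimizer state is just a dict of tensors that torch.save serializes.
state = {"step": 10, "exp_avg": torch.zeros(4)}
buffer = io.BytesIO()
torch.save(state, buffer)
buffer.seek(0)

# Loading with map_location="cpu" keeps the restored tensors on CPU,
# so resuming does not allocate a second GPU copy of the state before
# the optimizer moves each tensor where it belongs.
restored = torch.load(buffer, map_location="cpu")
print(restored["step"], restored["exp_avg"].device)
```

If the extra memory comes from optimizer state being restored directly onto the GPU, loading it to CPU first and moving it lazily would avoid the duplicate allocation at resume time.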
Expected behavior
Resuming training from a checkpoint should use the same amount of memory as the original training run.
Error Stack
Thanks for your help and suggestions.