Hey @wyhsirius,
I am training the model on 4 GPUs. Have you run into the following problem?
1) When I train from scratch, I can use batch_size=32 without any problem.
2) However, when I resume training with --resume_ckpt, I get the error shown below, and I can only avoid the out-of-memory problem by using a very small batch size:
I would appreciate any suggestions on how to solve this problem.
Best,