Mr-lonely0 opened this issue 1 year ago (status: Open)
I guess you could use LoRA, gradient checkpointing, a lower batch size with gradient accumulation, offloading the reference model to CPU RAM, and a smaller max_length.
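To expand on the batch-size point: gradient accumulation keeps the effective batch size while holding only one micro-batch's activations in memory per backward pass. A stdlib-only toy (hypothetical numbers, not DeepSpeed-Chat code) showing that averaged micro-batch gradients reproduce the full-batch gradient:

```python
# Toy model: y_hat = w * x with squared-error loss, dL/dw = 2*(w*x - y)*x.
# Shows gradient accumulation (micro-batches of 2, accumulated over 2
# steps, each scaled by 1/accum) equals one full-batch gradient step,
# so shrinking the per-step batch does not change the update.

def grad(w, xs, ys):
    # mean gradient of squared error over a batch
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

w = 0.5
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

full = grad(w, xs, ys)  # one batch of 4 in memory at once

micro, accum = 2, 2  # micro-batch 2, accumulate over 2 steps
acc = 0.0
for i in range(accum):
    lo, hi = i * micro, (i + 1) * micro
    acc += grad(w, xs[lo:hi], ys[lo:hi]) / accum  # scale so sums match

print(full, acc)  # identical up to float rounding
```

The same idea is what the per-device micro-batch and gradient-accumulation-steps knobs control in the training scripts.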
In fact, I ran out of CPU memory (system RAM), not GPU memory. Are there any other solutions to this problem?
@Mr-lonely0 I tried training the actor and critic models with a LLaMA model. Training in step3 crashed at torch.load(critic_model), which I guess is an out-of-memory problem. Have you solved the problem you mentioned?
My actor model and critic model are both 7B. When I run step3, memory overflows: it consumes about 225 GB of system RAM. Is there any way to reduce memory consumption without changing the model sizes?
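A back-of-envelope estimate (assumptions mine: fp32 checkpoints at 4 bytes/param, and every data-parallel rank calling torch.load on its own copy) suggests why ~225 GB appears: each fp32 7B state dict is ~28 GB, so 8 ranks loading simultaneously approach 224 GB. Casting to fp16 before or during load, or staggering loads across ranks, cuts this proportionally:

```python
# Back-of-envelope RAM estimate for step3 checkpoint loading.
# Assumptions (hypothetical): fp32 = 4 bytes/param, and each of 8
# data-parallel ranks torch.load()s its own full copy of a 7B checkpoint.
params = 7e9          # 7B-parameter model
bytes_fp32 = 4
ranks = 8             # hypothetical 8-GPU node

per_copy_gb = params * bytes_fp32 / 1e9   # ~28 GB per fp32 copy
total_gb = per_copy_gb * ranks            # ~224 GB if all ranks load at once
fp16_total_gb = total_gb / 2              # ~112 GB if cast to fp16

print(per_copy_gb, total_gb, fp16_total_gb)
```

If this matches your setup, loading in half precision or letting one process load and broadcast (as ZeRO-3-style sharded init does) should bring peak RAM well under 225 GB.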