iwiwi closed 1 year ago
Rumor has it that this option lets us train Llama-2 70B even on 40GB GPUs.
An interesting finding: `model.device` becomes `"cpu"` when CPU offloading is enabled, so we needed a fix for that behavior.
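To illustrate the kind of workaround this implies, here is a minimal sketch. It assumes the training script already knows whether CPU offloading is on and which local GPU rank it runs on, and that computation still happens on the GPU while the parameters are parked on the CPU. The helper name `resolve_compute_device` and its parameters are hypothetical and not part of this change or of any library.

```python
def resolve_compute_device(reported_device: str, cpu_offload: bool, local_rank: int = 0) -> str:
    """Pick the device that input batches should be moved to.

    Hypothetical helper: `reported_device` stands in for str(model.device).
    """
    if cpu_offload and reported_device == "cpu":
        # With offloading, parameters rest on the CPU, so the reported device
        # is "cpu"; assume the actual compute device is this rank's GPU.
        return f"cuda:{local_rank}"
    # Without offloading, trust whatever the model reports.
    return reported_device


if __name__ == "__main__":
    print(resolve_compute_device("cpu", cpu_offload=True, local_rank=1))  # cuda:1
    print(resolve_compute_device("cuda:0", cpu_offload=False))            # cuda:0
```

The point of the sketch is only that code which moves batches to `model.device` may need an explicit device instead once offloading is enabled.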