Training runs out of GPU memory (OOM) even on an 80 GB card. Could you please give me some advice?
***** Running training *****
Num examples = 1799
Num Epochs = 1
Instantaneous batch size per device = 1
Total train batch size (w. parallel, distributed & accumulation) = 1
Gradient Accumulation steps = 1
Total optimization steps = 1799
Number of trainable parameters = 6738423808
0%| | 0/1799 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "training_example.py", line 45, in <module>
    Trainer(model = model,
  File "/data/anaconda3/envs/llama/lib/python3.8/site-packages/transformers/trainer.py", line 1543, in train
    return inner_training_loop(
  File "/data/anaconda3/envs/llama/lib/python3.8/site-packages/transformers/trainer.py", line 1858, in _inner_training_loop
    self.optimizer.step()
  File "/data/anaconda3/envs/llama/lib/python3.8/site-packages/torch/optim/lr_scheduler.py", line 65, in wrapper
    return wrapped(*args, **kwargs)
  File "/data/anaconda3/envs/llama/lib/python3.8/site-packages/torch/optim/optimizer.py", line 113, in wrapper
    return func(*args, **kwargs)
  File "/data/anaconda3/envs/llama/lib/python3.8/site-packages/transformers/optimization.py", line 362, in step
    denom = exp_avg_sq.sqrt().add_(group["eps"])
RuntimeError: CUDA out of memory. Tried to allocate 172.00 MiB (GPU 0; 79.18 GiB total capacity; 76.21 GiB already allocated; 162.38 MiB free; 77.88 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
0%| | 0/1799 [00:01<?, ?it/s]
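For context on the numbers: with 6,738,423,808 trainable parameters, my understanding is that full fine-tuning with the default fp32 AdamW needs roughly 16 bytes per parameter (4 for weights, 4 for gradients, 8 for the exp_avg / exp_avg_sq optimizer states), i.e. about 108 GB before counting activations, which already exceeds the 79 GiB card and matches the OOM inside optimizer.step(). Would settings along the lines of the sketch below be enough to fit? This is only illustrative, not my actual script, and it assumes bitsandbytes is installed and a transformers version recent enough to support the optim argument with adamw_bnb_8bit and the gradient_checkpointing flag.

```python
from transformers import TrainingArguments, Trainer

# Illustrative sketch, not the real training_example.py.
# `model` and `train_dataset` would be the same objects as in my script.
training_args = TrainingArguments(
    output_dir="out",
    num_train_epochs=1,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,   # larger effective batch without more activation memory per step
    gradient_checkpointing=True,     # recompute activations in backward instead of storing them
    fp16=True,                       # half-precision forward/backward
    optim="adamw_bnb_8bit",          # keep Adam's exp_avg / exp_avg_sq states in 8-bit (bitsandbytes)
)

trainer = Trainer(model=model, args=training_args, train_dataset=train_dataset)
trainer.train()
```

Any pointers on whether this is the right direction, or what else I should change, would be appreciated.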