xssstory / SeqCo

Code for "Sequence Level Contrastive Learning for Text Summarization"

RuntimeError: CUDA out of memory #4

Closed. LieuMai closed this issue 2 years ago.

LieuMai commented 2 years ago

Describe the bug

When trying to train the model, I got this error. I have searched related issues but could not find the expected help.

Thank you in advance for any insights you can give.

Reproduction

  1. Command: `sh seqco_scripts/train_cnndm.sh`
  2. Error:
    
    (cross_attention): MultiheadAttention(
    (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
    (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
    (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
    (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
    )
    )
    | model backsum_transformer_bart_large, criterion LabelSmoothedCrossEntropyCriterion
    | num. model params: 630956032 (num. trained: 418883584)
    | training on 1 GPUs
    | max tokens per GPU = None and max sentences per GPU = 2
    | no existing checkpoint found /data/msra/sum_data//cnndm_bart/bart.large/model.pt
    | loading train data for epoch 0
    | loaded 15000 examples from: data/cnn_dm-bin/train.article-summary.article
    | loaded 15000 examples from: data/cnn_dm-bin/train.article-summary.summary
    | parallel-data/cnn_dm-bin train 15000 examples
    | loaded 15000 examples from: data/cnn_dm-bin/train.article-summary.article
    | backtranslate-article: data/cnn_dm-bin train 15000 examples
    | WARNING: your device does NOT support faster training with --fp16, please switch to FP32 which is likely to be faster
    Traceback (most recent call last):
      File "train.py", line 344, in <module>
        cli_main()
      File "train.py", line 340, in cli_main
        main(args)
      File "train.py", line 77, in main
        extra_state, epoch_itr = checkpoint_utils.load_checkpoint(args, trainer)
      File "/home/lieumai/TextSum/SeqCo/fairseq/checkpoint_utils.py", line 143, in load_checkpoint
        trainer.lr_step(epoch_itr.epoch)
      File "/home/lieumai/TextSum/SeqCo/fairseq/trainer.py", line 600, in lr_step
        self.lr_scheduler.step(epoch, val_loss)
      File "/home/lieumai/TextSum/SeqCo/fairseq/trainer.py", line 121, in lr_scheduler
        self._build_optimizer()  # this will initialize self._lr_scheduler
      File "/home/lieumai/TextSum/SeqCo/fairseq/trainer.py", line 143, in _build_optimizer
        self._optimizer = optim.FP16Optimizer.build_optimizer(self.args, params)
      File "/home/lieumai/TextSum/SeqCo/fairseq/optim/fp16_optimizer.py", line 207, in build_optimizer
        fp32_params = cls.build_fp32_params(params)
      File "/home/lieumai/TextSum/SeqCo/fairseq/optim/fp16_optimizer.py", line 67, in build_fp32_params
        fp32_params = params[0].new(0).float().new(total_param_size)
    RuntimeError: CUDA out of memory. Tried to allocate 1.56 GiB (GPU 0; 1.96 GiB total capacity; 1.18 GiB already allocated; 298.94 MiB free; 1.19 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
  3. Results of `nvidia-smi` for my GPU:

    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 510.85.02    Driver Version: 510.85.02    CUDA Version: 11.6     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
    | N/A   46C    P8    N/A /  N/A |      4MiB /  2048MiB |      0%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+

    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    |    0   N/A  N/A      1259      G   /usr/lib/xorg/Xorg                  4MiB |
    +-----------------------------------------------------------------------------+


  4. Results from torch (version 1.12.1+cu102):

    Is CUDA available? True
    How many GPUs? - 1
    What is the device name? - NVIDIA GeForce MX230
    Memory Usage:
    Allocated: 0.0 GB
    Cached:    0.0 GB
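
For reference, these diagnostics can be reproduced with standard `torch.cuda` calls; a minimal sketch, assuming device 0:

```python
import torch

# Basic CUDA visibility checks, matching the diagnostics above.
print("Is CUDA available?", torch.cuda.is_available())
print("How many GPUs? -", torch.cuda.device_count())
print("What is the device name? -", torch.cuda.get_device_name(0))

# Memory currently held by tensors vs. reserved by PyTorch's caching allocator.
print("Memory Usage:")
print(f"Allocated: {torch.cuda.memory_allocated(0) / 1024**3:.1f} GB")
print(f"Cached:    {torch.cuda.memory_reserved(0) / 1024**3:.1f} GB")
```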

Python 3.8.10
![Screenshot from 2022-10-16 20-36-58](https://user-images.githubusercontent.com/56626332/196038484-3d0f1904-7c94-459b-94d0-595d425bfeee.png)
xssstory commented 2 years ago

Thanks for your interest in our project!

The reason for "CUDA out of memory" is that 2 GB of GPU memory is too small to train the model.

Our model is trained on 8 V100 (32 GB) GPUs. You could try a GPU with more memory (e.g., 16 GB or larger) and set `--max-sentences=1` to save memory.
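
A minimal sketch of how that might look; the actual entry point and remaining flags come from `seqco_scripts/train_cnndm.sh`, so only the memory-related pieces here are the point:

```sh
# Optional: the fragmentation hint from the error message. PYTORCH_CUDA_ALLOC_CONF
# is a standard PyTorch allocator setting, but it cannot help when the GPU itself
# is too small for the model.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128

# Lower the per-GPU batch size; --max-sentences corresponds to the
# "max sentences per GPU" value shown in the training log above.
python train.py data/cnn_dm-bin --max-sentences 1   # plus the script's other flags
```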