sooftware / kospeech

Open-Source Toolkit for End-to-End Korean Automatic Speech Recognition leveraging PyTorch and Hydra.
https://sooftware.github.io/kospeech/
Apache License 2.0

“RuntimeError: CUDA out of memory” — I set train.batch_size very small, like 4, but the problem still occurred. #123

Closed yyMoming closed 3 years ago

yyMoming commented 3 years ago

```
Traceback (most recent call last):
  File "/home/yangweiming/MUSIC/Speech_Recognition/Kospeech/bin/main.py", line 175, in main
    last_model_checkpoint = train(config)
  File "/home/yangweiming/MUSIC/Speech_Recognition/Kospeech/bin/main.py", line 141, in train
    resume=config.train.resume,
  File "/home/yangweiming/MUSIC/Speech_Recognition/Kospeech/kospeech/trainer/supervised_trainer.py", line 166, in train
    teacher_forcing_ratio=teacher_forcing_ratio,
  File "/home/yangweiming/MUSIC/Speech_Recognition/Kospeech/kospeech/trainer/supervised_trainer.py", line 269, in _train_epoches
    loss.backward()
  File "/home/yangweiming/miniconda2/envs/Kospeech/lib/python3.7/site-packages/torch/tensor.py", line 221, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/yangweiming/miniconda2/envs/Kospeech/lib/python3.7/site-packages/torch/autograd/__init__.py", line 132, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: CUDA out of memory. Tried to allocate 88.00 MiB (GPU 0; 7.93 GiB total capacity; 7.09 GiB already allocated; 12.56 MiB free; 7.34 GiB reserved in total by PyTorch)
```
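(Editor's note: when OOM persists even at batch_size=4, a common workaround is gradient accumulation — train on small micro-batches but update weights only every few steps. The sketch below is generic PyTorch, not kospeech's trainer API; the model, dimensions, and learning rate are placeholders for illustration.)

```python
import torch
import torch.nn as nn

# Generic gradient-accumulation sketch (NOT kospeech's trainer API):
# an effective batch of 16 is simulated with 4 micro-batches of 4,
# so peak activation memory stays close to the micro-batch footprint.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(80, 2000).to(device)       # stand-in for the ASR model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.MSELoss()
accum_steps = 4                              # 4 micro-batches of 4 -> effective batch 16

optimizer.zero_grad()
for step in range(accum_steps):
    inputs = torch.randn(4, 80, device=device)      # dummy micro-batch
    targets = torch.randn(4, 2000, device=device)
    loss = criterion(model(inputs), targets) / accum_steps  # scale so grads average
    loss.backward()                          # gradients accumulate in .grad
optimizer.step()                             # one update for the effective batch
optimizer.zero_grad()
```

The division by `accum_steps` keeps the accumulated gradient equal to what a single full-batch step would produce.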

sooftware commented 3 years ago

What model did you use, and what is your GPU?

kaiqi123 commented 3 years ago

I also hit this problem when running the RNN Transducer on the LibriSpeech dataset, with batch_size set to 4 and audio_extension set to "flac". Here are my GPU (I only use 1 GPU) and CUDA version:

```
[2021-05-29 10:07:33,372][kospeech.utils][INFO] - Operating System : Linux 5.4.0-1048-aws
[2021-05-29 10:07:33,373][kospeech.utils][INFO] - Processor : x86_64
[2021-05-29 10:07:33,375][kospeech.utils][INFO] - device : Tesla V100-SXM2-16GB
[2021-05-29 10:07:33,375][kospeech.utils][INFO] - CUDA is available : True
[2021-05-29 10:07:33,375][kospeech.utils][INFO] - CUDA version : 10.2
[2021-05-29 10:07:33,375][kospeech.utils][INFO] - PyTorch version : 1.6.0
```

The errors are as follows:

```
Traceback (most recent call last):
  File "./bin/main.py", line 171, in main
    last_model_checkpoint = train(config)
  File "./bin/main.py", line 127, in train
    model = trainer.train(
  File "/home/ubuntu/rnn-t/new-kospeech/kospeech/trainer/supervised_trainer.py", line 160, in train
    model, train_loss, train_cer = self._train_epoches(
  File "/home/ubuntu/rnn-t/new-kospeech/kospeech/trainer/supervised_trainer.py", line 255, in _train_epoches
    output, loss, ctc_loss, cross_entropy_loss = self._model_forward(
  File "/home/ubuntu/rnn-t/new-kospeech/kospeech/trainer/supervised_trainer.py", line 431, in _model_forward
    outputs = model(inputs, input_lengths, targets, target_lengths)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 153, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/rnn-t/new-kospeech/kospeech/models/rnnt/model.py", line 112, in forward
    return super().forward(inputs, input_lengths, targets, target_lengths)
  File "/home/ubuntu/rnn-t/new-kospeech/kospeech/models/model.py", line 262, in forward
    return self.joint(encoder_outputs, decoder_outputs)
  File "/home/ubuntu/rnn-t/new-kospeech/kospeech/models/model.py", line 236, in joint
    outputs = self.fc(outputs).log_softmax(dim=-1)
RuntimeError: CUDA out of memory. Tried to allocate 6.64 GiB (GPU 0; 15.78 GiB total capacity; 9.77 GiB already allocated; 4.75 GiB free; 9.86 GiB reserved in total by PyTorch)
```
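(Editor's note: this traceback fails inside the RNN-T joint network, a well-known memory hot spot — the joint materializes a 4-D tensor of shape (batch, encoder frames T, label length U, vocab V), so even batch_size=4 can allocate gigabytes before `log_softmax`. A back-of-envelope sketch, with hypothetical dimensions chosen purely for illustration:)

```python
def rnnt_joint_bytes(batch, enc_frames, label_len, vocab_size, bytes_per_elem=4):
    """Size in bytes of one float32 RNN-T joint tensor of shape (B, T, U, V)."""
    return batch * enc_frames * label_len * vocab_size * bytes_per_elem

# Hypothetical LibriSpeech-like dimensions (illustrative, not measured):
# 4 utterances x 500 encoder frames x 100 labels x 10k output vocab.
gib = rnnt_joint_bytes(4, 500, 100, 10_000) / 2**30
print(f"{gib:.2f} GiB for a single joint activation")  # -> 7.45 GiB
```

Because memory scales with T × U, capping the maximum utterance length (or transcript length) in the dataset filter usually helps more than shrinking the batch further.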

Hoping for your reply. Thanks a lot for your help!