OutOfMemoryError - Githubissues

1048846280 commented 1 month ago

I have a 12 GB GPU, but I get this error. I ran into it during training: training was fine at first, but after 1000 steps this error occurred.

The sample rate is 16 kHz.

Steps : 985, Gen Loss: 1.028, Disc Loss: 0.007, Metric loss: 0.649, Magnitude Loss : 0.110, Phase Loss : 2.710, Complex Loss : 0.293, Time Loss : 0.123, s/b : 0.213
Steps : 990, Gen Loss: 0.493, Disc Loss: 0.002, Metric loss: 0.168, Magnitude Loss : 0.025, Phase Loss : 1.417, Complex Loss : 0.084, Time Loss : 0.097, s/b : 0.213
Steps : 995, Gen Loss: 0.779, Disc Loss: 0.001, Metric loss: 0.283, Magnitude Loss : 0.046, Phase Loss : 2.181, Complex Loss : 0.200, Time Loss : 0.146, s/b : 0.232
Steps : 1000, Gen Loss: 1.113, Disc Loss: 0.003, Metric loss: 0.666, Magnitude Loss : 0.134, Phase Loss : 2.843, Complex Loss : 0.368, Time Loss : 0.164, s/b : 0.206
Traceback (most recent call last):
  File "/media/MP-SENetmain/train.py", line 309, in <module>
    main()
  File "/media/MP-SENetmain/train.py", line 305, in main
    train(0, a, h)
  File "/media/MP-SENetmain/train.py", line 233, in train
    mag_g, pha_g, com_g = generator(noisy_mag.to(device), noisy_pha.to(device))
  File "/home/anaconda3/envs/MP-SENetmain/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/media/MP-SENetmain/models/generator.py", line 139, in forward
    x = self.TSConformer[i](x)
  File "/home/anaconda3/envs/MP-SENetmain/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/media/MP-SENetmain/models/generator.py", line 113, in forward
    x = self.freq_conformer(x) + x
  File "/home/anaconda3/envs/MP-SENetmain/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/media/MP-SENetmain/models/conformer.py", line 73, in forward
    x = x + self.ccm(x)
  File "/home/anaconda3/envs/MP-SENetmain/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/media/MP-SENetmain/models/conformer.py", line 43, in forward
    return self.ccm(x)
  File "/home/anaconda3/envs/MP-SENetmain/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/anaconda3/envs/MP-SENetmain/lib/python3.7/site-packages/torch/nn/modules/container.py", line 119, in forward
    input = module(input)
  File "/home/anaconda3/envs/MP-SENetmain/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/anaconda3/envs/MP-SENetmain/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 263, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/anaconda3/envs/MP-SENetmain/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 260, in _conv_forward
    self.padding, self.dilation, self.groups)
RuntimeError: CUDA out of memory. Tried to allocate 7.17 GiB (GPU 0; 10.75 GiB total capacity; 150.87 MiB already allocated; 7.19 GiB free; 1.53 GiB reserved in total by PyTorch)

vkeep commented 1 month ago

I also ran into this problem during validation. My GPU has 24 GB, and even after reducing the training settings to bs = 2 and segment_size = 24000, I still got a similar OOM error.
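A rough size estimate helps explain why lowering bs and segment_size may not help here: those settings only shrink training batches, but if validation feeds each utterance to the generator whole (split = False), the validation tensors grow with the clip length instead. The sketch below computes the size of a single float32 feature map of shape [batch, channels, frames, freq_bins]. The values n_fft=400, hop=100 and channels=64 are assumptions for illustration (not read from this thread), and the 60-second validation clip is hypothetical. Real peak usage is many such tensors plus attention maps, so treat the numbers as relative, not absolute.

```python
def feature_map_bytes(num_samples: int, batch: int = 1, channels: int = 64,
                      n_fft: int = 400, hop: int = 100, bytes_per_elem: int = 4) -> int:
    """Approximate size in bytes of one [batch, channels, frames, freq_bins] tensor.

    Assumes a centered STFT, so frames = num_samples // hop + 1 and
    freq_bins = n_fft // 2 + 1. All defaults are illustrative assumptions.
    """
    frames = num_samples // hop + 1
    freq_bins = n_fft // 2 + 1
    return batch * channels * frames * freq_bins * bytes_per_elem


def mib(n_bytes: int) -> float:
    """Convert bytes to MiB."""
    return n_bytes / 2**20


if __name__ == "__main__":
    sr = 16000  # 16 kHz, as reported in the issue
    # Training crops: bs = 2, segment_size = 24000 (settings from the thread).
    train = feature_map_bytes(24000, batch=2)
    # Hypothetical 60-second validation clip processed whole (split = False).
    val = feature_map_bytes(60 * sr, batch=1)
    print(f"training map:   {mib(train):8.1f} MiB")
    print(f"validation map: {mib(val):8.1f} MiB ({val / train:.1f}x larger)")
```

Because the size is linear in num_samples for convolutional feature maps (and attention over the time axis grows faster than linearly), a single long validation clip can dwarf a small training batch regardless of bs.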

vkeep commented 1 month ago

This problem occurs during validation. You can modify validset = Dataset(validation_indexes.... in train.py:

set split = True, and the validation data will then be cut by segment_size, which solves the OOM problem.
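With split = True, each validation utterance is reduced to at most segment_size samples before it reaches the generator, so peak memory no longer grows with utterance length. Below is a minimal sketch of that kind of crop-or-pad step, assuming HiFi-GAN-style behavior (one random crop at the same offset in the clean and noisy signals for long clips, zero padding for short ones). The function name crop_or_pad_pair is hypothetical, and this is not MP-SENet's actual Dataset code.

```python
import random
from typing import List, Optional, Sequence, Tuple


def crop_or_pad_pair(clean: Sequence[float], noisy: Sequence[float], segment_size: int,
                     rng: Optional[random.Random] = None) -> Tuple[List[float], List[float]]:
    """Cut an aligned (clean, noisy) pair to exactly segment_size samples.

    Hypothetical helper: long inputs get one random crop taken at the same
    offset in both signals; short inputs are right-padded with zeros.
    """
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    if len(clean) != len(noisy):
        raise ValueError("clean and noisy must have the same length")
    rng = rng or random.Random()
    n = len(clean)
    if n >= segment_size:
        start = rng.randint(0, n - segment_size)  # randint's upper bound is inclusive
        end = start + segment_size
        return list(clean[start:end]), list(noisy[start:end])
    pad = [0.0] * (segment_size - n)
    return list(clean) + pad, list(noisy) + pad


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the crop is reproducible
    clean = [float(i) for i in range(10)]
    noisy = [x + 0.5 for x in clean]
    c, nz = crop_or_pad_pair(clean, noisy, 4, rng)
    print(len(c), len(nz), [b - a for a, b in zip(c, nz)])
```

One trade-off to keep in mind: if validation is cropped like this, metrics are computed on segments rather than whole utterances, so they may not match a full-length evaluation.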

1048846280 commented 1 month ago

Thanks a million! This works now.

yxlu-0102 / MP-SENet

OutOfMemoryError #41