Open 1048846280 opened 1 month ago
I also hit this problem during validation. My GPU has 24 GB, and during training I reduced the batch size to 2 and segment_size to 24000, but I still got a similar OOM error.
This problem occurs during validation. In train.py, find the line validset = Dataset(validation_indexes.... and set
split = True. The validation data will then be cut to segment_size, which solves the OOM problem.
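For anyone wondering why this helps: with split = True each validation clip is cut to segment_size samples, so the generator never sees a full-length utterance. Here is a minimal sketch of that idea. It is not the repository's actual Dataset code; the helper name crop_to_segment, the random crop position, and the zero-padding of short clips are assumptions made for illustration.

```python
import numpy as np


def crop_to_segment(audio, segment_size, rng=None):
    """Return exactly `segment_size` samples from a 1-D waveform.

    Hypothetical helper: long clips are cropped at a random offset,
    short clips are zero-padded at the end.
    """
    rng = rng or np.random.default_rng()
    n = audio.shape[0]
    if n >= segment_size:
        # Pick a start so that the crop stays inside the clip.
        start = int(rng.integers(0, n - segment_size + 1))
        return audio[start:start + segment_size]
    # Clip is shorter than one segment: pad with zeros on the right.
    return np.pad(audio, (0, segment_size - n))


# A 10 s utterance at 16 kHz versus a clip shorter than one segment.
long_clip = np.zeros(160_000, dtype=np.float32)
short_clip = np.ones(10_000, dtype=np.float32)

segment = crop_to_segment(long_clip, 24_000)
padded = crop_to_segment(short_clip, 24_000)
print(segment.shape, padded.shape)
```

Both outputs have the same fixed length, so the memory needed per validation item is bounded by segment_size rather than by the longest file in the validation set.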
Thanks a million! This works now.
I have a 12 GB GPU but I still get this error. In my case it happened during training: training was fine at first, but the error occurred after 1000 steps.
The sample rate is 16 kHz.
Steps : 985, Gen Loss: 1.028, Disc Loss: 0.007, Metric loss: 0.649, Magnitude Loss : 0.110, Phase Loss : 2.710, Complex Loss : 0.293, Time Loss : 0.123, s/b : 0.213
Steps : 990, Gen Loss: 0.493, Disc Loss: 0.002, Metric loss: 0.168, Magnitude Loss : 0.025, Phase Loss : 1.417, Complex Loss : 0.084, Time Loss : 0.097, s/b : 0.213
Steps : 995, Gen Loss: 0.779, Disc Loss: 0.001, Metric loss: 0.283, Magnitude Loss : 0.046, Phase Loss : 2.181, Complex Loss : 0.200, Time Loss : 0.146, s/b : 0.232
Steps : 1000, Gen Loss: 1.113, Disc Loss: 0.003, Metric loss: 0.666, Magnitude Loss : 0.134, Phase Loss : 2.843, Complex Loss : 0.368, Time Loss : 0.164, s/b : 0.206
Traceback (most recent call last):
  File "/media/MP-SENetmain/train.py", line 309, in
    main()
  File "/media/MP-SENetmain/train.py", line 305, in main
    train(0, a, h)
  File "/media/MP-SENetmain/train.py", line 233, in train
    mag_g, pha_g, com_g = generator(noisy_mag.to(device), noisy_pha.to(device))
  File "/home/anaconda3/envs/MP-SENetmain/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/media/MP-SENetmain/models/generator.py", line 139, in forward
    x = self.TSConformer[i](x)
  File "/home/anaconda3/envs/MP-SENetmain/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/media/MP-SENetmain/models/generator.py", line 113, in forward
    x = self.freq_conformer(x) + x
  File "/home/anaconda3/envs/MP-SENetmain/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/media/MP-SENetmain/models/conformer.py", line 73, in forward
    x = x + self.ccm(x)
  File "/home/anaconda3/envs/MP-SENetmain/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/media/MP-SENetmain/models/conformer.py", line 43, in forward
    return self.ccm(x)
  File "/home/anaconda3/envs/MP-SENetmain/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/anaconda3/envs/MP-SENetmain/lib/python3.7/site-packages/torch/nn/modules/container.py", line 119, in forward
    input = module(input)
  File "/home/anaconda3/envs/MP-SENetmain/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/anaconda3/envs/MP-SENetmain/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 263, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/anaconda3/envs/MP-SENetmain/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 260, in _conv_forward
    self.padding, self.dilation, self.groups)
RuntimeError: CUDA out of memory. Tried to allocate 7.17 GiB (GPU 0; 10.75 GiB total capacity; 150.87 MiB already allocated; 7.19 GiB free; 1.53 GiB reserved in total by PyTorch)