Closed KIDxiaoyuan closed 2 years ago
Can you provide the versions of pytorch, torchvision, and torchaudio?
Yes, see below:
Python platform: Linux-5.4.0-90-generic-x86_64-with-debian-buster-sid
Is CUDA available: True
CUDA runtime version: 11.1.105
GPU models and configuration: GPU 0: GeForce RTX 3090
Nvidia driver version: 455.32.00
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.20.3
[pip3] numpydoc==1.1.0
[pip3] pytorch-pretrained-bert==0.6.2
[pip3] pytorch-transformers==1.2.0
[pip3] torch==1.10.0+cu113
[pip3] torchaudio==0.10.0+cu113
[pip3] torchfile==0.1.0
[pip3] torchnet==0.0.4
[pip3] torchsummary==1.5.1
[pip3] torchtext==0.10.1
[pip3] torchvision==0.11.1+cu113
thx
I also found that the problem is in vid_aud_lrw.py, in class MultiDataset(Dataset), def __getitem__(self, idx), around line 125:
if self.augmentations:
transform = nn.Sequential(
torchaudio.transforms.Spectrogram(win_length=400, hop_length=160, power=None, normalized=True),
# 100 fps (hop_length 10ms)
torchaudio.transforms.ComplexNorm(2),
torchaudio.transforms.MelScale(n_mels=self.num_mel_bins, sample_rate=16000),
torchaudio.transforms.AmplitudeToDB(),
CMVN(),
torchaudio.transforms.FrequencyMasking(freq_mask_param=10),
torchaudio.transforms.TimeMasking(time_mask_param=20)
)
else:
transform = nn.Sequential(
torchaudio.transforms.Spectrogram(win_length=400, hop_length=160, power=None, normalized=True), #100 fps (hop_length 10ms)
torch.Tensor(torchaudio.transforms.ComplexNorm(2)).view_as_complex().power(2),
torchaudio.transforms.MelScale(n_mels=self.num_mel_bins, sample_rate=16000),
torchaudio.transforms.AmplitudeToDB(),
CMVN()
)
In torchaudio.transforms.MelScale(n_mels=self.num_mel_bins, sample_rate=16000), the input shape is [201, 1], but the multiplication expects it to be compatible with (201x80), so I think the input shape should be [?, 201]. The warning comes from torchaudio.transforms.ComplexNorm(2), so the mistake may be there. This is all the information I have; I hope you can give some suggestions.
I found that the problem is a version mismatch between PyTorch and torchaudio. Please change the original code from
transform = nn.Sequential(
torchaudio.transforms.Spectrogram(win_length=400, hop_length=160, power=None, normalized=True),
torchaudio.transforms.ComplexNorm(2),
...
)
to
transform = nn.Sequential(
torchaudio.transforms.Spectrogram(win_length=400, hop_length=160, power=2, normalized=True),
...
)
That is, omit the ComplexNorm(2) function and add the power=2 option in the Spectrogram() function.
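A minimal sketch of why the two pipelines should agree, written in plain Python so it does not depend on any torch or torchaudio version. On a (real, imag) pair, a squared norm is re² + im², which is the same squared magnitude that power=2 requests directly. The helper names `naive_dft`, `power_from_pairs`, and `power_from_complex` are illustrative only and are not part of torchaudio.

```python
import cmath
import math


def naive_dft(frame):
    """Return the one-sided DFT of a real frame as Python complex numbers."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n // 2 + 1)
    ]


def power_from_pairs(bins):
    """Old route: treat each bin as a (real, imag) pair and take the squared norm."""
    return [z.real ** 2 + z.imag ** 2 for z in bins]


def power_from_complex(bins):
    """New route: squared magnitude of the native complex value."""
    return [abs(z) ** 2 for z in bins]


# A toy 16-sample frame containing a sine at bin 3 (assumed input, not from the thread).
frame = [math.sin(2 * math.pi * 3 * t / 16) for t in range(16)]
bins = naive_dft(frame)
old = power_from_pairs(bins)
new = power_from_complex(bins)
max_diff = max(abs(a - b) for a, b in zip(old, new))
print(len(bins), max_diff < 1e-9)
```

Both routes give the same power spectrum up to floating-point error, which is why dropping the separate norm step should not change the features.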
I have made that change, but it still raises an error in the same function:
  File "/home/server1/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/server1/anaconda3/lib/python3.7/site-packages/torch/nn/modules/container.py", line 141, in forward
    input = module(input)
  File "/home/server1/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/server1/anaconda3/lib/python3.7/site-packages/torchaudio/transforms.py", line 386, in forward
    mel_specgram = (torch.multiply(specgram.transpose(-1,-2) , self.fb)).transpose(-1, -2)
RuntimeError: The size of tensor a (201) must match the size of tensor b (80) at non-singleton dimension 2

The aud shape is torch.Size([1, 18560]). After torchaudio.transforms.Spectrogram(win_length=400, hop_length=160, power=2, normalized=True) the shape is torch.Size([1, 201, 117]). The error then happens in torchaudio.transforms.MelScale(n_mels=self.num_mel_bins, sample_rate=16000).
If I change the arguments to torchaudio.transforms.MelScale(n_mels=201, sample_rate=16000, n_stft=117), fetching the item succeeds, but the result is not right for the network:
  "/Visual-Audio-Memory-main/src/models/visual_front.py", line 141, in forward
    x = self.frontend(x) #B,C,T,H,W
    input, weight, bias, self.stride, self.padding, self.dilation, self.groups
RuntimeError: Expected 5-dimensional input for 5-dimensional weight [64, 1, 5, 7, 7], but got 4-dimensional input of size [4, 1, 201, 116] instead
So I guess I may need to change the parameters of torchaudio.transforms.Spectrogram and torchaudio.transforms.MelScale.
The transformed mel-spectrogram (the output of transform(aud)) should have shape [1, 80, Time_length]. I have confirmed that the code below works with PyTorch 1.10 and torchaudio 0.10.0.
if self.augmentations:
transform = nn.Sequential(
torchaudio.transforms.Spectrogram(win_length=400, hop_length=160, power=2, normalized=True),
torchaudio.transforms.MelScale(n_mels=self.num_mel_bins, sample_rate=16000, n_stft=201),
torchaudio.transforms.AmplitudeToDB(),
CMVN(),
torchaudio.transforms.FrequencyMasking(freq_mask_param=10),
torchaudio.transforms.TimeMasking(time_mask_param=20)
)
else:
transform = nn.Sequential(
torchaudio.transforms.Spectrogram(win_length=400, hop_length=160, power=2, normalized=True),
torchaudio.transforms.MelScale(n_mels=self.num_mel_bins, sample_rate=16000, n_stft=201),
torchaudio.transforms.AmplitudeToDB(),
CMVN()
)
Or, equivalently, the same transform written with MelSpectrogram:
if self.augmentations:
transform = nn.Sequential(
torchaudio.transforms.MelSpectrogram(win_length=400, hop_length=160, power=2, normalized=True,n_mels=self.num_mel_bins, sample_rate=16000),
torchaudio.transforms.AmplitudeToDB(),
CMVN(),
torchaudio.transforms.FrequencyMasking(freq_mask_param=10),
torchaudio.transforms.TimeMasking(time_mask_param=20)
)
else:
transform = nn.Sequential(
torchaudio.transforms.MelSpectrogram(win_length=400, hop_length=160, power=2, normalized=True,n_mels=self.num_mel_bins, sample_rate=16000),
torchaudio.transforms.AmplitudeToDB(),
CMVN()
)
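The shapes reported in this thread can be checked by hand. The sketch below assumes the torchaudio defaults n_fft=400 and centered framing, and assumes num_mel_bins is 80 (taken from the expected [1, 80, Time_length] output). The function names `spectrogram_shape` and `mel_shape` are hypothetical helpers, not library calls.

```python
def spectrogram_shape(num_samples, n_fft=400, hop_length=160, channels=1):
    """Expected (channels, freq_bins, frames) for a centered, one-sided STFT."""
    freq_bins = n_fft // 2 + 1              # one-sided spectrum: 400 // 2 + 1 = 201
    frames = num_samples // hop_length + 1  # centered framing pads both ends
    return (channels, freq_bins, frames)


def mel_shape(spec_shape, n_mels=80):
    """A mel filterbank maps the frequency axis from freq_bins to n_mels."""
    channels, _, frames = spec_shape
    return (channels, n_mels, frames)


# 18560 samples is the aud length reported earlier in the thread.
spec = spectrogram_shape(18560)
mel = mel_shape(spec)
print(spec, mel)
```

Under these assumptions the frequency axis has 201 bins, which is the value passed as n_stft above; 117 is the number of time frames and should not be used as n_stft.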
Okay, thank you for your help! I am sorry for the trouble, and I am very grateful for your help.
When I download the dataset and run the training process, I get the following:
2021-11-26 12:45:58.018258: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
Epoch [0/200]
/home/server1/anaconda3/lib/python3.7/site-packages/torchaudio/transforms.py:918: UserWarning: torchaudio.transforms.ComplexNorm has been deprecated and will be removed from future release. Please convert the input Tensor to complex type with torch.view_as_complex then use torch.abs and torch.angle. Please refer to https://github.com/pytorch/audio/issues/1337 for more details about torchaudio's plan to migrate to native complex type.
  'torchaudio.transforms.ComplexNorm has been deprecated '
/home/server1/anaconda3/lib/python3.7/site-packages/torchaudio/transforms.py:936: UserWarning: torchaudio.functional.functional.complex_norm has been deprecated and will be removed from 0.11 release. Please convert the input Tensor to complex type with torch.view_as_complex then use torch.abs. Please refer to https://github.com/pytorch/audio/issues/1337 for more details about torchaudio's plan to migrate to native complex type.
  return F.complex_norm(complex_tensor, self.power)
Traceback (most recent call last):
  File "/home/server1/PythonCodes/XiaoYuan/code/Visual-Audio-Memory-main/main.py", line 434, in <module>
    train_net(args)
  File "/home/server1/PythonCodes/XiaoYuan/code/Visual-Audio-Memory-main/main.py", line 139, in train_net
    train(v_front, a_front, mem, back, train_data, args.epochs, optimizer=f_optimizer, scheduler=f_scheduler, args=args)
  File "/home/server1/PythonCodes/XiaoYuan/code/Visual-Audio-Memory-main/main.py", line 183, in train
    for i, batch in enumerate(dataloader):
  File "/home/server1/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
    data = self._next_data()
  File "/home/server1/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1203, in _next_data
    return self._process_data(data)
  File "/home/server1/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1229, in _process_data
    data.reraise()
  File "/home/server1/anaconda3/lib/python3.7/site-packages/torch/_utils.py", line 434, in reraise
    raise exception
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/server1/anaconda3/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/server1/anaconda3/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/server1/anaconda3/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/data/PythonCodes/XiaoYuan/code/Visual-Audio-Memory-main/src/data/vid_aud_lrw.py", line 142, in __getitem__
    spec = transform(aud) # 1, 80, time100 : C,F,T
  File "/home/server1/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/server1/anaconda3/lib/python3.7/site-packages/torch/nn/modules/container.py", line 141, in forward
    input = module(input)
  File "/home/server1/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/server1/anaconda3/lib/python3.7/site-packages/torchaudio/transforms.py", line 386, in forward
    mel_specgram = torch.matmul(specgram.transpose(-1, -2), self.fb).transpose(-1, -2)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (201x1 and 201x80)
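The "201x1 and 201x80" message can be reproduced with shape bookkeeping alone. The sketch below mirrors the matmul line from the traceback using plain tuples; `matmul_shape` and `mel_scale_shape` are hypothetical helpers. The failing input shape (1, 1, 201) is an assumption inferred from the error text: it would mean the time axis was already collapsed before MelScale, which is consistent with ComplexNorm treating the last axis as a (real, imag) pair, though the thread does not confirm that directly.

```python
def matmul_shape(a, b):
    """Shape of a @ b for a batched left operand and a 2-D right operand."""
    if a[-1] != b[0]:
        raise ValueError(
            f"mat1 and mat2 shapes cannot be multiplied "
            f"({a[-2]}x{a[-1]} and {b[0]}x{b[1]})"
        )
    return a[:-1] + (b[1],)


def mel_scale_shape(spec_shape, fb_shape):
    """Mirror matmul(spec.transpose(-1, -2), fb).transpose(-1, -2) on shapes."""
    transposed = spec_shape[:-2] + (spec_shape[-1], spec_shape[-2])
    out = matmul_shape(transposed, fb_shape)
    return out[:-2] + (out[-1], out[-2])


fb = (201, 80)  # filterbank: 201 frequency bins mapped to 80 mel bins

# A proper power spectrogram keeps the time axis, so the product is defined.
good = mel_scale_shape((1, 201, 117), fb)

# Assumed failing case: the time axis is gone, leaving 201 values in the last axis.
try:
    mel_scale_shape((1, 1, 201), fb)
    bad = None
except ValueError as err:
    bad = str(err)

print(good, bad)
```

If this reading is right, the fix is to feed MelScale a real-valued [1, 201, T] power spectrogram rather than to change the filterbank size.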