ms-dot-k / Visual-Audio-Memory

PyTorch implementation of "Multi-modality Associative Bridging through Memory: Speech Sound Recollected from Face Video" (ICCV2021)

RuntimeError: mat1 and mat2 shapes cannot be multiplied (201x1 and 201x80) #2

Closed KIDxiaoyuan closed 2 years ago

KIDxiaoyuan commented 2 years ago

When I download the dataset and run the training process, I get the following output and error:

    2021-11-26 12:45:58.018258: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
    Epoch [0/200]
    /home/server1/anaconda3/lib/python3.7/site-packages/torchaudio/transforms.py:918: UserWarning: torchaudio.transforms.ComplexNorm has been deprecated and will be removed from future release. Please convert the input Tensor to complex type with torch.view_as_complex then use torch.abs and torch.angle. Please refer to https://github.com/pytorch/audio/issues/1337 for more details about torchaudio's plan to migrate to native complex type.
      'torchaudio.transforms.ComplexNorm has been deprecated '
    /home/server1/anaconda3/lib/python3.7/site-packages/torchaudio/transforms.py:936: UserWarning: torchaudio.functional.functional.complex_norm has been deprecated and will be removed from 0.11 release. Please convert the input Tensor to complex type with torch.view_as_complex then use torch.abs. Please refer to https://github.com/pytorch/audio/issues/1337 for more details about torchaudio's plan to migrate to native complex type.
      return F.complex_norm(complex_tensor, self.power)
    [the two warnings above repeat several times; repeats omitted]

    Traceback (most recent call last):
      File "/home/server1/PythonCodes/XiaoYuan/code/Visual-Audio-Memory-main/main.py", line 434, in <module>
        train_net(args)
      File "/home/server1/PythonCodes/XiaoYuan/code/Visual-Audio-Memory-main/main.py", line 139, in train_net
        train(v_front, a_front, mem, back, train_data, args.epochs, optimizer=f_optimizer, scheduler=f_scheduler, args=args)
      File "/home/server1/PythonCodes/XiaoYuan/code/Visual-Audio-Memory-main/main.py", line 183, in train
        for i, batch in enumerate(dataloader):
      File "/home/server1/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
        data = self._next_data()
      File "/home/server1/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1203, in _next_data
        return self._process_data(data)
      File "/home/server1/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1229, in _process_data
        data.reraise()
      File "/home/server1/anaconda3/lib/python3.7/site-packages/torch/_utils.py", line 434, in reraise
        raise exception
    RuntimeError: Caught RuntimeError in DataLoader worker process 0.
    Original Traceback (most recent call last):
      File "/home/server1/anaconda3/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
        data = fetcher.fetch(index)
      File "/home/server1/anaconda3/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "/home/server1/anaconda3/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "/data/PythonCodes/XiaoYuan/code/Visual-Audio-Memory-main/src/data/vid_aud_lrw.py", line 142, in __getitem__
        spec = transform(aud)  # 1, 80, time100 : C,F,T
      File "/home/server1/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/server1/anaconda3/lib/python3.7/site-packages/torch/nn/modules/container.py", line 141, in forward
        input = module(input)
      File "/home/server1/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/server1/anaconda3/lib/python3.7/site-packages/torchaudio/transforms.py", line 386, in forward
        mel_specgram = torch.matmul(specgram.transpose(-1, -2), self.fb).transpose(-1, -2)
    RuntimeError: mat1 and mat2 shapes cannot be multiplied (201x1 and 201x80)

ms-dot-k commented 2 years ago

Can you provide the versions of pytorch, torchvision, and torchaudio?

KIDxiaoyuan commented 2 years ago

Yes, below:

    Python platform: Linux-5.4.0-90-generic-x86_64-with-debian-buster-sid
    Is CUDA available: True
    CUDA runtime version: 11.1.105
    GPU models and configuration: GPU 0: GeForce RTX 3090
    Nvidia driver version: 455.32.00
    cuDNN version: Could not collect
    HIP runtime version: N/A
    MIOpen runtime version: N/A

    Versions of relevant libraries:
    [pip3] mypy-extensions==0.4.3
    [pip3] numpy==1.20.3
    [pip3] numpydoc==1.1.0
    [pip3] pytorch-pretrained-bert==0.6.2
    [pip3] pytorch-transformers==1.2.0
    [pip3] torch==1.10.0+cu113
    [pip3] torchaudio==0.10.0+cu113
    [pip3] torchfile==0.1.0
    [pip3] torchnet==0.0.4
    [pip3] torchsummary==1.5.1
    [pip3] torchtext==0.10.1
    [pip3] torchvision==0.11.1+cu113

Thanks.

KIDxiaoyuan commented 2 years ago

And I found that the problem is in vid_aud_lrw.py, in class MultiDataset(Dataset), def __getitem__(self, idx), around line 125, in the # Audio section:

    if self.augmentations:
        transform = nn.Sequential(
            torchaudio.transforms.Spectrogram(win_length=400, hop_length=160, power=None, normalized=True),
            # 100 fps (hop_length 10ms)
            torchaudio.transforms.ComplexNorm(2),
            torchaudio.transforms.MelScale(n_mels=self.num_mel_bins, sample_rate=16000),
            torchaudio.transforms.AmplitudeToDB(),
            CMVN(),
            torchaudio.transforms.FrequencyMasking(freq_mask_param=10),
            torchaudio.transforms.TimeMasking(time_mask_param=20)
        )
    else:
        transform = nn.Sequential(
            torchaudio.transforms.Spectrogram(win_length=400, hop_length=160, power=None, normalized=True), #100 fps (hop_length 10ms)
            torchaudio.transforms.ComplexNorm(2),
            torchaudio.transforms.MelScale(n_mels=self.num_mel_bins, sample_rate=16000),
            torchaudio.transforms.AmplitudeToDB(),
            CMVN()
        )

In torchaudio.transforms.MelScale(n_mels=self.num_mel_bins, sample_rate=16000) the input shape is [201, 1], but it is expected to be multiplied with a (201x80) matrix; I think the input shape should be [?, 201]. The deprecation warning comes from torchaudio.transforms.ComplexNorm(2), so maybe the problem is there. That is all the info I know; I hope you can give some suggestions.

ms-dot-k commented 2 years ago

I found the problem: it is a version mismatch between pytorch and torchaudio. Please change the original code of

            transform = nn.Sequential(
                torchaudio.transforms.Spectrogram(win_length=400, hop_length=160, power=None, normalized=True),
                torchaudio.transforms.ComplexNorm(2),
                ...
            )

to

            transform = nn.Sequential(
                torchaudio.transforms.Spectrogram(win_length=400, hop_length=160, power=2, normalized=True),
                ...
            )

That is, omit the ComplexNorm(2) transform and add the power=2 option to the Spectrogram() function.
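
For context, it appears that in torchaudio 0.10 Spectrogram(power=None) returns a natively complex tensor, while ComplexNorm still assumes the old real-valued (..., 2) layout, so the norm collapses the time axis and produces the (201x1) input seen in the error. A minimal shape check, assuming torchaudio 0.10 and the waveform size from the log above:

    import torch
    import torchaudio

    # With power=2, Spectrogram returns a real-valued power spectrogram directly,
    # so no ComplexNorm step is needed before MelScale.
    aud = torch.randn(1, 18560)  # mono waveform, size taken from the log above
    spec = torchaudio.transforms.Spectrogram(
        win_length=400, hop_length=160, power=2, normalized=True)(aud)
    print(spec.dtype, spec.shape)  # torch.float32 torch.Size([1, 201, 117]) -> (C, F, T)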

KIDxiaoyuan commented 2 years ago

Now I have changed it, but it prints an error in the same function:

File "/home/server1/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl return forward_call(*input, *kwargs) File "/home/server1/anaconda3/lib/python3.7/site-packages/torch/nn/modules/container.py", line 141, in forward input = module(input) File "/home/server1/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl return forward_call(input, **kwargs) File "/home/server1/anaconda3/lib/python3.7/site-packages/torchaudio/transforms.py", line 386, in forward mel_specgram = (torch.multiply(specgram.transpose(-1,-2) , self.fb)).transpose(-1, -2) RuntimeError: The size of tensor a (201) must match the size of tensor b (80) at non-singleton dimension 2 aud shape istorch.Size([1, 18560]) after torchaudio.transforms.Spectrogram(win_length=400, hop_length=160, power=2, normalized=True) shape is torch.Size([1, 201, 117]) then in torchaudio.transforms.MelScale(n_mels=self.num_mel_bins, sample_rate=16000) error happend...

KIDxiaoyuan commented 2 years ago

If I change the arguments to torchaudio.transforms.MelScale(n_mels=201, sample_rate=16000, n_stft=117), __getitem__ succeeds, but it is not right in the network:

      File "/Visual-Audio-Memory-main/src/models/visual_front.py", line 141, in forward
        x = self.frontend(x)  # B,C,T,H,W
        input, weight, bias, self.stride, self.padding, self.dilation, self.groups
    RuntimeError: Expected 5-dimensional input for 5-dimensional weight [64, 1, 5, 7, 7], but got 4-dimensional input of size [4, 1, 201, 116] instead

So I guess the torchaudio.transforms.Spectrogram and torchaudio.transforms.MelScale parameters need to be changed.
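
For what it's worth, n_stft should match the number of frequency bins coming out of Spectrogram (n_fft // 2 + 1 = 201 with the default n_fft=400), while n_mels should stay at the model's 80 mel bins; a minimal check of the resulting filter-bank shape, assuming torchaudio 0.10:

    import torchaudio

    n_fft = 400              # Spectrogram default used above
    n_stft = n_fft // 2 + 1  # 201 frequency bins, the first dimension in the failing matmul
    mel_scale = torchaudio.transforms.MelScale(n_mels=80, sample_rate=16000, n_stft=n_stft)
    print(mel_scale.fb.shape)  # torch.Size([201, 80]) -> matches the (201x80) matrix in the log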

ms-dot-k commented 2 years ago

The transformed mel-spectrogram (the output of transform(aud)) should have shape [1, 80, Time_length], and I have confirmed that the code below works with PyTorch 1.10 and torchaudio 0.10.0.

        if self.augmentations:
            transform = nn.Sequential(
                torchaudio.transforms.Spectrogram(win_length=400, hop_length=160, power=2, normalized=True),
                torchaudio.transforms.MelScale(n_mels=self.num_mel_bins, sample_rate=16000, n_stft=201),
                torchaudio.transforms.AmplitudeToDB(),
                CMVN(),
                torchaudio.transforms.FrequencyMasking(freq_mask_param=10),
                torchaudio.transforms.TimeMasking(time_mask_param=20)
            )
        else:
            transform = nn.Sequential(
                torchaudio.transforms.Spectrogram(win_length=400, hop_length=160, power=2, normalized=True),
                torchaudio.transforms.MelScale(n_mels=self.num_mel_bins, sample_rate=16000, n_stft=201),
                torchaudio.transforms.AmplitudeToDB(),
                CMVN()
            )

Or, the same function with different code:

        if self.augmentations:
            transform = nn.Sequential(
                torchaudio.transforms.MelSpectrogram(win_length=400, hop_length=160, power=2, normalized=True, n_mels=self.num_mel_bins, sample_rate=16000),
                torchaudio.transforms.AmplitudeToDB(),
                CMVN(),
                torchaudio.transforms.FrequencyMasking(freq_mask_param=10),
                torchaudio.transforms.TimeMasking(time_mask_param=20)
            )
        else:
            transform = nn.Sequential(
                torchaudio.transforms.MelSpectrogram(win_length=400, hop_length=160, power=2, normalized=True, n_mels=self.num_mel_bins, sample_rate=16000),
                torchaudio.transforms.AmplitudeToDB(),
                CMVN()
            )
KIDxiaoyuan commented 2 years ago

Okay, thank you for your help! I am sorry for causing you some trouble, and I am grateful for your help!