jik876 / hifi-gan

HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
MIT License

mat1 and mat2 shapes cannot be multiplied (80x513 and 1x513) #162

Open Huan-phonetic opened 6 months ago

Huan-phonetic commented 6 months ago

Dear authors,

This may seem like a novice question, but when I ran train.py on my dataset (and also on the LJ dataset), I found that the y variable has only one frame, with shape (1, 513). Any idea why this happens? All of my audio files are at least 2 s long.

Traceback (most recent call last):
File "F:\HiFiGAN\hifi-gan\train.py", line 271, in <module>
main()
File "F:\HiFiGAN\hifi-gan\train.py", line 267, in main
train(0, a, h)
File "F:\HiFiGAN\hifi-gan\train.py", line 113, in train
for i, batch in enumerate(train_loader):
File "G:\Conda\envs\pytorch\Lib\site-packages\torch\utils\data\dataloader.py", line 633, in __next__
data = self._next_data()
^^^^^^^^^^^^^^^^^
File "G:\Conda\envs\pytorch\Lib\site-packages\torch\utils\data\dataloader.py", line 1345, in _next_data
return self._process_data(data)
^^^^^^^^^^^^^^^^^^^^^^^^
File "G:\Conda\envs\pytorch\Lib\site-packages\torch\utils\data\dataloader.py", line 1371, in _process_data
data.reraise()
File "G:\Conda\envs\pytorch\Lib\site-packages\torch\_utils.py", line 644, in reraise
raise exception
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "G:\Conda\envs\pytorch\Lib\site-packages\torch\utils\data\_utils\worker.py", line 308, in _worker_loop
data = fetcher.fetch(index)
^^^^^^^^^^^^^^^^^^^^
File "G:\Conda\envs\pytorch\Lib\site-packages\torch\utils\data\_utils\fetch.py", line 51, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "G:\Conda\envs\pytorch\Lib\site-packages\torch\utils\data\_utils\fetch.py", line 51, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "F:\HiFiGAN\hifi-gan\meldataset.py", line 139, in __getitem__
mel = mel_spectrogram(audio, self.n_fft, self.num_mels,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "F:\HiFiGAN\hifi-gan\meldataset.py", line 69, in mel_spectrogram
spec = torch.matmul(mel_basis[str(fmax)+'_'+str(y.device)], spec)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: mat1 and mat2 shapes cannot be multiplied (80x513 and 1x513)
datouggg commented 3 months ago

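This is a version issue. In older versions, torch.stft returned a real tensor whose last dimension has size 2 (the real and imaginary parts), i.e. a four-dimensional tensor. The current version can return complex values directly, as a three-dimensional tensor, so the magnitude can no longer be computed the way the author's code does it. Try using abs instead.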
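A minimal sketch of the layout difference, assuming PyTorch 2.x. It computes a complex STFT, uses torch.view_as_real to recreate the old layout with a trailing axis of size 2, and applies the original magnitude formula to both. The input (1 s of random noise at 22.05 kHz) and the STFT parameters are illustrative assumptions; n_fft=1024 is chosen because it yields the 513 frequency bins seen in the error.

```python
import torch

# Illustrative assumptions: random "audio", batch of 1, n_fft chosen to give 513 bins.
n_fft, hop_size, win_size = 1024, 256, 1024
y = torch.randn(1, 22050)
window = torch.hann_window(win_size)

# Newer PyTorch: complex output of shape (batch, n_fft // 2 + 1, frames).
spec_complex = torch.stft(y, n_fft, hop_length=hop_size, win_length=win_size,
                          window=window, center=False, return_complex=True)

# Old layout: real and imaginary parts stacked in a trailing axis of size 2.
spec_old_layout = torch.view_as_real(spec_complex)

# The original formula sums over the LAST axis.
mag_old = torch.sqrt(spec_old_layout.pow(2).sum(-1) + 1e-9)  # sums re^2 + im^2
mag_broken = torch.sqrt(spec_complex.pow(2).sum(-1) + 1e-9)  # sums over time frames

print(spec_complex.shape)     # torch.Size([1, 513, 83])
print(spec_old_layout.shape)  # torch.Size([1, 513, 83, 2])
print(mag_old.shape)          # torch.Size([1, 513, 83])
print(mag_broken.shape)       # torch.Size([1, 513])
```

On the complex tensor the sum collapses the time axis, which produces the (1, 513) shape that then cannot be multiplied with the (80, 513) mel basis.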

chazarnik commented 3 months ago

Hi, I found the cause after experimenting with the meldataset.py file. It's as @datouggg said, but it's worth pinning down exactly where the problem is.

In the mel_spectrogram() function, you need to change how the magnitude spectrogram is computed. The issue comes from using a newer version of torch, where torch.stft() requires the return_complex argument to be passed (return_complex=True). The returned tensor then has shape (batch, frequency_bins, time_frames) and holds complex values, whereas in previous versions the real and imaginary parts were stored in a separate trailing dimension. Because the line spec = torch.sqrt(spec.pow(2).sum(-1)+(1e-9)) sums over the last dimension, on a complex tensor it now sums over the time frames instead of over the real/imaginary pair, leaving a single frame of shape (1, 513); that is why the multiplication with the (80, 513) mel basis fails. Go to that line and change it to spec = torch.abs(spec). This solves the issue, and you can then run training and inference without problems.
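A minimal sketch of the patched computation, assuming PyTorch 2.x. The names mel_spectrogram_fixed and magnitude are hypothetical, not the repo's code. The sketch skips the reflect padding, the librosa mel filterbank, and the log compression that the repo's mel_spectrogram() performs, and uses a random placeholder matrix with the (80, 513) shape from the error message. The keep_eps option is an assumed alternative (not part of the suggestion above) that keeps the original 1e-9 term by computing the magnitude from spec.real and spec.imag.

```python
import torch


def magnitude(spec: torch.Tensor, keep_eps: bool = False) -> torch.Tensor:
    """Magnitude of a complex STFT of shape (batch, freq_bins, frames)."""
    if keep_eps:
        # Same arithmetic as the old sqrt(re^2 + im^2 + 1e-9) line.
        return torch.sqrt(spec.real.pow(2) + spec.imag.pow(2) + 1e-9)
    # The fix suggested above: plain complex absolute value.
    return torch.abs(spec)


def mel_spectrogram_fixed(y, mel_basis, n_fft, hop_size, win_size, keep_eps=False):
    """Hypothetical sketch; omits the repo's padding and log compression."""
    window = torch.hann_window(win_size, device=y.device)
    spec = torch.stft(y, n_fft, hop_length=hop_size, win_length=win_size,
                      window=window, center=False, normalized=False,
                      onesided=True, return_complex=True)
    spec = magnitude(spec, keep_eps)  # real-valued (batch, freq_bins, frames)
    # (num_mels, freq_bins) @ (batch, freq_bins, frames) -> (batch, num_mels, frames)
    return torch.matmul(mel_basis, spec)


# Placeholder filterbank: random weights with the right shape, NOT real mel filters.
mel_basis = torch.rand(80, 513)
y = torch.randn(1, 22050)

mel = mel_spectrogram_fixed(y, mel_basis, n_fft=1024, hop_size=256, win_size=1024)
mel_eps = mel_spectrogram_fixed(y, mel_basis, 1024, 256, 1024, keep_eps=True)
print(mel.shape)  # torch.Size([1, 80, 83])
print(torch.allclose(mel, mel_eps, rtol=1e-4, atol=1e-3))
```

Both variants produce a real (batch, 80, frames) tensor, so the matmul broadcasts over the batch instead of failing; the keep_eps variant only differs noticeably for bins with near-zero energy.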

Hope that helps, and that the issue can be closed.