Huan-phonetic opened this issue 6 months ago (status: Open)
This is a version issue. In older versions of torch, torch.stft returned a tensor whose last dimension had size 2 (i.e., a four-dimensional tensor). The current version can return a complex tensor directly, with 3 dimensions, so you can no longer take the magnitude the way the author's code does — try abs instead.
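To illustrate the layout change described above, here is a minimal sketch (not code from the repo): `torch.view_as_real` recovers the old `(..., 2)` real/imag layout from a complex tensor, and `torch.abs` on the complex value equals the old `sqrt(re^2 + im^2)` reduction over that last dimension.

```python
import torch

# A single complex value, standing in for one STFT bin.
z = torch.complex(torch.tensor([3.0]), torch.tensor([4.0]))

# Old-style layout: last dimension of size 2 holding [real, imag].
old_style = torch.view_as_real(z)
print(old_style.shape)  # torch.Size([1, 2])

# New-style magnitude (torch.abs on the complex tensor) matches the
# old reduction sqrt(spec.pow(2).sum(-1)).
old_magnitude = old_style.pow(2).sum(-1).sqrt()
new_magnitude = torch.abs(z)
print(torch.allclose(old_magnitude, new_magnitude))  # True
```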
Dear authors,
Maybe it seems novice but when I tried train.py with my dataset (and also LJ dataset), I found the y variable is having only one frame (1, 513). Any idea why this happens? My audios are longer than 2s at the least.
Traceback (most recent call last):
  File "F:\HiFiGAN\hifi-gan\train.py", line 271, in <module>
    main()
  File "F:\HiFiGAN\hifi-gan\train.py", line 267, in main
    train(0, a, h)
  File "F:\HiFiGAN\hifi-gan\train.py", line 113, in train
    for i, batch in enumerate(train_loader):
  File "G:\Conda\envs\pytorch\Lib\site-packages\torch\utils\data\dataloader.py", line 633, in __next__
    data = self._next_data()
           ^^^^^^^^^^^^^^^^^
  File "G:\Conda\envs\pytorch\Lib\site-packages\torch\utils\data\dataloader.py", line 1345, in _next_data
    return self._process_data(data)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "G:\Conda\envs\pytorch\Lib\site-packages\torch\utils\data\dataloader.py", line 1371, in _process_data
    data.reraise()
  File "G:\Conda\envs\pytorch\Lib\site-packages\torch\_utils.py", line 644, in reraise
    raise exception
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "G:\Conda\envs\pytorch\Lib\site-packages\torch\utils\data\_utils\worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)
           ^^^^^^^^^^^^^^^^^^^^
  File "G:\Conda\envs\pytorch\Lib\site-packages\torch\utils\data\_utils\fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "G:\Conda\envs\pytorch\Lib\site-packages\torch\utils\data\_utils\fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "F:\HiFiGAN\hifi-gan\meldataset.py", line 139, in __getitem__
    mel = mel_spectrogram(audio, self.n_fft, self.num_mels,
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "F:\HiFiGAN\hifi-gan\meldataset.py", line 69, in mel_spectrogram
    spec = torch.matmul(mel_basis[str(fmax)+'_'+str(y.device)], spec)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: mat1 and mat2 shapes cannot be multiplied (80x513 and 1x513)
Hi, I found the error after fiddling around with the meldataset.py file. It's as @datouggg said, but we need to clarify where the exact issue is.

In the mel_spectrogram() function you need to change the way the magnitude spectrogram is retrieved. The issue stems from using a newer version of torch, which requires torch.stft() to be called with return_complex=True. The returned tensor is now complex-valued with shape (num_batches, frequency_bins, temporal_bins), in contrast to previous versions, where the real and imaginary parts occupied a separate last dimension.

Go to the line where spec = torch.sqrt(spec.pow(2).sum(-1)+(1e-9)) and change it to spec = torch.abs(spec). This solves the issue, and you can run inference without problems.

Hope that helps, and the issue can be closed.
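Putting the two changes together, here is a minimal sketch of the fixed magnitude computation, assuming a recent PyTorch. magnitude_spectrogram is a hypothetical helper for illustration (not the repo's mel_spectrogram), and the n_fft/hop/window values mirror typical HiFi-GAN configs but are assumptions here.

```python
import torch

def magnitude_spectrogram(y, n_fft=1024, hop_size=256, win_size=1024):
    # With return_complex=True, torch.stft yields a complex tensor of shape
    # (batch, n_fft // 2 + 1, frames), so the magnitude is simply torch.abs,
    # replacing the old torch.sqrt(spec.pow(2).sum(-1) + 1e-9) reduction.
    window = torch.hann_window(win_size)
    spec = torch.stft(y, n_fft, hop_length=hop_size, win_length=win_size,
                      window=window, center=True, return_complex=True)
    return torch.abs(spec)

y = torch.randn(1, 22050)           # one second of fake audio at 22.05 kHz
mag = magnitude_spectrogram(y)
print(mag.shape)                    # (1, 513, frames) -- 513 = 1024 // 2 + 1
print(mag.is_complex())             # False: abs returns a real tensor
```

The resulting real-valued (batch, 513, frames) tensor is what the mel-basis matmul in mel_spectrogram expects, which resolves the "80x513 and 1x513" shape error above.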