yistLin / dvector

Speaker embedding (d-vector) trained with GE2E loss

cannot reshape tensor of 0 elements into shape [-1, 0] #8

Open chonghaozhang1998 opened 2 years ago

chonghaozhang1998 commented 2 years ago

When the input tensor shape is [1, 800] or [1, 320] and I use the following code

mel_tensor = wav2mel(wav_tensor, 16000) # 16000 is the sample rate
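Roughly, the full call looks like this (a minimal sketch; wav2mel.pt is the released TorchScript module loaded with torch.jit.load, and the random tensor just stands in for my short clips):

import torch

# Load the released TorchScript Wav2Mel module (assumed to be saved locally
# as "wav2mel.pt").
wav2mel = torch.jit.load("wav2mel.pt")

# A very short clip: 800 samples at 16 kHz is only 50 ms of audio.
wav_tensor = torch.rand(1, 800)

mel_tensor = wav2mel(wav_tensor, 16000)  # 16000 is the sample rate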

I get the following error:

Traceback of TorchScript, serialized code (most recent call last):
  File "code/__torch__/data/wav2mel.py", line 20, in forward
    sample_rate: int) -> Tensor:
    wav_tensor0 = (self.sox_effects).forward(wav_tensor, sample_rate, )
    mel_tensor = (self.log_melspectrogram).forward(wav_tensor0, )
    return mel_tensor
class SoxEffects(Module):
  File "code/__torch__/data/wav2mel.py", line 43, in forward
  def forward(self: __torch__.data.wav2mel.LogMelspectrogram,
    wav_tensor: Tensor) -> Tensor:
    _3 = (self.melspectrogram).forward(wav_tensor, )
          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    mel_tensor = torch.numpy_T(torch.squeeze(_3, 0))
    _4 = torch.clamp(mel_tensor, 1.0000000000000001e-09, None)
  File "code/__torch__/torchaudio/transforms.py", line 20, in forward
  def forward(self: __torch__.torchaudio.transforms.MelSpectrogram,
    waveform: Tensor) -> Tensor:
    specgram = (self.spectrogram).forward(waveform, )
                ~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    mel_specgram = (self.mel_scale).forward(specgram, )
    return mel_specgram
  File "code/__torch__/torchaudio/transforms.py", line 41, in forward
    waveform: Tensor) -> Tensor:
    _0 = __torch__.torchaudio.functional.functional.spectrogram
    _1 = _0(waveform, 0, self.window, 400, 160, 400, 2., False, self.center, self.pad_mode, self.onesided, )
         ~~ <--- HERE
    return _1
class MelScale(Module):
  File "code/__torch__/torchaudio/functional/functional.py", line 18, in spectrogram
    waveform0 = waveform
  shape = torch.size(waveform0)
  waveform2 = torch.reshape(waveform0, [-1, shape[-1]])
              ~~~~~~~~~~~~~ <--- HERE
  spec_f = __torch__.torch.functional.stft(waveform2, n_fft, hop_length, win_length, window, center, pad_mode, False, onesided, True, )
  _0 = torch.slice(shape, 0, -1, 1)

Traceback of TorchScript, original code (most recent call last):
  File "/home/yist/.pyenv/versions/3.8.5/lib/python3.8/site-packages/torchaudio/transforms.py", line 96, in forward
            Fourier bins, and time is the number of window hops (n_frame).
        """
        return F.spectrogram(
               ~~~~~~~~~~~~~ <--- HERE
            waveform,
            self.pad,
  File "/home/yist/.pyenv/versions/3.8.5/lib/python3.8/site-packages/torchaudio/functional/functional.py", line 88, in spectrogram
    # pack batch
    shape = waveform.size()
    waveform = waveform.reshape(-1, shape[-1])
               ~~~~~~~~~~~~~~~~ <--- HERE

    # default values are consistent with librosa.core.spectrum._spectrogram
RuntimeError: cannot reshape tensor of 0 elements into shape [-1, 0] because the unspecified dimension size -1 can be any value and is ambiguous

How can I solve this problem?
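For reference, the same error can be reproduced directly with an empty waveform, so I assume (my interpretation, not confirmed) that the tensor reaching the spectrogram ends up with zero samples somewhere in the pipeline:

import torch

# Minimal reproduction of the RuntimeError in the traceback: reshaping a
# tensor whose last dimension is 0 into [-1, 0] is ambiguous.
empty = torch.empty(1, 0)
empty.reshape(-1, empty.size(-1))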

MiniXC commented 2 years ago

Check the length of your audio files; for me this was only happening with very short clips. A guard like the sketch below can skip them.
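A minimal sketch of that kind of guard (the torchaudio.load usage, the file name, and the 0.5 s threshold are assumptions, so adjust them for your data):

import torch
import torchaudio

wav2mel = torch.jit.load("wav2mel.pt")
wav_tensor, sample_rate = torchaudio.load("example.wav")

# Skip clips that are too short to survive preprocessing; 0.5 s is an
# arbitrary threshold, not a value from this repo.
min_samples = int(0.5 * sample_rate)
if wav_tensor.size(-1) >= min_samples:
    mel_tensor = wav2mel(wav_tensor, sample_rate)
else:
    print("clip too short, skipping")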

med1844 commented 1 year ago

In my case, replacing wav2mel.pt with the source code of the Wav2Mel class in data/wav2mel.py and deleting the ["silence", ...] entry in self.effects solved the problem.
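A simplified sketch of the idea (the exact effect chain in data/wav2mel.py may differ; the point is only that the list has no ["silence", ...] entry, so short or quiet clips are not trimmed down to zero samples):

import torch
import torchaudio

class SoxEffects(torch.nn.Module):
    """Convert to mono, resample, and normalize, without silence trimming."""

    def __init__(self, sample_rate: int = 16000):
        super().__init__()
        # No ["silence", ...] entry here, so clips are not trimmed away entirely.
        self.effects = [
            ["channels", "1"],
            ["rate", str(sample_rate)],
            ["norm"],
        ]

    def forward(self, wav_tensor: torch.Tensor, sample_rate: int) -> torch.Tensor:
        wav_tensor, _ = torchaudio.sox_effects.apply_effects_tensor(
            wav_tensor, sample_rate, self.effects
        )
        return wav_tensor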