I'm trying to run inference with a pretrained DiffWave model on the output of a SepFormer model (separating a two-speaker mixture). Creating the mel spectrogram and calling predict leads to the following error:

Traceback (most recent call last):
  File "path/test.py", line 29, in <module>
    gen_estimation, _ = diffwave_predict(enlarged_spectrogram, 'diffwave-weights-902319.pt', base_params, fast_sampling=True,
  File "path/diffwave/inference.py", line 81, in predict
    audio = c1 * (audio - c2 * model(audio, spectrogram, torch.tensor([T[n]], device=audio.device)).squeeze(1))
  File "/home/eitzo/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "path/diffwave/model.py", line 152, in forward
    diffusion_step = self.diffusion_embedding(diffusion_step)
  File "/home/eitzo/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "path/diffwave/model.py", line 50, in forward
    x = self._lerp_embedding(diffusion_step)
  File "path/diffwave/model.py", line 62, in _lerp_embedding
    return low + (high - low) * (t - low_idx)
RuntimeError: The size of tensor a (128) must match the size of tensor b (166) at non-singleton dimension 3

Process finished with exit code 1
Any ideas where I went wrong? Thanks in advance!