Open dillfrescott opened 1 year ago
I think it might have something to do with training on 44.1 kHz audio.
I'm not quite sure what caused this problem, and I haven't seen errors like this in our development group, but I can give you some suggestions. First, try a fresh Python 3.8 environment to see whether that works. Second, 44.1 kHz is an experimental sample rate right now, and the 44.1 kHz vocoder has not been released yet, so training a model with this setting may cause errors.
Ah, gotcha. I'll have to wait to use 44.1 kHz then :/
@prophesier I'm still getting the exact same error. Are we any closer to figuring out why this is happening?
same here!
RuntimeError                              Traceback (most recent call last)
9 frames

/content/diff-svc/infer.py in run_clip(svc_model, key, acc, use_pe, use_crepe, thre, use_gt_mel, add_noise_step, project_name, f_name, file_path, out_path, slice_db, **kwargs)
     57                                   np.zeros(length))
     58         else:
---> 59             _f0_tst, _f0_pred, _audio = svc_model.infer(raw_path, key=key, acc=acc, use_pe=use_pe, use_crepe=use_crepe,
     60                                                         thre=thre, use_gt_mel=use_gt_mel, add_noise_step=add_noise_step)
     61             fix_audio = np.zeros(length)

/content/diff-svc/infer_tools/infer_tool.py in infer(self, in_path, key, acc, use_pe, use_crepe, thre, singer, **kwargs)
    165         else:
    166             batch['f0_pred'] = outputs.get('f0_denorm')
--> 167         return self.after_infer(batch, singer, in_path)
    168
    169     @timeit

/content/diff-svc/infer_tools/infer_tool.py in run(*args, **kwargs)
     60     def run(*args, **kwargs):
     61         t = time.time()
---> 62         res = func(*args, **kwargs)
     63         print('executing \'%s\' costed %.3fs' % (func.__name__, time.time() - t))
     64         return res

/content/diff-svc/infer_tools/infer_tool.py in after_infer(self, prediction, singer, in_path)
    197         np.save(mel_path, mel_pred)
    198         np.save(f0_path, f0_pred)
--> 199         wav_pred = self.vocoder.spec2wav(mel_pred, f0=f0_pred)
    200         return f0_gt, f0_pred, wav_pred
    201

/content/diff-svc/network/vocoders/hifigan.py in spec2wav(self, mel, **kwargs)
     68         if f0 is not None and hparams.get('use_nsf'):
     69             f0 = torch.FloatTensor(f0[None, :]).to(device)
---> 70             y = self.model(c, f0).view(-1)
     71         else:
     72             y = self.model(c).view(-1)

/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1128         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1129                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1130             return forward_call(*input, **kwargs)
   1131         # Do not call functions when jit is used
   1132         full_backward_hooks, non_full_backward_hooks = [], []

/content/diff-svc/modules/hifigan/hifigan.py in forward(self, x, f0)
    145         if f0 is not None:
    146             # harmonic-source signal, noise-source signal, uv flag
--> 147             f0 = self.f0_upsamp(f0[:, None]).transpose(1, 2)
    148             har_source, noi_source, uv = self.m_source(f0)
    149             har_source = har_source.transpose(1, 2)

/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1128         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1129                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1130             return forward_call(*input, **kwargs)
   1131         # Do not call functions when jit is used
   1132         full_backward_hooks, non_full_backward_hooks = [], []

/usr/local/lib/python3.8/dist-packages/torch/nn/modules/upsampling.py in forward(self, input)
    151
    152     def forward(self, input: Tensor) -> Tensor:
--> 153         return F.interpolate(input, self.size, self.scale_factor, self.mode, self.align_corners,
    154                              recompute_scale_factor=self.recompute_scale_factor)
    155

/usr/local/lib/python3.8/dist-packages/torch/nn/functional.py in interpolate(input, size, scale_factor, mode, align_corners, recompute_scale_factor, antialias)
   3906
   3907     if input.dim() == 3 and mode == "nearest":
-> 3908         return torch._C._nn.upsample_nearest1d(input, output_size, scale_factors)
   3909     if input.dim() == 4 and mode == "nearest":
   3910         return torch._C._nn.upsample_nearest2d(input, output_size, scale_factors)

RuntimeError: Input and output sizes should be greater than 0, but got input (W: 0) and output (W: 0)
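The final message reports a zero-width input (W: 0) to the nearest-neighbour upsampler, which suggests that the `f0` array passed to `spec2wav` may have been empty. The traceback does not show why it was empty. As a debugging aid only, here is a minimal sketch of a guard that fails earlier with a clearer message. `require_nonempty` is a hypothetical helper and is not part of diff-svc.

```python
def require_nonempty(name, values):
    """Return len(values), or raise a descriptive ValueError if it is 0.

    Works with any object that supports len(), e.g. a list or a NumPy
    array (for a multi-dimensional array, len() is the first axis).
    Hypothetical helper for debugging; not part of diff-svc.
    """
    n = len(values)
    if n == 0:
        raise ValueError(
            f"{name} is empty (length 0); check the input audio and "
            "the slicing settings before calling the vocoder"
        )
    return n
```

One place to try it, assuming the variable names shown in the traceback, is a call such as `require_nonempty("f0_pred", f0_pred)` just before `self.vocoder.spec2wav(mel_pred, f0=f0_pred)` in `after_infer`. This does not fix the underlying cause. It only shows whether the empty array is produced before the vocoder is called.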
(I am training at 24 kHz)
Yesterday inference was working fine, but now it just throws this error: