openvpi / DiffSinger

An advanced singing voice synthesis system with high fidelity, expressiveness, controllability, and flexibility, based on *DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism*
Apache License 2.0

Inferencing with DDSP vocoder #79

Closed Mildemelwe closed 1 year ago

Mildemelwe commented 1 year ago

I trained a DDSP vocoder with torch 1.8.2, and I am getting an error when inferencing from a .ds file. I followed all the documentation for training a DDSP vocoder and making it work with DiffSinger. The NSF-HiFiGAN vocoder works fine with the same model and the same .ds file.


```
| load phoneme set: ['A', 'AP', 'E', 'SP', 'Y', 'a', 'b', 'bj', 'c', 'ch', 'cl', 'd', 'dj', 'e', 'f', 'fj', 'g', 'gj', 'h', 'hj', 'i', 'j', 'k', 'kj', 'l', 'lj', 'm', 'mj', 'n', 'nj', 'o', 'p', 'pj', 'r', 'rj', 's', 'sh', 'shj', 'sj', 't', 'tj', 'u', 'v', 'vf', 'vj', 'y', 'z', 'zh', 'zj']
| load 'model' from 'checkpoints/crow/model_ckpt_steps_182000.ckpt'.
 [Loading] checkpoints/ddsp-crow/model_best-traced-torch1.8.2.jit
Processed 4 tokens: a tj e SP
Using manual phone duration
Using manual pitch curve
sample time step: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:01<00:00, 84.55it/s]
Traceback (most recent call last):
  File "main.py", line 202, in <module>
    infer_once(os.path.join(out, f'{name}{suffix}'), save_mel=args.mel)
  File "main.py", line 180, in infer_once
    seg_audio = infer_ins.infer_once(param)
  File "/home/kei/Desktop/diffsinger2/DiffSinger/basics/base_svs_infer.py", line 147, in infer_once
    output = self.forward_model(inp, return_mel=return_mel)
  File "/home/kei/Desktop/diffsinger2/DiffSinger/inference/ds_cascade.py", line 273, in forward_model
    wav_out = self.run_vocoder(mel_out, f0=f0_pred)
  File "/home/kei/Desktop/diffsinger2/DiffSinger/basics/base_svs_infer.py", line 71, in run_vocoder
    y = self.vocoder.spec2wav_torch(c, **kwargs)
  File "/home/kei/Desktop/diffsinger2/DiffSinger/src/vocoders/ddsp.py", line 137, in spec2wav_torch
    signal, _, (s_h, s_n) = self.model(mel.to(self.device), f0.to(self.device))
  File "/home/kei/miniconda3/envs/diffsinger2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
RuntimeError: Unsupported value kind: ComplexDouble
```
yqzhishen commented 1 year ago

Your torch version is too old to support complex dtypes.

Mildemelwe commented 1 year ago

What version of torch should I be using? I was using 1.8.2 because ONNX export only works with that version.

yqzhishen commented 1 year ago

In my experience, any newer PyTorch version is compatible. ONNX export does require 1.8, and I recommend creating a separate environment for that. The code has been adapted to 1.13 on the new branch, and it will be available once that branch is ready.

Mildemelwe commented 1 year ago

I'm using torch 2.0.0 now, and I'm getting a different error:


```
Traceback (most recent call last):
  File "main.py", line 202, in <module>
    infer_once(os.path.join(out, f'{name}{suffix}'), save_mel=args.mel)
  File "main.py", line 180, in infer_once
    seg_audio = infer_ins.infer_once(param)
  File "/home/kei/Desktop/diffsinger2/DiffSinger/basics/base_svs_infer.py", line 147, in infer_once
    output = self.forward_model(inp, return_mel=return_mel)
  File "/home/kei/Desktop/diffsinger2/DiffSinger/inference/ds_cascade.py", line 273, in forward_model
    wav_out = self.run_vocoder(mel_out, f0=f0_pred)
  File "/home/kei/Desktop/diffsinger2/DiffSinger/basics/base_svs_infer.py", line 71, in run_vocoder
    y = self.vocoder.spec2wav_torch(c, **kwargs)
  File "/home/kei/Desktop/diffsinger2/DiffSinger/src/vocoders/ddsp.py", line 137, in spec2wav_torch
    signal, _, (s_h, s_n) = self.model(mel.to(self.device), f0.to(self.device))
  File "/home/kei/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
  File "code/__torch__/ddsp/vocoder.py", line 35, in forward
    signal1 = torch.slice(_11, 2, 0, -1)
    f0 = torch.permute(signal1, [0, 2, 1])
    _12 = torch.div(torch.to(f0, 7), sampling_rate)
          ~~~~~~~~~ <--- HERE
    x = torch.cumsum(_12, 1)
    x0 = torch.sub(x, torch.round(x))

Traceback of TorchScript, original code (most recent call last):
/home/kei/Desktop/ddsp/pc-ddsp/ddsp/vocoder.py(247): forward
/home/kei/.local/lib/python3.8/site-packages/torch/nn/modules/module.py(1488): _slow_forward
/home/kei/.local/lib/python3.8/site-packages/torch/nn/modules/module.py(1501): _call_impl
/home/kei/.local/lib/python3.8/site-packages/torch/jit/_trace.py(1056): trace_module
/home/kei/.local/lib/python3.8/site-packages/torch/jit/_trace.py(794): trace
export.py(76): main
export.py(91): <module>
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
```
yqzhishen commented 1 year ago

I think there is a bug here:

https://github.com/openvpi/DiffSinger/blob/d9f66c7961a4ee82049c24a99499a4f1966fc1e7/basics/base_svs_infer.py#L53-L55

torch.jit models seem to be compatible only with the device they were on when they were saved. For this reason, DDSP is forced onto the CPU when exporting; and in previous code, DDSP was also forced onto the CPU when inferencing in this repo. The code above moves it to CUDA if you have an NVIDIA GPU, which causes the problem. In this case, just leaving it on the CPU resolves everything.
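The workaround can be sketched like this (a minimal illustration with a toy module, not the repo's actual loading code): a TorchScript model traced on CPU should be reloaded with an explicit CPU `map_location` and kept there, rather than being moved to CUDA.

```python
import io
import torch

class Toy(torch.nn.Module):
    """Stand-in for the traced DDSP vocoder."""
    def forward(self, x):
        return x * 2

# Trace on CPU, as the DDSP export does.
traced = torch.jit.trace(Toy(), torch.zeros(3))

# Serialize and reload with an explicit CPU map_location; do NOT call
# .to("cuda") afterwards on a CPU-traced torch.jit model.
buf = io.BytesIO()
torch.jit.save(traced, buf)
buf.seek(0)
loaded = torch.jit.load(buf, map_location="cpu")

out = loaded(torch.ones(3))
print(out.device.type)  # cpu
```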

This issue is not likely to be fixed on the main branch for now, since the fix would conflict with the new branch. Anyway, thanks for your report; I will fix this on the new branch.

Mildemelwe commented 1 year ago

I changed device to CPU and it's working now. Thank you.