yxlllc / DDSP-SVC

Real-time end-to-end singing voice conversion system based on DDSP (Differentiable Digital Signal Processing)
MIT License

Error when loading the pretrained base model for training #16

Closed fatinghenji closed 1 year ago

fatinghenji commented 1 year ago
PS G:\DDSP-SVC> python train.py -c configs/combsub.yaml
 > config: configs/combsub.yaml
 >    exp: exp/combsub-test
 [DDSP Model] Combtooth Subtractive Synthesiser
 [*] restoring model from exp/combsub-test\model_300000.pt
Traceback (most recent call last):
  File "train.py", line 68, in <module>
    initial_global_step, model, optimizer = utils.load_model(args.env.expdir, model, optimizer, device=args.device)
  File "G:\DDSP-SVC\logger\utils.py", line 119, in load_model
    model.load_state_dict(ckpt['model'])
  File "C:\Users\29099\.virtualenvs\DDSP-SVC-YOgpXN-h\lib\site-packages\torch\nn\modules\module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for CombSubFast:
        Unexpected key(s) in state_dict: "unit2ctrl.spk_embed.weight".
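
The traceback says the checkpoint contains a parameter (`unit2ctrl.spk_embed.weight`) that the freshly built `CombSubFast` model does not expect. A quick way to see what the checkpoint actually holds is to list its keys directly; this is a minimal sketch, assuming the checkpoint layout used by `logger/utils.py` above (a dict with a `'model'` entry):

```python
import torch

# Load the checkpoint on CPU; the path matches the one in the log above.
ckpt = torch.load('exp/combsub-test/model_300000.pt', map_location='cpu')

# According to logger/utils.py, the weights live under the 'model' key.
state_dict = ckpt['model']

# Print every parameter name and shape; speaker-embedding keys such as
# 'unit2ctrl.spk_embed.weight' only exist in multi-speaker models.
for key in sorted(state_dict.keys()):
    print(key, tuple(state_dict[key].shape))
```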

Config file:

data:
  f0_extractor: 'parselmouth' # 'parselmouth', 'dio', 'harvest', or 'crepe'
  f0_min: 65 # about C2
  f0_max: 800 # about G5
  sampling_rate: 44100
  block_size: 512 # Equal to hop_length
  duration: 2 # Audio duration during training, must be less than the duration of the shortest audio clip
  encoder: 'hubertsoft' # 'hubertsoft', 'hubertbase' or 'contentvec'
  encoder_sample_rate: 16000
  encoder_hop_size: 320
  encoder_out_channels: 256
  encoder_ckpt: pretrain/hubert/hubert-soft-0d54a1f4.pt
  train_path: data/train # Create a folder named "audio" under this path and put the audio clips in it
  valid_path: data/val # Create a folder named "audio" under this path and put the audio clips in it
model:
  type: 'CombSubFast'
  n_spk: 1 # max number of different speakers
enhancer:
  type: 'nsf-hifigan'
  ckpt: 'pretrain/nsf_hifigan/model'
loss:
  fft_min: 256
  fft_max: 2048
  n_scale: 4 # rss kernel numbers
device: cuda
env:
  expdir: exp/combsub-test
  gpu_id: 0
train:
  num_workers: 0 # If your cpu and gpu are both very strong, setting this to 0 may be faster!
  batch_size: 24
  cache_all_data: true # Saves RAM or VRAM if set to false, but may be slower
  cache_device: 'cuda' # Set to 'cuda' to cache the data in VRAM; fastest for a strong gpu
  cache_fp16: true
  epochs: 100000
  interval_log: 10
  interval_val: 2000
  lr: 0.0005
  weight_decay: 0
yxlllc commented 1 year ago

If you want to continue training from another model, the model parameters in the config file must stay consistent with that model; for example, n_spk (the number of speakers) must be the same.
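
In other words, the base model was trained as a multi-speaker model, so its state_dict contains `unit2ctrl.spk_embed.weight`, while the config above builds a single-speaker model (`n_spk: 1`). A minimal sketch for reading the speaker count back out of the checkpoint, assuming `spk_embed` is a standard `nn.Embedding` whose first dimension is the number of speakers:

```python
import torch

ckpt = torch.load('exp/combsub-test/model_300000.pt', map_location='cpu')
state_dict = ckpt['model']

spk_embed = state_dict.get('unit2ctrl.spk_embed.weight')
if spk_embed is None:
    print('No speaker embedding found: the base model is single-speaker (n_spk: 1).')
else:
    # For a standard nn.Embedding, weight has shape (num_embeddings, embedding_dim),
    # so the first dimension is the n_spk the base model was trained with.
    print('Base model speaker count:', spk_embed.shape[0])
    print('Set n_spk in configs/combsub.yaml to this value before resuming training.')
```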