openvpi / DiffSinger

An advanced singing voice synthesis system with high fidelity, expressiveness, controllability and flexibility based on DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism
Apache License 2.0

Custom Trained DiffSinger Render Failed #186

Closed. Alistair-zhong closed this issue 2 months ago

Alistair-zhong commented 2 months ago

DiffSinger Ver: 76afe57
OpenUtau Ver: 0.1.429.0
OS: Windows 10

The config generated by the exporter does not look right to me. For example, max_depth is written as the decimal 0.6, but I believe the correct value is 300, which equals K_step in the training config file. I therefore edited the generated config file by hand.
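To make the comparison above concrete, here is a minimal sketch that puts the exported max_depth next to the training values it could correspond to. The numbers are copied from the two configs in this post. The helper names `candidate_depths` and `matching_interpretations` are hypothetical, and the idea that a reflow model might express its depth as `1 - T_start` is only an assumption, not something confirmed in this thread or taken from the exporter's documentation.

```python
import math

# Values copied from the configs in this post.
exported_max_depth = 0.6  # max_depth as written by the exporter
training = {"diffusion_type": "reflow", "K_step": 300, "T_start": 0.4}


def candidate_depths(cfg):
    """Training-side values that max_depth could plausibly correspond to.

    Hypothetical helper: both interpretations are assumptions made for
    this comparison, not the exporter's documented behaviour.
    """
    return {
        "K_step": float(cfg["K_step"]),
        "1 - T_start": 1.0 - cfg["T_start"],
    }


def matching_interpretations(max_depth, cfg):
    """Names of the candidates equal to max_depth within float tolerance."""
    return [
        name
        for name, value in candidate_depths(cfg).items()
        if math.isclose(max_depth, value, rel_tol=1e-9)
    ]


# Which interpretation does each value line up with?
print(matching_interpretations(exported_max_depth, training))
print(matching_interpretations(300, training))
```

With these values, 0.6 lines up with `1 - T_start` rather than with K_step. That would be consistent with the exported value being intentional for a reflow model, but it remains unverified here.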

Error

[Screenshot of the error: image_2024_04_20T22_42_48_086Z]

Acoustic dsconfig

phonemes: phonemes.txt
acoustic: acoustic.onnx
vocoder: nsf_hifigan

augmentation_args:
  random_pitch_shifting:
    range:
    - -5.0
    - 5.0
use_key_shift_embed: true
use_speed_embed: true
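# The following declares the singing-style parameters supported by the acoustic model; fill them in according to what the model actually supports
use_energy_embed: false       # whether the acoustic model supports the energy parameter
use_breathiness_embed: false  # whether the acoustic model supports the breathiness parameter
use_voicing_embed: false      # whether the acoustic model supports the voicing parameter
use_tension_embed: false      # whether the acoustic model supports the tension parameter

# The following two lines are needed only if your acoustic model uses shallow diffusion
use_shallow_diffusion: true
max_depth: 300 # K_step used when training the voicebank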

use_continuous_acceleration: true
use_variable_depth: true
# max_depth: 0.6
sample_rate: 44100
hop_size: 512
win_size: 2048
fft_size: 2048
num_mel_bins: 128
mel_fmin: 40
mel_fmax: 16000
mel_base: e
mel_scale: slaney

Acoustic model training config

base_config: configs/acoustic.yaml

raw_data_dir:
  - data/english_dataset/raw
speakers:
  - english_dataset
spk_ids: []
test_prefixes:
  - 04-15-english-song1_seg003
  - 04-15-english-song1_seg010
  - 04-15-english-song1_seg017
  - 0416-english-song2_seg003
  - 0416-english-song2_seg0010
dictionary: dictionaries/tgm_sofa_dict.txt
binary_data_dir: data/english_dataset/binary
binarization_args:
  num_workers: 2
pe: parselmouth
pe_ckpt: null
vocoder: NsfHifiGAN
vocoder_ckpt: checkpoints/nsf_hifigan/model.ckpt

use_spk_id: false
num_spk: 1

# NOTICE: before enabling variance embeddings, please read the docs at
# https://github.com/openvpi/DiffSinger/tree/main/docs/BestPractices.md#choosing-variance-parameters
use_energy_embed: false
use_breathiness_embed: false
use_voicing_embed: false
use_tension_embed: false

use_key_shift_embed: true
use_speed_embed: true

augmentation_args:
  random_pitch_shifting:
    enabled: true
    range: [-5., 5.]
    scale: 0.75
  fixed_pitch_shifting:
    enabled: false
    targets: [-5., 5.]
    scale: 0.5
  random_time_stretching:
    enabled: true
    range: [0.5, 2.]
    scale: 0.75

residual_channels: 512
residual_layers: 20

# shallow diffusion
diffusion_type: reflow
use_shallow_diffusion: true
T_start: 0.4
T_start_infer: 0.4
K_step: 300
K_step_infer: 300
shallow_diffusion_args:
  train_aux_decoder: true
  train_diffusion: true
  val_gt_start: false
  aux_decoder_arch: convnext
  aux_decoder_args:
    num_channels: 512
    num_layers: 6
    kernel_size: 7
    dropout_rate: 0.1
  aux_decoder_grad: 0.1
lambda_aux_mel_loss: 0.2

optimizer_args:
  lr: 0.0006
lr_scheduler_args:
  scheduler_cls: torch.optim.lr_scheduler.StepLR
  step_size: 10000
  gamma: 0.75
max_batch_frames: 50000
max_batch_size: 64
max_updates: 160000

num_valid_plots: 5
val_with_vocoder: true
val_check_interval: 1000
num_ckpt_keep: 5
permanent_ckpt_start: 2000
permanent_ckpt_interval: 20000
pl_trainer_devices: 'auto'
pl_trainer_precision: '16-mixed'
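
Because the dsconfig was edited by hand, one quick sanity check is to confirm that the feature flags it declares still agree with the training config. A minimal sketch follows. The two dictionaries are typed in from the configs above rather than parsed from the files, and `flag_mismatches` is a hypothetical helper, not part of DiffSinger or OpenUtau.

```python
# Boolean feature flags that appear in both configs in this post.
FLAGS = (
    "use_key_shift_embed",
    "use_speed_embed",
    "use_energy_embed",
    "use_breathiness_embed",
    "use_voicing_embed",
    "use_tension_embed",
    "use_shallow_diffusion",
)

# Values copied by hand from the acoustic dsconfig.
dsconfig = {
    "use_key_shift_embed": True,
    "use_speed_embed": True,
    "use_energy_embed": False,
    "use_breathiness_embed": False,
    "use_voicing_embed": False,
    "use_tension_embed": False,
    "use_shallow_diffusion": True,
}

# Values copied by hand from the acoustic model training config.
training = {
    "use_key_shift_embed": True,
    "use_speed_embed": True,
    "use_energy_embed": False,
    "use_breathiness_embed": False,
    "use_voicing_embed": False,
    "use_tension_embed": False,
    "use_shallow_diffusion": True,
}


def flag_mismatches(ds, train, flags=FLAGS):
    """Return {flag: (dsconfig_value, training_value)} for flags that differ.

    A flag missing from either side is treated as False, which is an
    assumption of this sketch rather than documented behaviour.
    """
    mismatches = {}
    for flag in flags:
        ds_value = bool(ds.get(flag, False))
        train_value = bool(train.get(flag, False))
        if ds_value != train_value:
            mismatches[flag] = (ds_value, train_value)
    return mismatches


print(flag_mismatches(dsconfig, training))
```

For the configs in this post the result is an empty dictionary, so the flags themselves agree; the only hand-edited value that differs from the exporter's output is max_depth.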

Thank you for your help.

Alistair-zhong commented 2 months ago

Resolved after installing the latest OpenUtau beta, version 0.1.443.