openvpi / DiffSinger

An advanced singing voice synthesis system with high fidelity, expressiveness, controllability and flexibility based on DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism
Apache License 2.0

[DONE]New AUX_Decoder/Backbone Network : LYNXNet #200

Open KakaruHayate opened 2 weeks ago

KakaruHayate commented 2 weeks ago

LYNXNet (Linear Gated Depthwise Separable Convolution Network)

Refer to:

- https://github.com/CNChTu/Diffusion-SVC/blob/v2.0_dev/diffusion/naive_v2/model_conformer_naive.py
- https://github.com/CNChTu/Diffusion-SVC/blob/v2.0_dev/diffusion/naive_v2/naive_v2_diff.py

The code has been refactored: unnecessary parts were removed and the network was adapted to openvpi/DiffSinger.
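To make the name concrete, here is a minimal NumPy sketch of the core idea behind a linear gated depthwise separable convolution: a per-channel (depthwise) 1-D convolution followed by a pointwise (1x1) channel mix whose output is split in half and gated with a sigmoid (GLU-style). This is an illustration of the general technique only, with hypothetical names; it is not the actual LYNXNet implementation.

```python
import numpy as np

def depthwise_separable_glu(x, dw_kernels, pw_weight):
    """Illustrative gated depthwise separable conv (not the repo's code).

    x          : (C, T)  input features, C channels over T frames
    dw_kernels : (C, K)  one 1-D kernel per channel (depthwise stage)
    pw_weight  : (2C, C) pointwise 1x1 conv, expanding to 2C for the gate
    """
    C, T = x.shape
    # Depthwise stage: each channel is convolved with its own kernel
    # ("same" padding keeps the time length unchanged).
    dw = np.stack([np.convolve(x[c], dw_kernels[c], mode="same") for c in range(C)])
    # Pointwise stage: a 1x1 conv is just a matrix multiply across channels.
    pw = pw_weight @ dw                        # (2C, T)
    # GLU gating: first half of the channels, modulated by sigmoid(second half).
    a, b = pw[:C], pw[C:]
    return a * (1.0 / (1.0 + np.exp(-b)))      # (C, T)
```

Because the depthwise and pointwise stages are factored, the parameter count is roughly C*K + 2C*C instead of the 2C*C*K of a full convolution.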

The following parameters are recommended. Note that when dim > 512, LayerNorm is enabled by default to ensure training stability.

```yaml
# LYNXNet-small
backbone_type: 'lynxnet'
residual_channels: 256
residual_layers: 3
dilation_cycle_length: 2

# LYNXNet-base
backbone_type: 'lynxnet'
residual_channels: 512
residual_layers: 6
dilation_cycle_length: 2

# LYNXNet-medium
backbone_type: 'lynxnet'
residual_channels: 768
residual_layers: 8
dilation_cycle_length: 2

# LYNXNet-large
backbone_type: 'lynxnet'
residual_channels: 1024
residual_layers: 12
dilation_cycle_length: 2
```
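The presets above can be summarized in code together with the stability rule mentioned earlier (LayerNorm enabled by default when dim > 512). The preset table is copied from the configs above; the helper names are hypothetical, for illustration only.

```python
# Backbone presets, transcribed from the recommended configs above.
LYNXNET_PRESETS = {
    "small":  {"residual_channels": 256,  "residual_layers": 3,  "dilation_cycle_length": 2},
    "base":   {"residual_channels": 512,  "residual_layers": 6,  "dilation_cycle_length": 2},
    "medium": {"residual_channels": 768,  "residual_layers": 8,  "dilation_cycle_length": 2},
    "large":  {"residual_channels": 1024, "residual_layers": 12, "dilation_cycle_length": 2},
}

def layernorm_enabled(cfg, threshold=512):
    # Per the note above: LayerNorm is switched on by default once the
    # model dimension (residual_channels) exceeds 512.
    return cfg["residual_channels"] > threshold
```

Under this rule, only the medium and large presets would run with LayerNorm enabled.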

```yaml
# LYNXNetDecoder-small
aux_decoder_arch: lynxnet
aux_decoder_args:
  num_channels: 256
  num_layers: 3
  kernel_size: 31
  dropout_rate: 0.0

# LYNXNetDecoder-base
aux_decoder_arch: lynxnet
aux_decoder_args:
  num_channels: 512
  num_layers: 6
  kernel_size: 31
  dropout_rate: 0.0
```

TIP: You can control the style of the generated results by modifying the activation function (LYNXNet.py, line 129).
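For context on what swapping the activation changes, here are textbook definitions of two activations commonly used in gated convolution blocks. These are generic reference implementations, not the code at the line referenced above; which activations LYNXNet actually supports is defined in the repo.

```python
import math

def silu(x):
    # SiLU / Swish: x * sigmoid(x); smooth, non-monotone near zero.
    return x / (1.0 + math.exp(-x))

def gelu(x):
    # GELU (exact, erf-based form); slightly "sharper" than SiLU for x > 0.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))
```

Both are zero-preserving and smooth, but they weight small positive inputs differently, which is one way an activation swap can shift the character of the generated output.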