openvpi / DiffSinger

An advanced singing voice synthesis system with high fidelity, expressiveness, controllability and flexibility based on DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism
Apache License 2.0

[DONE]New AUX_Decoder/Backbone Network : LYNXNet #200

Open KakaruHayate opened 2 weeks ago

KakaruHayate commented 2 weeks ago

LYNXNet (Linear Gated Depthwise Separable Convolution Network)

Refer to:

- https://github.com/CNChTu/Diffusion-SVC/blob/v2.0_dev/diffusion/naive_v2/model_conformer_naive.py
- https://github.com/CNChTu/Diffusion-SVC/blob/v2.0_dev/diffusion/naive_v2/naive_v2_diff.py

The code has been refactored: unnecessary parts were removed and the network was adapted to openvpi/DiffSinger.
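To make the name concrete, here is a minimal NumPy sketch of the core idea behind a linear gated depthwise separable convolution: a per-channel (depthwise) 1-D convolution followed by a pointwise (1x1) channel mix whose output is split in half and gated with a sigmoid (GLU-style). This is an illustration of the general technique only, with hypothetical names; it is not the actual LYNXNet implementation.

```python
import numpy as np

def depthwise_separable_glu(x, dw_kernels, pw_weight):
    """Illustrative gated depthwise separable conv (not the repo's code).

    x          : (C, T)  input features, C channels over T frames
    dw_kernels : (C, K)  one 1-D kernel per channel (depthwise stage)
    pw_weight  : (2C, C) pointwise 1x1 conv, expanding to 2C for the gate
    """
    C, T = x.shape
    # Depthwise stage: each channel is convolved with its own kernel
    # ("same" padding keeps the time length unchanged).
    dw = np.stack([np.convolve(x[c], dw_kernels[c], mode="same") for c in range(C)])
    # Pointwise stage: a 1x1 conv is just a matrix multiply across channels.
    pw = pw_weight @ dw                        # (2C, T)
    # GLU gating: first half of the channels, modulated by sigmoid(second half).
    a, b = pw[:C], pw[C:]
    return a * (1.0 / (1.0 + np.exp(-b)))      # (C, T)
```

Because the depthwise and pointwise stages are factored, the parameter count is roughly C*K + 2C*C instead of the 2C*C*K of a full convolution.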

The following parameters are recommended. Note that when dim > 512, LayerNorm is enabled by default to ensure training stability.

```yaml
# LYNXNet-small
backbone_type: 'lynxnet'
residual_channels: 256
residual_layers: 3
dilation_cycle_length: 2

# LYNXNet-base
backbone_type: 'lynxnet'
residual_channels: 512
residual_layers: 6
dilation_cycle_length: 2

# LYNXNet-medium
backbone_type: 'lynxnet'
residual_channels: 768
residual_layers: 8
dilation_cycle_length: 2

# LYNXNet-large
backbone_type: 'lynxnet'
residual_channels: 1024
residual_layers: 12
dilation_cycle_length: 2
```
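The presets above can be summarized in code together with the stability rule mentioned earlier (LayerNorm enabled by default when dim > 512). The preset table is copied from the configs above; the helper names are hypothetical, for illustration only.

```python
# Backbone presets, transcribed from the recommended configs above.
LYNXNET_PRESETS = {
    "small":  {"residual_channels": 256,  "residual_layers": 3,  "dilation_cycle_length": 2},
    "base":   {"residual_channels": 512,  "residual_layers": 6,  "dilation_cycle_length": 2},
    "medium": {"residual_channels": 768,  "residual_layers": 8,  "dilation_cycle_length": 2},
    "large":  {"residual_channels": 1024, "residual_layers": 12, "dilation_cycle_length": 2},
}

def layernorm_enabled(cfg, threshold=512):
    # Per the note above: LayerNorm is switched on by default once the
    # model dimension (residual_channels) exceeds 512.
    return cfg["residual_channels"] > threshold
```

Under this rule, only the medium and large presets would run with LayerNorm enabled.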

```yaml
# LYNXNetDecoder-small
aux_decoder_arch: lynxnet
aux_decoder_args:
  num_channels: 256
  num_layers: 3
  kernel_size: 31
  dropout_rate: 0.0

# LYNXNetDecoder-base
aux_decoder_arch: lynxnet
aux_decoder_args:
  num_channels: 512
  num_layers: 6
  kernel_size: 31
  dropout_rate: 0.0
```

TIP: You can control the style of the generated results by modifying the activation function (LYNXNet.py, line 129).
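For context on what swapping the activation changes, here are textbook definitions of two activations commonly used in gated convolution blocks. These are generic reference implementations, not the code at the line referenced above; which activations LYNXNet actually supports is defined in the repo.

```python
import math

def silu(x):
    # SiLU / Swish: x * sigmoid(x); smooth, non-monotone near zero.
    return x / (1.0 + math.exp(-x))

def gelu(x):
    # GELU (exact, erf-based form); slightly "sharper" than SiLU for x > 0.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))
```

Both are zero-preserving and smooth, but they weight small positive inputs differently, which is one way an activation swap can shift the character of the generated output.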