open-mmlab / Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
https://openhlt.github.io/amphion/
MIT License
7.75k stars, 589 forks

[Help]: Requesting some guidance / documentation on choosing appropriate parameters for mssbcqt #351

Open: codename0og opened 1 week ago

codename0og commented 1 week ago

Hello! I'd be very glad to get some more information on how to adapt the mssbcqt discriminator for 48 kHz audio.

Lately I've been trying to improve the current architecture of RVC (Retrieval-based Voice Conversion) by adopting the ms-sb-cqt and ms-stft discriminators. However, from what I can see, they were tested on 24 kHz audio, and the config is presumably meant for that rate. Essentially, I'd like some guidance on how to properly choose the CQT params:

        filters=32,
        max_filters=1024,
        filters_scale=1,
        dilations=[1, 2, 4],
        in_channels=1,
        out_channels=1,
        hop_lengths=[512, 256, 256],
        n_octaves=[9, 9, 9],
        bins_per_octaves=[24, 36, 48],  
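To make the question concrete, here is a rough sanity check of how these CQT settings relate to a given sample rate. It is only a sketch under stated assumptions: `cqt_coverage` is a hypothetical helper, `fmin=32.70` Hz (C1) is assumed as the lowest CQT bin, and the hop-length divisibility rule is a constraint found in some downsampling-based CQT implementations that may not apply to the one Amphion uses. Whether the CQT runs at the native rate or on resampled audio would need to be confirmed against the Amphion source.

```python
import math


def cqt_coverage(sr, n_octaves, bins_per_octave, hop_length, fmin=32.70):
    """Hypothetical helper: check CQT settings against a sample rate.

    fmin=32.70 Hz (C1) is an assumed lowest bin frequency; use the value
    your CQT implementation actually applies.
    """
    nyquist = sr / 2
    n_bins = n_octaves * bins_per_octave
    # Center frequency of the highest CQT bin.
    top_hz = fmin * 2 ** ((n_bins - 1) / bins_per_octave)
    # Largest number of whole octaves that fits between fmin and Nyquist.
    max_whole_octaves = math.floor(math.log2(nyquist / fmin))
    return {
        "n_bins": n_bins,
        "top_hz": top_hz,
        "nyquist_hz": nyquist,
        "fits": top_hz < nyquist,
        "max_whole_octaves": max_whole_octaves,
        # Assumption: recursive-downsampling CQTs may need hop_length to be
        # a multiple of 2 ** (n_octaves - 1).
        "hop_ok": hop_length % 2 ** (n_octaves - 1) == 0,
    }


if __name__ == "__main__":
    for hop, bpo in zip([512, 256, 256], [24, 36, 48]):
        print(cqt_coverage(sr=48000, n_octaves=9, bins_per_octave=bpo, hop_length=hop))
```

Under these assumptions, 9 octaves starting at 32.70 Hz top out around 16 kHz. That fits below the 24 kHz Nyquist limit of 48 kHz audio but not below the 12 kHz limit of 24 kHz audio, so the rate at which the CQT is actually computed matters when choosing `n_octaves`.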

For more details, this is the current config I use for training pretrained models for rvc:

   },
  "data": {
    "max_wav_value": 32768.0,
    "sampling_rate": 48000,
    "filter_length": 2048,
    "hop_length": 480,
    "win_length": 2048,
    "n_mel_channels": 128,
    "mel_fmin": 0.0,
    "mel_fmax": null
  },
  "model": {
    "inter_channels": 192,
    "hidden_channels": 192,
    "filter_channels": 768,
    "n_heads": 2,
    "n_layers": 6,
    "kernel_size": 3,
    "p_dropout": 0,
    "resblock": "1",
    "resblock_kernel_sizes": [3,7,11],
    "resblock_dilation_sizes": [[1,3,5], [1,3,5], [1,3,5]],
    "upsample_rates": [12,10,2,2],
    "upsample_initial_channel": 512,
    "upsample_kernel_sizes": [24,20,4,4],
    "use_spectral_norm": false,
    "gin_channels": 256,
    "spk_embed_dim": 109
  }
}
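For reference, a quick consistency check of the generator side of this config. This is a minimal sketch assuming the usual HiFi-GAN-style constraint that the product of `upsample_rates` equals `hop_length`, so that each spectrogram frame expands to exactly one hop of audio samples. The dictionaries only copy the relevant values from the config above.

```python
import math

# Values copied from the config above.
data = {"sampling_rate": 48000, "hop_length": 480}
model = {
    "upsample_rates": [12, 10, 2, 2],
    "upsample_kernel_sizes": [24, 20, 4, 4],
}

# Total upsampling factor of the generator.
total_upsample = math.prod(model["upsample_rates"])

# Number of spectrogram frames per second of audio.
frames_per_second = data["sampling_rate"] / data["hop_length"]

# Assumed constraint: one frame must map to hop_length output samples.
assert total_upsample == data["hop_length"], (total_upsample, data["hop_length"])

# Each kernel here is twice its upsample rate, a common choice for transposed convs.
kernel_ratios = [
    k / r for k, r in zip(model["upsample_kernel_sizes"], model["upsample_rates"])
]

print(total_upsample, frames_per_second, kernel_ratios)
```

The check passes for this config (12 × 10 × 2 × 2 = 480), so the generator emits 480 samples per frame at 100 frames per second. Any discriminator hop lengths are independent of this and only affect the discriminator's own time resolution.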

As an important note: I intend to pair the mssbcqt / msstft combo with the existing MultiPeriodDiscriminator used in RVC. Thank you in advance!

codename0og commented 1 week ago

Bumping up.