sh-lee-prml / BigVGAN

Unofficial PyTorch implementation of BigVGAN: A Universal Neural Vocoder with Large-Scale Training
MIT License

Test with MEL spectrogram #8

skol101 closed this issue 2 years ago

skol101 commented 2 years ago

Although you provided code in issue #1, it doesn't work.

Applying this change to the block (along with eps 1e-9)

    # mel spectrogram used as the generator input
    mel = spec_to_mel_torch(
        spec,
        hps.data.filter_length,
        hps.data.n_mel_channels,
        hps.data.sampling_rate,
        hps.data.mel_fmin,
        hps.data.mel_fmax)

    # mel spectrogram used for the mel loss (upper frequency: mel_fmax_for_loss)
    mel_for_loss = spec_to_mel_torch(
        spec,
        hps.data.filter_length,
        hps.data.n_mel_channels,
        hps.data.sampling_rate,
        hps.data.mel_fmin,
        hps.data.mel_fmax_for_loss)

leads to this error:

RuntimeError: Given groups=1, weight of size [512, 513, 7], expected input[16, 80, 32] to have 513 channels, but got 80 channels instead
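To see why the two shapes disagree, here is a torch-free sketch that traces the tensor sizes. It assumes, as the weight shape [512, 513, 7] suggests, that the generator's first layer is a Conv1d built with filter_length // 2 + 1 = 513 input channels (PyTorch lays out Conv1d weights as [out_channels, in_channels / groups, kernel_size]), and that spec_to_mel_torch multiplies the linear spectrogram by an [n_mel_channels, filter_length // 2 + 1] mel filterbank. Both function names are hypothetical and only model the shapes, not the DSP.

```python
def mel_projection_shape(spec_shape, n_fft, num_mels):
    """Shape-only sketch of a mel filterbank projection (no actual DSP)."""
    batch, freq_bins, frames = spec_shape
    expected_bins = n_fft // 2 + 1
    if freq_bins != expected_bins:
        raise ValueError(f"spec has {freq_bins} bins, filterbank expects {expected_bins}")
    # filterbank [num_mels, expected_bins] @ spec [batch, expected_bins, frames]
    return (batch, num_mels, frames)


def conv1d_channel_error(weight_shape, input_shape, groups=1):
    """Simplified version of the input-channel check a Conv1d performs.

    Returns an error message on mismatch, or None if the shapes are compatible.
    """
    _out_channels, in_per_group, _kernel = weight_shape
    in_channels = input_shape[1]
    expected = in_per_group * groups
    if in_channels != expected:
        return (f"expected input{list(input_shape)} to have {expected} channels, "
                f"but got {in_channels} channels instead")
    return None


# Values from vctk_bigvgan.json: filter_length, n_mel_channels, hop_length, segment_size
n_fft, n_mels, hop, segment = 1024, 80, 256, 8192
spec_shape = (16, n_fft // 2 + 1, segment // hop)            # (16, 513, 32)
mel_shape = mel_projection_shape(spec_shape, n_fft, n_mels)  # (16, 80, 32)

print(conv1d_channel_error((512, 513, 7), mel_shape))     # conv built for linear spec: mismatch
print(conv1d_channel_error((512, n_mels, 7), mel_shape))  # conv built for mel input: None
```

The first call reproduces the message above: the mel tensor is fine, but the first convolution was sized for 513 linear-spectrogram bins.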

My vctk_bigvgan.json:

{
  "train": {
    "log_interval": 200,
    "eval_interval": 1000,
    "seed": 1234,
    "epochs": 20000,
    "learning_rate": 1e-4,
    "betas": [0.8, 0.99],
    "eps": 1e-9,
    "batch_size": 16,
    "fp16_run": true,
    "lr_decay": 0.999875,
    "segment_size": 8192,
    "init_lr_ratio": 1,
    "warmup_epochs": 0,
    "c_mel": 45
  },
  "data": {
    "training_files": "./dataset/VCTK-Corpus/preprocessed_npz",
    "validation_files": "./dataset/VCTK-Corpus/preprocessed_npz",
    "text_cleaners": ["english_cleaners2"],
    "max_wav_value": 32768.0,
    "sampling_rate": 22050,
    "filter_length": 1024,
    "hop_length": 256,
    "win_length": 1024,
    "n_mel_channels": 80,
    "mel_fmin": 0.0,
    "mel_fmax": 12000,
    "mel_fmax_for_loss": null,
    "add_blank": true,
    "n_speakers": 43,
    "cleaned_text": true,
    "aug_rate": 1.0,
    "top_db": 20
  },
  "model": {
    "p_dropout": 0.1,
    "resblock_kernel_sizes": [3,7,11],
    "resblock_dilation_sizes": [[1,3,5], [1,3,5], [1,3,5]],
    "upsample_rates": [8,8,2,2],
    "upsample_initial_channel": 512,
    "upsample_kernel_sizes": [16,16,4,4],
    "use_spectral_norm": false
  }
}
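The config fixes every size that appears in the error. A minimal sketch using only the standard library that loads the relevant fields and derives them; derived_dims is a hypothetical helper, and the check that the upsample rates multiply to hop_length reflects the usual HiFi-GAN-style generator design (each input frame is expanded to hop_length audio samples), assumed here to hold for this generator as well.

```python
import json
import math

# Subset of vctk_bigvgan.json (values copied from the config above)
CONFIG_TEXT = """
{
  "train": {"segment_size": 8192},
  "data": {"filter_length": 1024, "hop_length": 256, "n_mel_channels": 80},
  "model": {"upsample_rates": [8, 8, 2, 2], "upsample_initial_channel": 512}
}
"""


def derived_dims(cfg):
    """Compute the tensor sizes implied by the config (hypothetical helper)."""
    train, data, model = cfg["train"], cfg["data"], cfg["model"]
    return {
        # linear-spectrogram bins, i.e. filter_length // 2 + 1
        "linear_bins": data["filter_length"] // 2 + 1,
        # mel bins produced by spec_to_mel_torch
        "mel_bins": data["n_mel_channels"],
        # spectrogram frames per training segment
        "frames_per_segment": train["segment_size"] // data["hop_length"],
        # total time upsampling performed by the generator
        "upsample_factor": math.prod(model["upsample_rates"]),
    }


cfg = json.loads(CONFIG_TEXT)
dims = derived_dims(cfg)
print(dims)
# One input frame should expand to exactly hop_length output samples
assert dims["upsample_factor"] == cfg["data"]["hop_length"]
```

With this config the sketch gives 513 linear bins, 80 mel bins and 32 frames per segment, matching the 513, 80 and 32 in the RuntimeError.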

It would be nice if it were actually possible to test with a mel spectrogram.

sh-lee-prml commented 2 years ago
    net_g = SynthesizerTrn(
        hps.data.filter_length // 2 + 1,
        hps.train.segment_size // hps.data.hop_length,
        **hps.model, rank=rank).cuda(rank)

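Change the input size of your model. The first argument of SynthesizerTrn is the number of input channels, and hps.data.filter_length // 2 + 1 = 1024 // 2 + 1 = 513 linear-spectrogram bins, which is where the 513 in the error comes from. For 80-bin mel input, replace it with 80:

    hps.data.filter_length // 2 + 1,  -->  80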
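To keep the input size in sync with the config instead of hardcoding 80, the channel count can be derived from hps.data. A minimal sketch: generator_input_channels and the use_mel flag are hypothetical (neither exists in the repo or in vctk_bigvgan.json), and SimpleNamespace stands in for the hps.data object.

```python
from types import SimpleNamespace


def generator_input_channels(data, use_mel):
    """Hypothetical helper: channel count for the first argument of SynthesizerTrn."""
    if use_mel:
        return data.n_mel_channels      # 80 mel bins in vctk_bigvgan.json
    return data.filter_length // 2 + 1  # 513 linear-spectrogram bins


# Stand-in for hps.data, with values from vctk_bigvgan.json
data = SimpleNamespace(filter_length=1024, n_mel_channels=80)
print(generator_input_channels(data, use_mel=False))  # 513
print(generator_input_channels(data, use_mel=True))   # 80

# Intended use in the training script (not executed here):
# net_g = SynthesizerTrn(
#     generator_input_channels(hps.data, use_mel=True),
#     hps.train.segment_size // hps.data.hop_length,
#     **hps.model, rank=rank).cuda(rank)
```

Whichever feature is chosen, the tensors passed to net_g in training, evaluation and inference must all have that same channel count, since the first convolution only accepts the number of channels it was built with.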