jik876 / hifi-gan

HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
MIT License

How to handle 48 kHz wavs #116

Open yanglu1994 opened 2 years ago

yanglu1994 commented 2 years ago

If the wavs have a 48 kHz sample rate, how should the upsample parameters be set? At 48 kHz the hop size is 600, but the config in the code only upsamples by 256, so it raises an error when the loss is calculated.

nampdn commented 2 years ago

I'm able to train and run inference at 44100 Hz with this config:

{
    "resblock": "1",
    "num_gpus": 3,
    "batch_size": 8,
    "learning_rate": 0.0002,
    "adam_b1": 0.8,
    "adam_b2": 0.99,
    "lr_decay": 0.9995,
    "seed": 1234,

    "upsample_rates":        [ 8, 8, 2, 2, 2],
    "upsample_kernel_sizes": [16,16, 4, 4, 4],
    "upsample_initial_channel": 512,
    "resblock_kernel_sizes": [3,7,11],
    "resblock_dilation_sizes": [[1,3,5], [1,3,5], [1,3,5]],
    "discriminator_periods": [3, 5, 7, 11, 17, 23, 37],

    "segment_size": 16384,
    "num_mels": 80,
    "num_freq": 1025,
    "n_fft"   : 2048,
    "hop_size": 512,
    "win_size": 2048,

    "sampling_rate": 44100,

    "fmin": 20,
    "fmax": 11025,
    "fmax_for_loss": null,

    "num_workers": 4,

    "dist_config": {
        "dist_backend": "nccl",
        "dist_url": "tcp://localhost:54321",
        "world_size": 1
    }
}

Ref: https://github.com/CookiePPP/cookietts/blob/experimental/CookieTTS/_4_mtw/hifigan/config_v1_48Khz_multiGPU.json
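The constraint behind the original error is that the generator upsamples each mel frame by the product of `upsample_rates`, so that product has to equal `hop_size` (here 8 × 8 × 2 × 2 × 2 = 512). A minimal sketch of that check in plain Python follows. The helper name `check_config` is made up for illustration and is not part of the hifi-gan repo, the `segment_size` divisibility rule is my assumption about how training segments are cut, and the second dict only mimics the situation in the question (rates multiplying to 256 against a hop size of 600) with an assumed `segment_size`.

```python
from math import prod

def check_config(cfg):
    """Return a list of problems; an empty list means the checks passed.

    Hypothetical helper for illustration, not part of the hifi-gan repo.
    """
    problems = []
    rates = cfg["upsample_rates"]
    kernels = cfg["upsample_kernel_sizes"]

    # The generator turns one mel frame into prod(rates) audio samples,
    # so this product must match the hop size used to compute the mels.
    total = prod(rates)
    if total != cfg["hop_size"]:
        problems.append(
            f"prod(upsample_rates) = {total}, but hop_size = {cfg['hop_size']}"
        )

    # One kernel size is needed per upsampling layer.
    if len(rates) != len(kernels):
        problems.append("upsample_rates and upsample_kernel_sizes differ in length")

    # Assumption: training segments should hold a whole number of frames.
    if cfg["segment_size"] % cfg["hop_size"] != 0:
        problems.append("segment_size is not a multiple of hop_size")

    return problems

# Values copied from the 44100 Hz config above.
cfg_44k = {
    "upsample_rates": [8, 8, 2, 2, 2],
    "upsample_kernel_sizes": [16, 16, 4, 4, 4],
    "hop_size": 512,
    "segment_size": 16384,
}

# Illustrative only: rates that multiply to 256 paired with hop_size 600.
cfg_48k_attempt = {
    "upsample_rates": [8, 8, 2, 2],
    "upsample_kernel_sizes": [16, 16, 4, 4],
    "hop_size": 600,
    "segment_size": 8192,  # assumed value
}

print(check_config(cfg_44k))
for problem in check_config(cfg_48k_attempt):
    print(problem)
```

For a hop size of 600 the same idea applies: pick rates whose product is 600 (600 = 2 × 2 × 2 × 3 × 5 × 5) and a `segment_size` that is a multiple of 600.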

Grace9994 commented 1 year ago

I'm not able to train with the hop size 512 when the sampling rate is 44100, but it works when the hop size is 441.
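The config used for hop size 441 was not posted in this thread. For reference, 441 factors as 7 × 7 × 3 × 3, so one hypothetical choice is `upsample_rates = [7, 7, 3, 3]`. The sketch below checks the resulting output length under the assumption that each upsampling layer is a `ConvTranspose1d` with stride `u`, kernel size `k` and padding `(k - u) // 2`, which is how I recall the generator in `models.py`; treat that, and the kernel sizes shown, as assumptions rather than a tested setup. Length-preserving convolutions before and after the upsampling stack are ignored.

```python
from math import prod

def conv_transpose_len(length, kernel, stride, padding):
    """Output length of a 1-D transposed convolution (dilation 1, no output padding)."""
    return (length - 1) * stride - 2 * padding + kernel

def generator_output_len(n_frames, rates, kernels):
    """Samples produced from n_frames mel frames, assuming padding = (k - u) // 2."""
    length = n_frames
    for u, k in zip(rates, kernels):
        length = conv_transpose_len(length, k, u, (k - u) // 2)
    return length

n_frames = 32

# Hypothetical rates and kernels for hop_size 441 (not taken from any posted config).
rates_441 = [7, 7, 3, 3]
kernels_odd = [15, 15, 7, 7]    # k - u is even, so each layer scales the length exactly
kernels_even = [14, 14, 6, 6]   # k - u is odd, so each layer adds one extra sample

print(prod(rates_441))
print(generator_output_len(n_frames, rates_441, kernels_odd), n_frames * 441)
print(generator_output_len(n_frames, rates_441, kernels_even), n_frames * 441)
```

Under that padding assumption, a kernel size with the same parity as its rate keeps the output at exactly `n_frames * hop_size`. The usual "kernel = 2 × rate" pattern only satisfies this when the rate is even, which may be relevant when moving to odd rates such as 7 and 3.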

nickovchinnikov commented 5 months ago

@Grace9994 could you share your config? I can't find the segment_size, and the output audio duration does not exactly match the input audio.
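One possible source of the duration mismatch is plain frame arithmetic: the vocoder emits `n_frames * hop_size` samples, and the number of mel frames is rounded down. A rough sketch, assuming the mel extraction reflect-pads `int((n_fft - hop_size) / 2)` samples on each side and then runs an STFT with `center=False` (my recollection of `meldataset.py`, so treat it as an assumption). The function names are made up for illustration.

```python
def mel_frames(n_samples, n_fft, hop_size):
    """Frame count under the assumed padding scheme (STFT with center=False)."""
    pad = int((n_fft - hop_size) / 2)   # rounds down when n_fft - hop_size is odd
    padded = n_samples + 2 * pad
    return (padded - n_fft) // hop_size + 1

def vocoder_output_samples(n_samples, n_fft, hop_size):
    """Samples the generator would emit for an input of n_samples."""
    return mel_frames(n_samples, n_fft, hop_size) * hop_size

# One second of 44100 Hz audio with the n_fft and hop_size from the config above.
n_in = 44100
n_out = vocoder_output_samples(n_in, n_fft=2048, hop_size=512)
print(n_in, n_out, n_in - n_out)

# A segment that is an exact multiple of the hop size.
print(mel_frames(16384, n_fft=2048, hop_size=512))

# Odd hop size: n_fft - hop_size is odd, so the integer padding loses one sample.
print(mel_frames(32 * 441, n_fft=2048, hop_size=441))
```

If this assumption holds, input lengths that are not a multiple of `hop_size` lose the trailing partial frame, and with `hop_size = 441` and `n_fft = 2048` the rounded padding can cost one more frame, so choosing `segment_size` as a multiple of `hop_size` is necessary but may not be sufficient on its own.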