how to handle with the 48k wavs

yanglu1994 commented 2 years ago

If the wavs have a 48 kHz sample rate, how should the upsample parameters be set? At 48 kHz my hop size is 600, but the config in the code only upsamples by 256, so it raises an error when calculating the loss.
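A minimal sketch of the mismatch, assuming the generator turns each mel frame into as many samples as the product of `upsample_rates`. The helper names and the `[8, 8, 2, 2]` list are illustrative assumptions (any rates multiplying to 256 behave the same), and the factorization of 600 is plain arithmetic, not a tested training config.

```python
from math import prod


def total_upsampling(upsample_rates):
    """Overall upsampling factor: samples produced per mel frame."""
    return prod(upsample_rates)


def matches_hop(upsample_rates, hop_size):
    """True when one mel frame expands to exactly hop_size samples."""
    return total_upsampling(upsample_rates) == hop_size


# Assumed rates whose product is 256, as described in the question.
default_rates = [8, 8, 2, 2]
print(total_upsampling(default_rates))   # 256
print(matches_hop(default_rates, 600))   # False: 256 != 600

# Hypothetical factorization of 600 (untested as a training config).
candidate_rates = [8, 5, 5, 3]
print(matches_hop(candidate_rates, 600))  # True
```

If the product and `hop_size` differ, the generated audio and the ground-truth audio have different lengths, which is one plausible source of the error in the loss.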

nampdn commented 2 years ago

I'm able to train and run inference at 44100 Hz with this config:

{
    "resblock": "1",
    "num_gpus": 3,
    "batch_size": 8,
    "learning_rate": 0.0002,
    "adam_b1": 0.8,
    "adam_b2": 0.99,
    "lr_decay": 0.9995,
    "seed": 1234,

    "upsample_rates":        [ 8, 8, 2, 2, 2],
    "upsample_kernel_sizes": [16,16, 4, 4, 4],
    "upsample_initial_channel": 512,
    "resblock_kernel_sizes": [3,7,11],
    "resblock_dilation_sizes": [[1,3,5], [1,3,5], [1,3,5]],
    "discriminator_periods": [3, 5, 7, 11, 17, 23, 37],

    "segment_size": 16384,
    "num_mels": 80,
    "num_freq": 1025,
    "n_fft"   : 2048,
    "hop_size": 512,
    "win_size": 2048,

    "sampling_rate": 44100,

    "fmin": 20,
    "fmax": 11025,
    "fmax_for_loss": null,

    "num_workers": 4,

    "dist_config": {
        "dist_backend": "nccl",
        "dist_url": "tcp://localhost:54321",
        "world_size": 1
    }
}

Ref: https://github.com/CookiePPP/cookietts/blob/experimental/CookieTTS/_4_mtw/hifigan/config_v1_48Khz_multiGPU.json
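As a sanity check on the values above, here is a small sketch that walks one training segment through the upsampling stages. It assumes each stage is a transposed convolution with stride equal to the rate and padding `(kernel - stride) // 2`; that padding rule and the helper name are assumptions, not something stated in this thread. The length formula is the standard one for `ConvTranspose1d` with dilation 1 and no output padding.

```python
def conv_transpose_out_len(length, stride, kernel):
    """Output length of a 1-D transposed convolution (dilation 1).

    The padding rule below is an assumption about the generator.
    """
    padding = (kernel - stride) // 2
    return (length - 1) * stride - 2 * padding + kernel


# Values copied from the config above.
upsample_rates = [8, 8, 2, 2, 2]
upsample_kernel_sizes = [16, 16, 4, 4, 4]
hop_size = 512
segment_size = 16384

# Number of mel frames in one training segment.
length = segment_size // hop_size  # 32

for rate, kernel in zip(upsample_rates, upsample_kernel_sizes):
    length = conv_transpose_out_len(length, rate, kernel)

print(length)  # 16384, the same as segment_size
```

Under that padding assumption, a stage only multiplies the length exactly when `kernel - stride` is even. For example, stride 5 with kernel 10 gives `5 * length + 1`, so odd rates need more care when choosing kernel sizes.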

Grace9994 commented 1 year ago

I'm not able to train with a hop size of 512 when the sampling rate is 44100, but it works when the hop size is 441.

nickovchinnikov commented 5 months ago

@Grace9994 could you share your config? I can't find the segment_size, and the output audio duration does not exactly match the input audio.
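One possible reason for the duration mismatch, sketched under the assumption that the mel spectrogram has `num_samples // hop_size` frames and the generator emits `hop_size` samples per frame: any samples beyond the last full hop are dropped. The helper names are hypothetical, and the 16384 value is the `segment_size` from the config earlier in the thread, not Grace9994's setting.

```python
def output_length(num_samples, hop_size):
    """Samples regenerated from the whole frames that fit (assumed model)."""
    return (num_samples // hop_size) * hop_size


def aligned_segment_size(target, hop_size):
    """Largest multiple of hop_size that does not exceed target."""
    return (target // hop_size) * hop_size


segment_size = 16384

# With hop_size 512 the segment is an exact multiple, so nothing is lost.
print(output_length(segment_size, 512))         # 16384

# With hop_size 441 only 37 whole frames fit, so 67 samples are dropped.
print(output_length(segment_size, 441))         # 16317
print(aligned_segment_size(segment_size, 441))  # 16317
```

If that assumption holds, choosing a `segment_size` that is a multiple of `hop_size` keeps the input and output lengths equal during training. At inference the output would still be rounded down to a whole number of hops.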

jik876 / hifi-gan

how to handle with the 48k wavs #116