Hifigan Training error in 16k

IrisLhy commented 2 years ago

Thank you for your contribution on the vocoder. I trained PWG and HiFi-GAN on my dataset. PWG works well, but HiFi-GAN training fails with the following error:

  File "/home/bme2/miniconda3/envs/silent/bin/parallel-wavegan-train", line 33, in <module>
    sys.exit(load_entry_point('parallel-wavegan', 'console_scripts', 'parallel-wavegan-train')())
  File "/data/data-lhy/Vocoders/PWG/ParallelWaveGAN/parallel_wavegan/bin/train.py", line 1068, in main
    trainer.run()
  File "/data/data-lhy/Vocoders/PWG/ParallelWaveGAN/parallel_wavegan/bin/train.py", line 96, in run
    self._train_epoch()
  File "/data/data-lhy/Vocoders/PWG/ParallelWaveGAN/parallel_wavegan/bin/train.py", line 302, in _train_epoch
    self._train_step(batch)
  File "/data/data-lhy/Vocoders/PWG/ParallelWaveGAN/parallel_wavegan/bin/train.py", line 223, in _train_step
    mel_loss = self.criterion["mel"](y_, y)
  File "/home/bme2/miniconda3/envs/silent/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/data/data-lhy/Vocoders/PWG/ParallelWaveGAN/parallel_wavegan/losses/mel_loss.py", line 164, in forward
    mel_loss = F.l1_loss(mel_hat, mel)
  File "/home/bme2/miniconda3/envs/silent/lib/python3.8/site-packages/torch/nn/functional.py", line 2633, in l1_loss
    expanded_input, expanded_target = torch.broadcast_tensors(input, target)
  File "/home/bme2/miniconda3/envs/silent/lib/python3.8/site-packages/torch/functional.py", line 71, in broadcast_tensors
    return _VF.broadcast_tensors(tensors)  # type: ignore
RuntimeError: The size of tensor a (76) must match the size of tensor b (65) at non-singleton dimension 2

Here are the parameters I changed for HiFi-GAN. Everything else is the same as https://github.com/kan-bayashi/ParallelWaveGAN/blob/master/egs/csmsc/voc1/conf/hifigan.v1.yaml

###########################################################
#                FEATURE EXTRACTION SETTING               #
###########################################################
sampling_rate: 16000     # Sampling rate.
fft_size: 1024           # FFT size.
hop_size: 256            # Hop size.
win_length: 1024         # Window length.
                         # If set to null, it will be the same as fft_size.
window: "hann"           # Window function.
num_mels: 80             # Number of mel basis.
fmin: 80                 # Minimum freq in mel basis calculation.
fmax: 7600               # Maximum frequency in mel basis calculation.
global_gain_scale: 1.0   # Will be multiplied to all of waveform.
trim_silence: false      # Whether to trim the start and end of silence.
trim_threshold_in_db: 20 # Need to tune carefully if the recording is not good.
trim_frame_size: 1024    # Frame size in trimming.
trim_hop_size: 256       # Hop size in trimming.
format: "hdf5"           # Feature file format. "npy" or "hdf5" is supported.

###########################################################
#                  DATA LOADER SETTING                    #
###########################################################
batch_size: 8              # Batch size.
batch_max_steps: 16384       # Length of each audio in batch. Make sure dividable by hop_size.
pin_memory: true            # Whether to pin memory in Pytorch DataLoader.
num_workers: 2              # Number of workers in Pytorch DataLoader.
remove_short_samples: false # Whether to remove samples the length of which are less than batch_max_steps.
allow_cache: false          # Whether to allow cache in dataset. If true, it requires cpu memory.

###########################################################
#                   STFT LOSS SETTING                     #
###########################################################
use_stft_loss: false                 # Whether to use multi-resolution STFT loss.
use_mel_loss: true                   # Whether to use Mel-spectrogram loss.
mel_loss_params:
    fs: 16000
    fft_size: 1024
    hop_size: 256
    win_length: 1024
    window: "hann"
    num_mels: 80
    fmin: 0
    fmax: 8000
    log_base: null

kan-bayashi commented 2 years ago

You need to modify the following parameters to match your hop size. https://github.com/kan-bayashi/ParallelWaveGAN/blob/1df88a3d95fd7a8294f3147578506051a4eb85e9/egs/csmsc/voc1/conf/hifigan.v1.yaml?rgh-link-date=2022-04-05T13%3A14%3A47Z#L40-L41 csmsc uses hop_size=300, so 5 * 5 * 4 * 3 = 300.
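To make the mismatch concrete, here is a minimal sketch of the arithmetic behind the error. It assumes the linked lines are the generator's `upsample_scales` and `upsample_kernel_sizes`, that the mel loss uses a centered STFT (so the frame count is `1 + num_samples // hop_size`), and that each batch feeds `batch_max_steps // hop_size` mel frames to the generator. The helper names and the candidate scales `[8, 8, 2, 2]` are illustrative only, not values taken from the repository.

```python
from math import prod
from typing import Sequence


def num_stft_frames(num_samples: int, hop_size: int) -> int:
    """Frame count of a centered STFT (assumption: center=True padding)."""
    return 1 + num_samples // hop_size


def scales_match_hop(upsample_scales: Sequence[int], hop_size: int) -> bool:
    """The generator emits prod(upsample_scales) samples per mel frame,
    so that product has to equal the feature hop size."""
    return prod(upsample_scales) == hop_size


hop_size = 256           # from the config in this issue
batch_max_steps = 16384  # from the config in this issue
csmsc_scales = [5, 5, 4, 3]  # product is 300, the csmsc hop size

frames_in = batch_max_steps // hop_size           # mel frames per sample
generated = frames_in * prod(csmsc_scales)        # samples the generator emits
mel_hat_frames = num_stft_frames(generated, hop_size)
mel_frames = num_stft_frames(batch_max_steps, hop_size)
print(mel_hat_frames, mel_frames)

# Hypothetical replacement whose product is 256. The kernel sizes follow the
# "twice the scale" pattern of the csmsc values; that pattern is an assumption.
candidate_scales = [8, 8, 2, 2]
candidate_kernel_sizes = [2 * s for s in candidate_scales]
print(scales_match_hop(csmsc_scales, hop_size))
print(scales_match_hop(candidate_scales, hop_size))
```

Under these assumptions the two frame counts come out as 76 and 65, the same sizes reported in the `RuntimeError` above. Any set of scales whose product is 256 should remove the length mismatch. Which factorization gives the best audio quality is a separate question.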

IrisLhy commented 2 years ago

> You need to modify the following parameters to match with hop size.
>
> https://github.com/kan-bayashi/ParallelWaveGAN/blob/1df88a3d95fd7a8294f3147578506051a4eb85e9/egs/csmsc/voc1/conf/hifigan.v1.yaml?rgh-link-date=2022-04-05T13%3A14%3A47Z#L40-L41
>
> csmsc uses hop_size=300 so 5 * 5 * 4 * 3 = 300.

Thank you very much!

kan-bayashi / ParallelWaveGAN

Hifigan Training error in 16k #351