jik876 / hifi-gan

HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
MIT License

Mel spectrogram npy contents #134

Open leandro-gracia-gil opened 2 years ago

leandro-gracia-gil commented 2 years ago

I'd like to train hifi-gan on a custom dataset with its own set of wav files. For this I need to generate the corresponding mel spectrograms, which the README says can be done using Tacotron2, although it does not go into much detail.

However, generating mel spectrograms should be pretty straightforward, as it just involves an STFT, a complex magnitude, mel filter banks you can get from librosa, and possibly a log operation with some minimum value. You just need to make sure to use the same settings as specified in the config file.
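For illustration, the steps listed above (STFT, magnitude, mel filter bank, clamped log) could be sketched with NumPy alone. This is a minimal sketch under stated assumptions, not the repository's code: the function names are hypothetical, the default parameter values are placeholders to be replaced with the ones from your config file, the window length is assumed equal to `n_fft`, and the triangular HTK-style filter bank is a stand-in for the librosa mel basis, so the values will not match a librosa-based pipeline exactly.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale (assumption: librosa's default scale differs).
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=np.float64) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=np.float64) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels, fmin, fmax):
    """Triangular filters, shape [n_mels, n_fft // 2 + 1]."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = edges[i:i + 3]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(wav, sr=22050, n_fft=1024, hop=256, n_mels=80,
                        fmin=0.0, fmax=8000.0, floor=1e-5):
    """Return a log mel spectrogram of shape [n_mels, num_frames]."""
    # Reflect-pad so that the frame count is roughly len(wav) // hop.
    pad = (n_fft - hop) // 2
    wav = np.pad(np.asarray(wav, dtype=np.float64), (pad, pad), mode="reflect")

    # Slice the signal into overlapping frames and apply a periodic Hann window.
    n_frames = 1 + (len(wav) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    window = np.hanning(n_fft + 1)[:-1]
    frames = wav[idx] * window

    # STFT magnitude, shape [num_frames, n_fft // 2 + 1].
    mag = np.abs(np.fft.rfft(frames, n=n_fft, axis=1))

    # Project onto the mel filters, then take the log with a minimum value.
    mel = mel_filterbank(sr, n_fft, n_mels, fmin, fmax) @ mag.T
    return np.log(np.maximum(mel, floor)).astype(np.float32)
```

A call such as `log_mel_spectrogram(wav)` on a mono float waveform returns one column per frame. Whether the log, the floor value, and the padding agree with what the training code expects still has to be checked against the config file and `meldataset.py`.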

I'd like to use my own scripts to generate these mel spectrograms without having to deal with the Tacotron2 code if possible, but I need to know what the contents of the npy files should be. Are they just NumPy arrays of shape [num_frames, mel_bins]? Are the values in log scale (log mel spectrograms)?
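To make the question concrete, here is a small sketch of writing and reading one such file. The layout used here, a float32 array of shape [mel_bins, num_frames] holding log-scale values, is an assumption that is not confirmed anywhere in this thread and should be checked against how `meldataset.py` loads the files. The file name `utt0001.npy` and the random data are purely illustrative.

```python
import os
import tempfile

import numpy as np

num_frames, mel_bins = 86, 80

# Stand-in for a log mel spectrogram computed frames-first: [num_frames, mel_bins].
rng = np.random.default_rng(0)
mel_frames_first = rng.standard_normal((num_frames, mel_bins)).astype(np.float32)

# Assumed on-disk layout: [mel_bins, num_frames] (transpose of the above).
mel = np.ascontiguousarray(mel_frames_first.T)

# One .npy file per utterance; the name is hypothetical.
path = os.path.join(tempfile.mkdtemp(), "utt0001.npy")
np.save(path, mel)

# A loader would typically read the array back and add a batch axis if missing.
loaded = np.load(path)
if loaded.ndim == 2:
    loaded = loaded[None, ...]  # [1, mel_bins, num_frames]
```

If the loader turns out to expect frames first instead, dropping the transpose is the only change needed.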

Thanks.

a897456 commented 1 year ago

Do you know what the contents of the npy files should be now? I want to generate the npy files with librosa too. Did you manage to implement your idea?

samin9796 commented 6 months ago

@leandro-gracia-gil Did you figure out what the mel files should look like? There is not much detail about it.

leandro-gracia-gil commented 6 months ago

> @leandro-gracia-gil Did you figure out what the mel files should look like? There is not much detail about it.

No, in the end I changed meldataset.py locally to generate mel spectrograms using torchaudio, directly from the audio as it is loaded.