leandro-gracia-gil opened this issue 2 years ago
Do you know now what the contents of the npy files should be? I want to generate the npy files with librosa too. And have you managed to implement your idea?
@leandro-gracia-gil Did you figure out what the mel files should look like? There is not much detail about it.
No. In the end I changed meldataset.py locally to generate mel spectrograms with torchaudio, directly from the audio as it is loaded.
I'd like to train hifi-gan on a custom dataset with its own set of wav files. For this I need to generate the corresponding mel spectrograms, which the README says can be done using Tacotron2, although it does not go into much detail.
However, generating mel spectrograms should be fairly straightforward. It just involves an STFT, the magnitude of the complex result, a mel filter bank (which you can get from librosa), and possibly a log operation clamped at some minimum value. You just need to make sure to use the same settings as specified in the config file.
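The steps above can be sketched in NumPy. This is a minimal sketch, not the repository's code.

To run with NumPy alone, it re-implements the Slaney-style filter bank that `librosa.filters.mel` builds by default. In practice you could call `librosa.filters.mel(sr=..., n_fft=..., n_mels=..., fmin=..., fmax=...)` instead.

The default arguments assume values like those in HiFi-GAN's config_v1.json: 22050 Hz, n_fft 1024, hop 256, window 1024, 80 mels, 0 to 8000 Hz. They also assume a natural log clamped at 1e-5. The framing (reflect padding by (n_fft - hop_size) / 2, no further centering) is assumed to mirror HiFi-GAN's `mel_spectrogram`. Replace these values with the ones from your own config file.

```python
import numpy as np

# Slaney-style mel scale, as in librosa's default (htk=False):
# linear below 1 kHz, logarithmic above.
_F_SP = 200.0 / 3
_MIN_LOG_HZ = 1000.0
_MIN_LOG_MEL = _MIN_LOG_HZ / _F_SP
_LOGSTEP = np.log(6.4) / 27.0


def hz_to_mel(f):
    f = np.asarray(f, dtype=np.float64)
    # np.maximum avoids log(0) warnings in the branch np.where discards.
    log_part = _MIN_LOG_MEL + np.log(np.maximum(f, _MIN_LOG_HZ) / _MIN_LOG_HZ) / _LOGSTEP
    return np.where(f >= _MIN_LOG_HZ, log_part, f / _F_SP)


def mel_to_hz(m):
    m = np.asarray(m, dtype=np.float64)
    log_part = _MIN_LOG_HZ * np.exp(_LOGSTEP * (m - _MIN_LOG_MEL))
    return np.where(m >= _MIN_LOG_MEL, log_part, _F_SP * m)


def mel_filter_bank(sr, n_fft, n_mels, fmin, fmax):
    """Triangular filters with Slaney area normalization, shape [n_mels, n_fft // 2 + 1]."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    hz_pts = mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2))
    fdiff = np.diff(hz_pts)
    ramps = hz_pts[:, None] - fft_freqs[None, :]
    lower = -ramps[:-2] / fdiff[:-1, None]  # rising edge of each triangle
    upper = ramps[2:] / fdiff[1:, None]     # falling edge of each triangle
    weights = np.maximum(0.0, np.minimum(lower, upper))
    return weights * (2.0 / (hz_pts[2:] - hz_pts[:-2]))[:, None]


def log_mel_spectrogram(audio, sr=22050, n_fft=1024, hop_size=256, win_size=1024,
                        num_mels=80, fmin=0.0, fmax=8000.0, clip_val=1e-5):
    """Return a natural-log mel spectrogram of shape [num_mels, num_frames]."""
    # Reflect-pad by (n_fft - hop_size) // 2 on each side; with the defaults this
    # yields len(audio) // hop_size frames.
    pad = (n_fft - hop_size) // 2
    y = np.pad(np.asarray(audio, dtype=np.float64), (pad, pad), mode="reflect")
    # Periodic Hann window (same values as torch.hann_window), centered in n_fft.
    window = np.hanning(win_size + 1)[:-1]
    left = (n_fft - win_size) // 2
    window = np.pad(window, (left, n_fft - win_size - left))
    # Slice overlapping frames: row i covers y[i*hop : i*hop + n_fft].
    n_frames = 1 + (len(y) - n_fft) // hop_size
    idx = np.arange(n_fft)[None, :] + hop_size * np.arange(n_frames)[:, None]
    spec = np.abs(np.fft.rfft(y[idx] * window, axis=1))  # [n_frames, n_fft // 2 + 1]
    mel = mel_filter_bank(sr, n_fft, num_mels, fmin, fmax) @ spec.T  # [num_mels, n_frames]
    return np.log(np.maximum(mel, clip_val)).astype(np.float32)
```

For one second of 22050 Hz audio as float samples in [-1, 1], `log_mel_spectrogram(audio)` returns a float32 array of shape (80, 86), with one column per 256-sample hop.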
I'd like to generate these mel spectrograms with my own scripts, without having to deal with the Tacotron2 code if possible, but I need to know what the contents of the npy files should be. Are they just NumPy arrays of shape [num_frames, mel_bins]? Are the values in log scale (log mel spectrograms)?
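The thread leaves this question open, so what follows is a hedged sketch, not a confirmed answer.

HiFi-GAN's own `mel_spectrogram` in meldataset.py returns tensors shaped [batch, num_mels, frames]. Its fine-tuning loader loads `<wav stem>.npy` and slices frames along the last axis.

The helper below therefore assumes each file holds a float32 array of shape [num_mels, num_frames] (mel channels first) with natural-log values, named after the wav file's stem. `save_mel_for_wav` is a hypothetical helper, not part of the repository. Verify the layout against your copy of meldataset.py.

```python
import os
import tempfile

import numpy as np


def save_mel_for_wav(mel, wav_path, mel_dir):
    """Save a log mel array as mel_dir/<wav stem>.npy, e.g. clip.wav -> clip.npy.

    Assumes the layout [num_mels, num_frames]: mel bins on axis 0, frames on the
    last axis. Transpose first (mel.T) if your script produces [num_frames, num_mels].
    """
    mel = np.asarray(mel, dtype=np.float32)
    if mel.ndim != 2:
        raise ValueError(f"expected a 2-D [num_mels, num_frames] array, got shape {mel.shape}")
    stem = os.path.splitext(os.path.basename(wav_path))[0]
    os.makedirs(mel_dir, exist_ok=True)
    out_path = os.path.join(mel_dir, stem + ".npy")
    np.save(out_path, mel)
    return out_path


if __name__ == "__main__":
    # Round-trip demo with a dummy array: 80 mel bins x 86 frames.
    with tempfile.TemporaryDirectory() as d:
        path = save_mel_for_wav(np.zeros((80, 86)), "/data/wavs/clip_0001.wav", d)
        print(os.path.basename(path), np.load(path).shape)
```

The demo prints `clip_0001.npy (80, 86)`. The shape check fails fast on 1-D input. It cannot detect a transposed 2-D array, so the orientation is still up to the caller.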
Thanks.