jik876 / hifi-gan

HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
MIT License

[Question] Dataset preprocessing #24

Closed Kreevoz closed 1 year ago

Kreevoz commented 4 years ago

I've attempted to preprocess my dataset to meet the mel-spectrogram requirements, but I either wind up with incorrectly packed spectrogram files, a wrong header, or wrong data. I don't think any of the Tacotron2 implementations I can get my hands on output the data in the required format, or I'm overlooking something obvious (which is equally likely 😌).

Could one of you helpful people provide a link to a working piece of code that takes care of this properly or could this repository be fleshed out more so that there is a working preprocessor for training datasets? 🤔

jik876 commented 4 years ago

Preprocessing for generating mel-spectrograms from audio is implemented in meldataset.py. Posting details along with the error log would help us find a solution.
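For anyone trying to understand what meldataset.py computes, here is a minimal pure-numpy sketch of the same log-mel pipeline. This is an approximation for illustration only: the repo itself uses torch.stft and librosa mel filters, and the parameter values below (n_fft=1024, num_mels=80, hop_size=256, fmin=0, fmax=8000 at 22050 Hz) are taken from config_v1.json; use the repo's own `mel_spectrogram` function for real training data.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr=22050, n_fft=1024, num_mels=80, fmin=0.0, fmax=8000.0):
    # Triangular filters evenly spaced on the mel scale.
    mel_pts = np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), num_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((num_mels, n_fft // 2 + 1))
    for i in range(num_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        if c > l:
            fb[i, l:c] = (np.arange(l, c) - l) / (c - l)
        if r > c:
            fb[i, c:r] = (r - np.arange(c, r)) / (r - c)
    return fb

def log_mel(wav, sr=22050, n_fft=1024, hop=256, num_mels=80):
    # Windowed STFT magnitude -> mel projection -> log compression.
    window = np.hanning(n_fft)
    frames = [wav[i:i + n_fft] * window
              for i in range(0, len(wav) - n_fft + 1, hop)]
    spec = np.abs(np.fft.rfft(np.array(frames), n_fft, axis=1)).T
    mel = mel_filterbank(sr, n_fft, num_mels) @ spec
    # Clamp before the log, as the repo does, to avoid log(0).
    return np.log(np.clip(mel, 1e-5, None))

wav = np.sin(2 * np.pi * 440 * np.arange(22050) / 22050)  # 1 s test tone
m = log_mel(wav)
print(m.shape)  # (80, 83): 80 mel bins, one column per hop
```

The output layout (num_mels rows, one column per frame) is what the training code expects for each utterance.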

Kreevoz commented 4 years ago

Ah, I should probably have been more specific. Also, thank you for taking the time to respond, jik!

I was specifically curious about fine-tuning. The readme mentions that mel-spectrograms need to be generated by Tacotron2 with teacher forcing; in the example provided there, they'd then be placed in the ft_dataset folder.

That's where I'm not making progress. I looked through meldataset.py and it doesn't include any functions to interface with Tacotron2 to generate the required mels. How could that be done?

m-toman commented 4 years ago

Just arrived at the same question: the preprocessing in the NVIDIA Tacotron2 repo is a bit different. Did you fine-tune on their pretrained LJ model, or train a new one with your own preprocessing?

jik876 commented 3 years ago

> Ah, I should probably have been more specific. Also, thank you for taking the time to respond, jik!
>
> I was specifically curious about fine-tuning. The readme mentions that mel-spectrograms need to be generated by Tacotron2 with teacher forcing; in the example provided there, they'd then be placed in the ft_dataset folder.
>
> That's where I'm not making progress. I looked through meldataset.py and it doesn't include any functions to interface with Tacotron2 to generate the required mels. How could that be done?

@Vozeek

You can generate the spectrograms for fine-tuning using the forward operation of Tacotron2 with teacher forcing. After saving the spectrograms generated by Tacotron2 with numpy.save(), set the --fine_tuning command-line option and start training.
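A minimal sketch of the saving step described above. The Tacotron2 forward pass is stubbed out with random data, since the exact call depends on which Tacotron2 implementation you use (with the NVIDIA repo it is roughly `mel_out, mel_out_postnet, _, _ = model(x)` on ground-truth text/mel batches, which makes decoding teacher-forced). The filename `LJ001-0001` is illustrative; the assumption, based on how meldataset.py looks up mels in fine-tuning mode, is that each .npy stem must match its wav file's stem.

```python
import numpy as np

# Placeholder for the teacher-forced Tacotron2 output for one utterance.
# Real output shape is (num_mels, n_frames); 80 mel bins matches config_v1.
mel = np.random.randn(80, 120).astype(np.float32)

# Save one .npy per utterance into the mels directory used for fine-tuning.
# The stem ("LJ001-0001", hypothetical here) must match the wav filename so
# the dataset loader can pair wav and mel when --fine_tuning is set.
np.save("LJ001-0001.npy", mel)

restored = np.load("LJ001-0001.npy")
print(restored.shape, restored.dtype)  # (80, 120) float32
```

numpy.save appends the .npy extension automatically if it is missing, so passing either "LJ001-0001" or "LJ001-0001.npy" produces the same file.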

Kreevoz commented 3 years ago

Yeah, I get the rough idea of what you're saying, but I don't understand Tacotron2 well enough to do that. 😌 I'm a novice looking into this stuff as a hobby; I didn't study any of it professionally.

If anyone else reading this knows of a fork that has the necessary preprocessing integrated, let me know. That would be of tremendous help. In the meantime I'll stick to using WaveGlow, even though it doesn't sound nearly as nice and smooth as HiFi-GAN.