kan-bayashi / ParallelWaveGAN

Unofficial Parallel WaveGAN (+ MelGAN & Multi-band MelGAN & HiFi-GAN & StyleMelGAN) with Pytorch
https://kan-bayashi.github.io/ParallelWaveGAN/
MIT License

Upload trained vocoder to huggingface #358

Closed roholazandie closed 2 years ago

roholazandie commented 2 years ago

Hi, I have trained a vocoder model using parallel_wavegan on my own corpus and I am wondering if I can upload it to Hugging Face, just like TTS models (Tacotron, FastSpeech, etc.), and then use the `vocoder_tag` to download and use it. I noticed that downloads are restricted to a predefined list of vocoders. Is there any chance this could change in the future?

kan-bayashi commented 2 years ago

That is a cool idea. I think it is not so difficult. Maybe we can extend the following function to accept `tag_or_url`: https://github.com/kan-bayashi/ParallelWaveGAN/blob/1f1f6750ea09ee416496eba79ec1a21dc20e5daf/parallel_wavegan/utils/utils.py#L362-L394
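A minimal sketch of the idea (the function name, tag list, and URLs below are illustrative assumptions, not the actual contents of `utils.py`): resolve a known tag to its hosted checkpoint, or pass a direct URL straight through.

```python
# Hypothetical sketch of a tag-or-URL resolver; the real
# download_pretrained_model in parallel_wavegan/utils/utils.py differs.
PRETRAINED_MODEL_LIST = {
    # illustrative entry; real tags map to hosted checkpoint links
    "ljspeech_parallel_wavegan.v1": "https://drive.google.com/uc?id=EXAMPLE",
}

def resolve_tag_or_url(tag_or_url: str) -> str:
    """Return a download URL for a known tag, or pass a URL straight through."""
    if tag_or_url.startswith(("http://", "https://")):
        return tag_or_url  # direct link, e.g. a user's own Google Drive file
    if tag_or_url in PRETRAINED_MODEL_LIST:
        return PRETRAINED_MODEL_LIST[tag_or_url]
    raise ValueError(f"Unknown pretrained model tag or URL: {tag_or_url}")
```

The point of the design is that existing callers passing a tag keep working, while a URL simply bypasses the lookup table.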

roholazandie commented 2 years ago

Yes, I think it would actually be beneficial! For example, my model works much better with the vocoder I trained. I'll work on this, and if it works I'll send a pull request.

kan-bayashi commented 2 years ago

Added in #361. Thank you so much for your contribution, @roholazandie!

malradhi commented 2 years ago

@roholazandie, thanks for sharing your trained model. Can you please tell me what changes you have made compared to the original parallel_wavegan?

roholazandie commented 2 years ago

@malradhi The only difference is how it downloads the vocoder model, which can now come from any Google Drive link. You can see the changes here.
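For illustration only (this helper is not part of parallel_wavegan), pulling the file ID out of a Drive share link of the form used in this thread might look like:

```python
import re

def extract_drive_file_id(url: str) -> str:
    """Extract the file ID from a Google Drive share URL.

    Illustrative helper, not the actual parallel_wavegan code; Drive
    links of the form .../file/d/<ID>[/view] carry the ID in the path.
    """
    match = re.search(r"/file/d/([A-Za-z0-9_-]+)", url)
    if match is None:
        raise ValueError(f"Not a recognized Google Drive file URL: {url}")
    return match.group(1)
```

The ID is what a downloader ultimately needs, since Drive's share page itself is HTML rather than the file.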

samin9796 commented 2 years ago

Hi! I am getting an error when trying to upload a trained vocoder to Hugging Face. Could you please share how you did it?

Here is the part of the code.

```python
from parallel_wavegan.utils import download_pretrained_model
# Text2Speech comes from ESPnet (assumed import; it was missing above)
from espnet2.bin.tts_inference import Text2Speech

url = "https://drive.google.com/file/d/10GYvB_mIKzXzSjD67tSnBhknZRoBjsNb"
download_path = download_pretrained_model(url)

gos_text2speech = Text2Speech.from_pretrained(
    model_tag="path/to/the/model",
    tag_or_url=download_path,
)
```

roholazandie commented 2 years ago

I am a bit confused about what you want to do. The code doesn't upload a vocoder; it only downloads the pretrained vocoder and then uses it in Text2Speech. What exactly are you trying to do?

samin9796 commented 2 years ago

I trained a FastSpeech2 TTS model and a vocoder on my custom data and then uploaded them to Hugging Face. Now I want to create a Space on Hugging Face to deploy the TTS model. I can easily get the FastSpeech2 model and assign it to the `model_tag` variable, but I don't know how to get the vocoder. I uploaded the vocoder to Google Drive as well and assigned "parallel_wavegan/link-to-google-drive" to `vocoder_tag`, but I am getting an assertion error saying the link does not exist. How did you get the vocoder and assign it to the `vocoder_tag` variable?

roholazandie commented 2 years ago

For now you can't upload a vocoder to Hugging Face and use it in the cloud, so it's nothing wrong on your side.

samin9796 commented 2 years ago

I see! I am working on a language other than English, and I hope I can use my own vocoder when deploying the TTS model on Hugging Face.

malradhi commented 2 years ago

> @malradhi The only difference is how it downloads the vocoder model, which can now come from any Google Drive link. You can see the changes here.

Thank you, but the synthesized speech from your model sounds better than that of the original model.

roholazandie commented 2 years ago

@malradhi Actually, I haven't done anything different from the original parallel_wavegan training script. The good quality you hear comes from our high-quality dataset (RyanSpeech), which is available on Hugging Face. The quality of a vocoder depends heavily on the quality of the recorded voices; ours were recorded by a professional speaker in a studio with no background noise.

malradhi commented 2 years ago

> @malradhi Actually, I haven't done anything different from the original parallel_wavegan training script. The good quality you hear comes from our high-quality dataset (RyanSpeech), which is available on Hugging Face. The quality of a vocoder depends heavily on the quality of the recorded voices; ours were recorded by a professional speaker in a studio with no background noise.

OK, great, I understand now. Thank you.