maum-ai / assem-vc

Official Code for Assem-VC @ICASSP2022
https://mindslab-ai.github.io/assem-vc/
BSD 3-Clause "New" or "Revised" License

Pre-trained model #17

Closed · jJohnny342 closed this issue 3 years ago

jJohnny342 commented 3 years ago

Could you please tell us when you are planning to release the pre-trained model? Also, could you provide some kind of loss graph, or just the number of training steps each module needs to converge on LibriTTS + VCTK, so we can estimate whether it is feasible for mere mortals to train the model without multiple high-end GPUs? Finally, could you elaborate on the audio normalization mentioned in your paper? Is it implemented somewhere in your project, or should we process the audio files by some other means? Thank you!

wookladin commented 3 years ago

Hi! We just released the pre-trained weights. Please check them out!

Unfortunately, our Cotatron loss graph has been deleted, so we cannot upload it. However, we can provide the loss graph of the synthesizer (VC decoder); please see below.

[image: loss graph of the synthesizer (VC decoder)]

FYI, we trained Cotatron for 25k steps on LibriTTS only, and then for 20k more steps on LibriTTS and VCTK. Cotatron's validation reconstruction loss was about 0.28.

Lastly, audio normalization is implemented in our project. You can enable it by setting the norm option in https://github.com/mindslab-ai/assem-vc/blob/master/datasets/text_mel_dataset.py#L17 to True. To use audio normalization during training, make sure the norm option in cotatron.py and synthesizer.py is also set to True: https://github.com/mindslab-ai/assem-vc/blob/master/cotatron.py#L136
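For reference, a norm flag like this commonly means simple peak normalization of the waveform at load time. Below is a minimal sketch of that idea; the function `load_and_normalize` and its exact behavior are assumptions for illustration, not the repository's actual code (see `datasets/text_mel_dataset.py` for the real logic):

```python
import numpy as np
import librosa

def load_and_normalize(path: str, sr: int = 22050, norm: bool = True) -> np.ndarray:
    """Hypothetical example of loading a waveform with an optional `norm` flag.

    This is a stand-in for what such an option typically does, not the
    implementation used in text_mel_dataset.py.
    """
    wav, _ = librosa.load(path, sr=sr)
    if norm:
        peak = np.abs(wav).max()
        if peak > 0:
            # Scale so the loudest sample has magnitude 1.0.
            wav = wav / peak
    return wav
```

Whichever setting you choose, the key point is consistency: the dataset, cotatron.py, and synthesizer.py should all use the same norm value so that training and inference see identically scaled audio.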

Since we observed that normalization does not affect the output, we set the norm option to False in this implementation. Thanks!