jaywalnut310 / vits

VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
https://jaywalnut310.github.io/vits-demo/index.html
MIT License
6.48k stars 1.21k forks

Is there a bug in voice_conversion (reverse True or False for source and target)? #148

Open yt605155624 opened 1 year ago

yt605155624 commented 1 year ago

https://github.com/jaywalnut310/vits/blob/2e561ba58618d021b5b8323d3765880f7e0ecfdb/models.py#L525

In Glow-TTS Appendix B3, you said that voice conversion uses:

  1. inverse pass for source speaker
  2. forward pass for target speaker

So I think `reverse` should be True at https://github.com/jaywalnut310/vits/blob/2e561ba58618d021b5b8323d3765880f7e0ecfdb/models.py#L530 and False (the default) at https://github.com/jaywalnut310/vits/blob/2e561ba58618d021b5b8323d3765880f7e0ecfdb/models.py#L531

yt605155624 commented 1 year ago

Oh, I found that the code matches Supplementary Material D of the VITS paper, so the bug seems to be in Glow-TTS Appendix B3 rather than in the code.
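For anyone else confused by the direction, here is a minimal toy sketch of the ordering the code appears to use: a forward pass (`reverse=False`) conditioned on the source speaker, then an inverse pass (`reverse=True`) conditioned on the target speaker. The `SPEAKERS` table and the one-dimensional affine `flow` are hypothetical stand-ins I made up for illustration; they are not the VITS flow or its API.

```python
# Toy stand-in for a speaker-conditioned invertible flow (NOT the VITS module).
# Hypothetical per-speaker affine parameters: (shift, scale).
SPEAKERS = {"src": (2.0, 3.0), "tgt": (-1.0, 0.5)}


def flow(x, speaker, reverse=False):
    """Forward: strip speaker statistics. Reverse: apply speaker statistics."""
    shift, scale = SPEAKERS[speaker]
    if not reverse:
        return [(v - shift) / scale for v in x]
    return [v * scale + shift for v in x]


def voice_conversion(z, src, tgt):
    # Step 1: forward pass with the SOURCE speaker (reverse=False, the default).
    z_p = flow(z, src)
    # Step 2: inverse pass with the TARGET speaker (reverse=True).
    z_hat = flow(z_p, tgt, reverse=True)
    return z_p, z_hat


z = [2.0, 5.0, -1.0]  # pretend latent frames from the source speaker
z_p, z_hat = voice_conversion(z, "src", "tgt")
print(z_p)    # speaker-independent representation
print(z_hat)  # same content, target-speaker statistics

# Sanity check: using the same speaker in both passes is a round trip.
round_trip = flow(flow(z, "src"), "src", reverse=True)
print(round_trip)
```

Swapping the two `reverse` flags would apply the source statistics a second time instead of removing them, which is why the order matters in this toy model.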
