Does the speaker embedding conditioning work?

p0p4k / vits2_pytorch

unofficial vits2-TTS implementation in pytorch

https://arxiv.org/abs/2307.16430

MIT License


Does the speaker embedding conditioning work? #15

Closed: BangjianZhou closed this issue 1 year ago.

BangjianZhou commented 1 year ago

https://github.com/p0p4k/vits2_pytorch/blob/c4fb23c06fadf8a8fc49b57a0aa7ebdfe744bb0f/models.py#L847

Hi, I'm wondering about this line. It seems that although the TextEncoder is supposed to be conditioned on the speaker embedding at the 3rd layer, `g` is never fed into the TextEncoder. Did I misunderstand something?
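To illustrate why this bug is silent, here is a toy sketch in plain Python (so it runs without PyTorch). It is not the repository's code. `ToyTextEncoder`, `cond_layer_idx`, `encode`, and `pass_g` are hypothetical names, and the per-channel "blocks" only stand in for transformer blocks. The assumption is that the encoder adds the speaker vector `g` (taken to already have the hidden size) before its third block, and only when `g is not None`. If the caller never passes `g`, that branch is skipped and every speaker gets identical encoder output, with no error raised.

```python
import random
from typing import List, Optional

Frame = List[float]  # one time step of hidden channels


class ToyTextEncoder:
    """Hypothetical toy encoder: per-channel gains stand in for transformer
    blocks; the speaker vector g is added right before block cond_layer_idx."""

    def __init__(self, n_layers: int = 6, cond_layer_idx: int = 2,
                 hidden: int = 4, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.cond_layer_idx = cond_layer_idx  # 0-based, so 2 = third block
        # Positive gains, a stand-in for real attention/FFN weights.
        self.blocks = [[rng.uniform(0.5, 1.5) for _ in range(hidden)]
                       for _ in range(n_layers)]

    def forward(self, x: List[Frame], g: Optional[Frame] = None) -> List[Frame]:
        h = [list(frame) for frame in x]
        for i, gains in enumerate(self.blocks):
            if i == self.cond_layer_idx and g is not None:
                # Broadcast the speaker vector over all time steps.
                h = [[v + s for v, s in zip(frame, g)] for frame in h]
            h = [[v * k for v, k in zip(frame, gains)] for frame in h]
        return h


def encode(enc: ToyTextEncoder, x: List[Frame], g: Frame, pass_g: bool) -> List[Frame]:
    # pass_g=False mimics a call site that computes g but never hands it over.
    return enc.forward(x, g) if pass_g else enc.forward(x)


enc = ToyTextEncoder()
x = [[1.0, 0.0, -1.0, 0.5], [0.2, 0.3, 0.4, 0.5]]
g_a = [0.1, 0.1, 0.1, 0.1]    # hypothetical speaker A embedding
g_b = [-0.2, 0.0, 0.3, 0.0]   # hypothetical speaker B embedding

# Bug pattern: two different speakers, identical encoder output.
print(encode(enc, x, g_a, False) == encode(enc, x, g_b, False))  # True
# With g passed, the output depends on the speaker.
print(encode(enc, x, g_a, True) == encode(enc, x, g_b, True))    # False
```

A check like the second comparison (different speakers should give different encoder outputs) is a cheap way to catch this kind of dropped-argument bug in multi-speaker setups.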

p0p4k commented 1 year ago

Thanks a lot for catching it; I fixed it in the latest patch! I was so focused on single-speaker training that I forgot to enable that part.