Closed by BangjianZhou 1 year ago
https://github.com/p0p4k/vits2_pytorch/blob/c4fb23c06fadf8a8fc49b57a0aa7ebdfe744bb0f/models.py#L847
Hi, I'm just wondering about this line. It seems that although the TextEncoder is meant to be conditioned on the speaker embedding at the 3rd layer, g is not actually fed into the TextEncoder. Did I misunderstand something?
Thanks a lot for catching it, I fixed it in the latest patch! I was so focused on single-speaker training that I forgot to enable that part.
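For readers following along, here is a rough sketch of the kind of issue discussed above. It is not the repository's code: `ToyTextEncoder`, `spk_proj`, `cond_layer_idx`, and all sizes are hypothetical names and values chosen for illustration. It only shows that an encoder which can add a speaker embedding `g` at its 3rd layer has no speaker conditioning at all unless the caller actually passes `g`.

```python
import torch
import torch.nn as nn


class ToyTextEncoder(nn.Module):
    """Hypothetical stand-in for a text encoder (not the repository's class)."""

    def __init__(self, hidden=8, gin_channels=4, n_layers=4, cond_layer_idx=2):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.Conv1d(hidden, hidden, kernel_size=1) for _ in range(n_layers)]
        )
        # Projects the speaker embedding to the encoder's hidden size.
        self.spk_proj = nn.Conv1d(gin_channels, hidden, kernel_size=1)
        # 0-based index, so 2 means "before the 3rd layer" (an assumption).
        self.cond_layer_idx = cond_layer_idx

    def forward(self, x, g=None):
        # x: [batch, hidden, time]; g: [batch, gin_channels, 1] or None
        for i, layer in enumerate(self.layers):
            if i == self.cond_layer_idx and g is not None:
                # Broadcasts the projected speaker embedding over the time axis.
                x = x + self.spk_proj(g)
            x = torch.tanh(layer(x))
        return x


torch.manual_seed(0)
enc = ToyTextEncoder()
x = torch.randn(2, 8, 5)   # dummy text features
g = torch.randn(2, 4, 1)   # dummy speaker embedding

out_without_g = enc(x)       # g never reaches the encoder: no conditioning
out_with_g = enc(x, g=g)     # g is passed through: conditioning is active
```

In this toy setup the two outputs have the same shape but different values, which is the observable effect of forwarding `g` from the caller into the encoder.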