Closed: ajinkyakulkarni14 closed this issue 2 years ago
@ajinkyakulkarni14 Hey! Sorry for the late response. Multispeaker synthesis is possible with Grad-TTS (we verified it on Libri-TTS), and we are discussing whether to release it as well. If you don't want to wait, you can modify the code yourself by adding a condition on classical learnable speaker embeddings to the model. The one thing to note is that the encoder also has a loss on the mel-spectrogram, so you should condition the encoder on the speaker embedding as well as the decoder. The conditioning can be done by simply broadcasting the speaker embedding along all timesteps and concatenating it channel-wise with the other input. In our solution, we conditioned both the encoder and the decoder.
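For readers following along, the broadcast-and-concatenate conditioning described above could look roughly like the sketch below. It is not the Grad-TTS code: the function name `condition_on_speaker`, the table `spk_table`, and all shapes (80 channels, 50 timesteps, 4 speakers, 8-dimensional embeddings) are assumptions chosen for illustration. NumPy is used only to show the shape logic; in a PyTorch model the table would presumably be a learnable `torch.nn.Embedding` and the concatenation a `torch.cat` along the channel dimension.

```python
import numpy as np

def condition_on_speaker(x, speaker_ids, spk_table):
    """Append a speaker embedding to every timestep of x.

    x:           (batch, channels, time) input to the encoder or decoder
    speaker_ids: (batch,) integer speaker indices
    spk_table:   (n_speakers, spk_dim) embedding table (learnable in a real model)
    """
    batch, _, time = x.shape
    spk = spk_table[speaker_ids]                  # (batch, spk_dim)
    spk_dim = spk.shape[1]
    # Repeat the same embedding along the time axis.
    spk = np.broadcast_to(spk[:, :, None], (batch, spk_dim, time))
    # Channel-wise concatenation with the other input.
    return np.concatenate([x, spk], axis=1)       # (batch, channels + spk_dim, time)

# Hypothetical sizes, for illustration only.
rng = np.random.default_rng(0)
n_speakers, spk_dim = 4, 8
spk_table = rng.standard_normal((n_speakers, spk_dim))
x = rng.standard_normal((2, 80, 50))
speaker_ids = np.array([1, 3])

cond = condition_on_speaker(x, speaker_ids, spk_table)
print(cond.shape)
```

Following the advice above, the same conditioning would be applied both to the encoder input and to the decoder input, and the first layer of each would need its input channel count increased by `spk_dim`.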
Hello @ivanvovk. I have added the speaker encoder module in a similar way to the Glow-TTS implementation. Thank you for the suggestion.
Thank you for releasing the original implementation of Grad-TTS. I would like to know if a multispeaker setting is available or planned for release.
I am implementing a multispeaker setting using this repo. Would the maintainers of this repo be interested in discussing or providing feedback on a multispeaker Grad-TTS implementation?
Regards, Ajinkya