huawei-noah / Speech-Backbones

This is the main repository of open-sourced speech technology by Huawei Noah's Ark Lab.
545 stars 113 forks

Grad-TTS in multispeaker setting #5

Closed ajinkyakulkarni14 closed 2 years ago

ajinkyakulkarni14 commented 3 years ago

Thank you for releasing the original implementation of Grad-TTS. I would like to know whether a multispeaker setting is available or planned for release.

I am implementing a multispeaker setting using this repo. Would the maintainers of this repo be interested in discussing or providing feedback on a multispeaker Grad-TTS implementation?

Regards, Ajinkya

ivanvovk commented 2 years ago

@ajinkyakulkarni14 Hey! Sorry for the late response. Multispeaker is possible for Grad-TTS (we verified it on Libri-TTS), and we are discussing the possibility of releasing it as well. If you don't want to wait, you can modify the code yourself by adding a speaker condition to the model using classical learnable speaker embeddings. The one thing to note is that the encoder also has a loss on the mel-spectrogram, so you should condition the encoder on the speaker embedding as well as the decoder. The conditioning can be done by simply broadcasting the speaker embedding along all timesteps and concatenating it channel-wise with the other input. In our solution, we conditioned both the encoder and the decoder.
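
The broadcast-and-concatenate conditioning described in this comment could be sketched roughly as follows. This is a minimal NumPy illustration of the tensor shapes only, not the maintainers' code: the function name `condition_on_speaker`, the 4-speaker / 8-dimensional table, and the 80-channel input are all assumptions made for the example. In the actual PyTorch repo the table would presumably be a learnable `torch.nn.Embedding` trained with the rest of the model.

```python
import numpy as np

def condition_on_speaker(x, speaker_ids, embedding_table):
    """Concatenate a per-utterance speaker embedding to every frame of x.

    x:               (batch, channels, time) input features
    speaker_ids:     (batch,) integer speaker indices
    embedding_table: (n_speakers, emb_dim) matrix; fixed here, but it would
                     be a learnable parameter in a real model (assumption)
    returns:         (batch, channels + emb_dim, time)
    """
    batch, _, time = x.shape
    spk = embedding_table[speaker_ids]                  # (batch, emb_dim)
    # Repeat the same embedding vector along every timestep.
    spk = np.broadcast_to(spk[:, :, None], (batch, spk.shape[1], time))
    # Channel-wise concatenation with the other input.
    return np.concatenate([x, spk], axis=1)

# Hypothetical sizes, chosen only for illustration.
rng = np.random.default_rng(0)
n_speakers, emb_dim = 4, 8
embedding_table = rng.standard_normal((n_speakers, emb_dim))
x = rng.standard_normal((2, 80, 50))    # e.g. 2 utterances, 80 channels, 50 frames
speaker_ids = np.array([1, 3])

y = condition_on_speaker(x, speaker_ids, embedding_table)
print(y.shape)  # (2, 88, 50)
```

Following the comment, the same operation would be applied to the inputs of both the encoder and the decoder, so each layer that consumes the concatenated tensor needs its input channel count increased by `emb_dim`.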

ajinkyakulkarni14 commented 2 years ago

Hello @ivanvovk. I have added the speaker encoder module in a way similar to the Glow-TTS implementation. Thank you for the suggestion.