Issue: The pretrained Tacotron 2 model only comes with a single voice.
Solution: We can fine-tune the pretrained model to change its voice. Load the PyTorch model and apply some gradient steps on data from a new input distribution, i.e. recordings of the target voice (transfer learning).
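To make the idea concrete, here is a minimal sketch of warm-starting from saved weights and taking a few gradient steps on new data. Everything in it is an assumption for illustration: `build_model` is a hypothetical stand-in for the real Tacotron 2 network, the random tensors stand in for batches from the new voice, and the loss and optimizer settings are placeholders rather than the repo's actual training configuration.

```python
import torch
from torch import nn


def build_model():
    # Hypothetical stand-in; the real model would come from the Tacotron 2 code.
    return nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))


def fine_tune(model, inputs, targets, steps=20, lr=1e-3):
    """Apply a few gradient steps on new data and return the loss at each step."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    model.train()
    losses = []
    for _ in range(steps):
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
    return losses


torch.manual_seed(0)

# Pretend these weights were loaded from a pretrained checkpoint on disk.
pretrained_state = build_model().state_dict()

model = build_model()
model.load_state_dict(pretrained_state)  # warm start from the pretrained weights

# Random tensors stand in for batches drawn from the new voice's data.
new_inputs = torch.randn(32, 8)
new_targets = torch.randn(32, 4)

losses = fine_tune(model, new_inputs, new_targets)
print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

In practice we would probably not write this loop ourselves: as far as I can tell, the NVIDIA repo's README describes a `--warm_start` option on its `train.py` for starting from the published checkpoint, but that still needs to be confirmed.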
Desired input: Pretrained model and a training corpus for supervised learning. I'm not sure yet exactly which labels and audio features the model requires; see https://github.com/NVIDIA/tacotron2 and https://arxiv.org/abs/1712.05884. We may need to create our own dataset for training. A new issue will be posted with further details.
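As a starting point for the dataset question, here is a rough sketch of how the training labels might be prepared. It assumes (to be verified against the repo) that training data is listed in a text file with one `wav_path|transcript` line per utterance, and that the mel spectrogram features are computed from the wav files rather than supplied separately. The function name `format_filelist` and the example paths are hypothetical.

```python
def format_filelist(pairs):
    """Turn (wav_path, transcript) pairs into 'wav_path|transcript' lines."""
    lines = []
    for wav_path, transcript in pairs:
        # Collapse newlines and repeated spaces so each utterance stays on one line.
        text = " ".join(transcript.split())
        if not text:
            raise ValueError(f"empty transcript for {wav_path}")
        if "|" in wav_path or "|" in text:
            raise ValueError(f"'|' is reserved as the field separator: {wav_path}")
        lines.append(f"{wav_path}|{text}")
    return lines


# Hypothetical example utterances for the new voice.
example_pairs = [
    ("wavs/speaker_0001.wav", "Hello there,\n general audience."),
    ("wavs/speaker_0002.wav", "  This is a second   utterance. "),
]

filelist_lines = format_filelist(example_pairs)
print("\n".join(filelist_lines))
```

If this format turns out to be right, creating the dataset mostly comes down to recording or collecting short clips of the target voice and transcribing them.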
Desired output: A fine-tuned model checkpoint after some number of training epochs.