We need to start considering multispeaker transitions.
One step here would be tagging each sentence with an identity that tracks which synthesis model is being used (e.g. en-US-wavenet-A, en-US-wavenet-B, or AWS Dave). This would allow for manual implementation of multispeaker videos before we start working with people.
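A minimal sketch of what per-sentence voice tagging could look like, in Python. The TaggedSentence type, its field names, and the speaker_turns helper are assumptions for illustration, not existing project code; the voice IDs are the examples from the note above.

```python
from dataclasses import dataclass

@dataclass
class TaggedSentence:
    text: str
    voice: str  # synthesis identity, e.g. "en-US-wavenet-A" or "AWS Dave"

# Hypothetical two-speaker script using the voice names mentioned above.
script = [
    TaggedSentence("Welcome back to the channel.", "en-US-wavenet-A"),
    TaggedSentence("Glad to be here.", "en-US-wavenet-B"),
    TaggedSentence("Let's get started.", "en-US-wavenet-B"),
]

def speaker_turns(sentences):
    """Group consecutive sentences by voice, so each speaker turn can be
    sent to its synthesis backend as a single request."""
    turn, current_voice = [], None
    for s in sentences:
        if s.voice != current_voice and turn:
            yield current_voice, turn
            turn = []
        current_voice = s.voice
        turn.append(s.text)
    if turn:
        yield current_voice, turn

for voice, texts in speaker_turns(script):
    print(voice, "->", " ".join(texts))
```

Grouping consecutive same-voice sentences into turns keeps the number of synthesis requests down and gives a natural seam (the turn boundary) at which to handle multispeaker transitions later.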