Closed by Anarbb 1 week ago
As far as I know, using SSML and `<mark>` tags is the only way to get word-level timestamps with Google TTS. Currently, the Standard, Wavenet, Neural2, News, and Casual voice types support both. At least for now, the new Journey voices support neither.
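As a rough illustration of the SSML approach described above, the sketch below puts a `<mark>` before each word and builds a request body that asks Google TTS to report a timestamp for every mark. The helper names (`escapeXml`, `buildMarkedSsml`, `buildTimedRequest`) are hypothetical and are not taken from the TalkingHead code. The `enableTimePointing: ["SSML_MARK"]` field is, to my knowledge, part of the `v1beta1` `text:synthesize` API. The voice name is only an example.

```typescript
// Hypothetical request shape for illustration; field names follow the
// Google Cloud Text-to-Speech v1beta1 REST API as I understand it.
interface TimedSynthesizeRequest {
  input: { ssml: string };
  voice: { languageCode: string; name: string };
  audioConfig: { audioEncoding: string };
  enableTimePointing: string[];
}

// Escape characters that are not allowed as-is inside SSML text.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Put a numbered <mark> in front of every whitespace-separated word.
function buildMarkedSsml(text: string): string {
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const body = words
    .map((word, i) => `<mark name="${i}"/>${escapeXml(word)}`)
    .join(" ");
  return `<speak>${body}</speak>`;
}

// Build the JSON body; sending it to the API is left out of this sketch.
function buildTimedRequest(
  text: string,
  voiceName: string,
  languageCode: string
): TimedSynthesizeRequest {
  return {
    input: { ssml: buildMarkedSsml(text) },
    voice: { languageCode, name: voiceName },
    audioConfig: { audioEncoding: "MP3" },
    // Ask for a time point per <mark>; without this no timestamps are returned.
    enableTimePointing: ["SSML_MARK"],
  };
}

// Example: three words produce three marks, named "0", "1", and "2".
const timedExample = buildTimedRequest("Hello big world", "en-US-Standard-A", "en-US");
```

The response should then carry a `timepoints` array whose entries pair each mark name with a time in seconds, which is what makes word-to-audio alignment possible.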
You can, of course, remove the SSML and `<mark>` tags from the TalkingHead code, but this means you lose word-audio alignment information, which significantly decreases lip-sync accuracy.
Note also that the Journey voices are currently in the "Preview" stage. Hopefully Google will find a way to enable word timestamps for them before they become generally available (GA).
Added a list of currently supported Google voice types to the README's `ttsVoice` option description.
The Google Journey models do not support SSML, pitch, or rate. I got around this by replacing the SSML with the plain text and removing the pitch and rate options.
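The workaround above could be sketched roughly as follows. This is not the actual TalkingHead change: `isJourneyVoice`, `buildVoiceAwareRequest`, and the option names are hypothetical, and detecting Journey voices by looking for "Journey" in the voice name is an assumption based on names such as `en-US-Journey-D`. For Journey voices the sketch sends plain text and omits pitch, rate, and time pointing. For other voices it keeps the SSML path.

```typescript
// Hypothetical option and request shapes, for illustration only.
interface VoiceSettings {
  name: string;
  languageCode: string;
  pitch: number;
  speakingRate: number;
}

interface VoiceAwareRequest {
  input: { text: string } | { ssml: string };
  voice: { languageCode: string; name: string };
  audioConfig: { audioEncoding: string; pitch?: number; speakingRate?: number };
  enableTimePointing?: string[];
}

// Assumption: Journey voices can be recognised by their name.
function isJourneyVoice(name: string): boolean {
  return name.includes("Journey");
}

function buildVoiceAwareRequest(
  plainText: string,
  markedSsml: string,
  settings: VoiceSettings
): VoiceAwareRequest {
  const voice = { languageCode: settings.languageCode, name: settings.name };

  if (isJourneyVoice(settings.name)) {
    // Journey: plain text only, no pitch/rate, no word timestamps.
    return {
      input: { text: plainText },
      voice,
      audioConfig: { audioEncoding: "MP3" },
    };
  }

  // Other voice types: SSML with marks, pitch/rate, and time pointing.
  return {
    input: { ssml: markedSsml },
    voice,
    audioConfig: {
      audioEncoding: "MP3",
      pitch: settings.pitch,
      speakingRate: settings.speakingRate,
    },
    enableTimePointing: ["SSML_MARK"],
  };
}
```

Because the Journey branch returns no time points, any caller relying on word timestamps would need to fall back to some estimate of word timing, which is where the loss in lip-sync accuracy comes from.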