met4citizen / TalkingHead

Talking Head (3D): A JavaScript class for real-time lip-sync using Ready Player Me full-body 3D avatars.
MIT License

Google's Journey TTS model not compatible. #69

Closed: Anarbb closed this issue 1 week ago

Anarbb commented 2 weeks ago

The Google Journey models do not support SSML, pitch, or rate. I got around this by replacing the SSML with the plain text and removing the pitch and rate options.
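For anyone wanting to try the same workaround, here is a rough sketch of the idea, not the actual TalkingHead code. The helper name `buildTtsRequest` and its parameters are hypothetical, and it assumes Journey voices can be recognized by "Journey" appearing in the voice name (for example `en-US-Journey-D`). The field names follow the Google Cloud Text-to-Speech `text:synthesize` request body.

```javascript
// Hypothetical helper (not part of TalkingHead): builds a request body for
// Google Cloud TTS text:synthesize, falling back to plain text for Journey voices.
function buildTtsRequest(text, ssml, voiceName, languageCode, opts = {}) {
  // Assumption: Journey voices have "Journey" in their name.
  const isJourney = /journey/i.test(voiceName);
  const voice = { languageCode, name: voiceName };
  const audioConfig = { audioEncoding: "MP3" };

  if (isJourney) {
    // No SSML, no pitch, no speaking rate, and therefore no word timestamps.
    return { input: { text }, voice, audioConfig };
  }

  // Other voice types: keep SSML, pitch/rate, and request <mark> timepoints.
  if (opts.pitch !== undefined) audioConfig.pitch = opts.pitch;
  if (opts.rate !== undefined) audioConfig.speakingRate = opts.rate;
  return {
    input: { ssml },
    voice,
    audioConfig,
    enableTimePointing: ["SSML_MARK"], // available in the v1beta1 API
  };
}
```

With the plain-text branch the response carries audio only, so any lip-sync has to be estimated without word timing.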

met4citizen commented 2 weeks ago

As far as I know, using SSML and <mark> tags is the only way to get word-level timestamps with Google TTS. Currently, the Standard, Wavenet, Neural2, News, and Casual voice types support both. At least for now, the new Journey voices support neither.

You can, of course, remove the SSML and <mark> tags from the TalkingHead code, but this means you lose the word-to-audio alignment information, which significantly decreases lip-sync accuracy.
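To illustrate what the SSML <mark> tags provide, here is a minimal sketch of the general technique. It is a simplified illustration, not the TalkingHead implementation, and the function names are hypothetical. It assumes the response contains a `timepoints` array of `{ markName, timeSeconds }` entries, which the Google TTS v1beta1 API returns when the request sets `enableTimePointing: ["SSML_MARK"]`.

```javascript
// Escape characters that are special in XML so words are safe inside SSML.
function escapeXml(s) {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Hypothetical helper: put a numbered <mark> before each word.
function buildMarkedSsml(words) {
  const body = words
    .map((w, i) => `<mark name="${i}"/>${escapeXml(w)}`)
    .join(" ");
  return `<speak>${body}</speak>`;
}

// Hypothetical helper: map returned timepoints back to word start times.
function alignWords(words, timepoints) {
  return timepoints.map((tp) => ({
    word: words[Number(tp.markName)],
    startMs: Math.round(tp.timeSeconds * 1000),
  }));
}

const words = ["Hi", "there"];
const ssml = buildMarkedSsml(words);
// Example timepoints shaped like the API response (values made up).
const aligned = alignWords(words, [
  { markName: "0", timeSeconds: 0 },
  { markName: "1", timeSeconds: 0.31 },
]);
```

Without the marks there are no timepoints to map, which is why removing them costs lip-sync accuracy.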

Note also that the Journey voices are currently in the "Preview" stage. Hopefully Google will find a way to enable word timestamps for them before they become generally available (GA).

met4citizen commented 1 week ago

Added a list of currently supported Google voice types to the README's ttsVoice option description.