met4citizen / TalkingHead

Talking Head (3D): A JavaScript class for real-time lip-sync using Ready Player Me full-body 3D avatars.
MIT License

Possibility for using custom audio #55

Closed christo-zero-john closed 1 month ago

christo-zero-john commented 1 month ago

Is there any way to use custom audio (for example, audio files on my system) instead of Google TTS or other TTS services?

met4citizen commented 1 month ago

Yes, you can use the speakAudio method instead of speakText. However, in addition to the audio, you'll need to provide the words and their timestamps for accurate lip-sync. One way to obtain these is by using some transcription service. For an example, refer to the mp3.html app in the examples directory.
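Per the project README, `speakAudio` takes an object containing the decoded audio along with parallel `words`, `wtimes`, and `wdurations` arrays, with times in milliseconds. A minimal sketch of converting word timestamps in seconds (the shape Whisper-style transcription typically returns) into that payload — `toSpeakAudioPayload` is a hypothetical helper name, not part of the library:

```javascript
// Convert transcribed words with start/end times in seconds into the
// { words, wtimes, wdurations } arrays speakAudio expects (milliseconds).
function toSpeakAudioPayload(transcribedWords) {
  const words = [];
  const wtimes = [];
  const wdurations = [];
  for (const w of transcribedWords) {
    words.push(w.word);
    wtimes.push(Math.round(w.start * 1000));               // word start, ms
    wdurations.push(Math.round((w.end - w.start) * 1000)); // word length, ms
  }
  return { words, wtimes, wdurations };
}

// Example: two words with second-based timestamps.
const payload = toSpeakAudioPayload([
  { word: "Hello", start: 0.0, end: 0.42 },
  { word: "world", start: 0.5, end: 0.95 },
]);

// Then, with the audio decoded to an AudioBuffer:
// head.speakAudio({ audio: decodedAudioBuffer, ...payload });
```

The mp3.html example in the repository shows the full flow, including decoding the audio file itself.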

christo-zero-john commented 1 month ago

> Yes, you can use the speakAudio method instead of speakText. However, in addition to the audio, you'll need to provide the words and their timestamps for accurate lip-sync. One way to obtain these is by using some transcription service. For an example, refer to the mp3.html app in the examples directory.

Really! Thanks for the help, and for this library too. It was a big help for my project.

christo-zero-john commented 1 month ago

Hi. So can I use it without a Google TTS API key? I want to use a Hugging Face model to convert text to speech and animate the face accordingly. How should I do that? I am using the facebook/fastspeech2-en-ljspeech model via the HF Inference API.

met4citizen commented 1 month ago

If you only use the speakAudio method, you don't need a Google TTS API key. The Google TTS API is only required when using the speakText method.

If the Hugging Face TTS service you are using provides word timestamps, you can simply use the speakAudio method. If it doesn't, you can either switch to a TTS service that does (such as Google, Microsoft, ElevenLabs, etc.) or use some transcription service (like OpenAI's Whisper) to extract word timestamps from the audio.
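Since fastspeech2-en-ljspeech returns raw audio bytes with no word timestamps, you would pair it with a transcription step as described above. A sketch of the TTS call, assuming the standard Hugging Face Inference API endpoint shape — `HF_TOKEN` is a placeholder for your own access token, and `buildTtsRequest`/`synthesize` are illustrative helper names:

```javascript
const MODEL = "facebook/fastspeech2-en-ljspeech";

// Build the request for the standard HF Inference API text-to-speech task.
function buildTtsRequest(text, token) {
  return {
    url: `https://api-inference.huggingface.co/models/${MODEL}`,
    options: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${token}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ inputs: text }),
    },
  };
}

// Fetch synthesized speech as raw audio bytes (no word timestamps).
async function synthesize(text, token) {
  const { url, options } = buildTtsRequest(text, token);
  const res = await fetch(url, options);
  if (!res.ok) throw new Error(`TTS request failed: ${res.status}`);
  return res.arrayBuffer();
}

// The returned audio would still need word timestamps (e.g. from a
// Whisper transcription) before being passed to head.speakAudio(...).
```

Decoding the bytes to an AudioBuffer and attaching the timestamp arrays then follows the same pattern as the mp3.html example.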

christo-zero-john commented 1 month ago

Okay. Thanks for the help.