Closed: tanerdogan closed this 3 months ago
Open your Chrome console by hitting F12 and try again -- please paste the error into a reply here, if any. Cheers!
Hi again, there is no error at all... Maybe the `speakText` method doesn't work like that?
```javascript
console.log(head.getMoodNames());
console.log(head.getMorphTargetNames());
head.speakText("Hello how are you today");
head.startSpeaking();
```
If you want the avatar to speak and lip-sync some text, simply calling the `speakText` method should work. There is no need to use the lip-sync module directly or call other methods.
Have you tried the minimal code example? If you haven't, download `./examples/minimal.html`, add your own Google TTS API key, and test.
@met4citizen thanks for the reply. I should explain more, because I am not using Google TTS: I have my own backend, built with OpenAI PHP, which handles the chat, TTS, and STT APIs, and I'm using Howler.js for audio...
So I create the TalkingHead instance like this:
```javascript
head = new TalkingHead( nodeAvatar, {
  ttsEndpoint: "no.json",
  ttsApikey: "", // <- Change this
  cameraView: "upper"
});
```
and my no.json looks like this:
```json
{
  "words": "HELLO, HOW ARE YOU TODAY?",
  "visemes": ["I", "E", "nn", "O", "I", "aa", "RR", "I", "U", "DD", "aa", "DD", "E"],
  "times": [0, 0.92, 1.82, 2.7, 7.66, 8.58, 11.194999999999999, 13.075, 13.995, 15.944999999999999, 16.994999999999997, 17.944999999999997, 18.994999999999997],
  "durations": [0.92, 0.9, 0.88, 0.96, 0.92, 1.6149999999999998, 0.88, 0.92, 0.95, 1.05, 0.95, 1.05, 0.9],
  "i": 25
}
```
So for my project I just need `startSpeaking` (random words, no lip-sync needed) triggered by the audio's onStart event, and `stopSpeaking` on the audio's onEnded event.
BTW -- I tested mp3.html too and it works perfectly. I copied the response JSON and pasted it into no.json, but still no luck.
Maybe I'm asking a stupid question, but I'm still learning and trying to put the pieces together :-)
Thanks, now I understand a bit more what you are trying to do. Since the `speakText` method always uses Google TTS, the best starting point here is the `mp3.html` code example. It uses the `speakAudio` method and requires no TTS.
Here is a simple example that replaces the "load" button click event handler with a "manual" call to `speakAudio`. Instead of silence you can, of course, use some actual audio content. Times and durations are specified in milliseconds:
```javascript
// Load button clicked
const nodeLoad = document.getElementById('load');
nodeLoad.addEventListener('click', function () {

  // Create an empty audio buffer of length 1 second
  const audioCtx = new AudioContext();
  const audiobuffer = audioCtx.createBuffer(2, 22050, 22050);

  // Speak audio
  head.speakAudio({
    audio: audiobuffer,
    words: ["HELLO,", "HOW", "ARE", "YOU", "TODAY?"],
    wtimes: [20, 500, 520, 640, 740],
    wdurations: [320, 20, 110, 100, 240],
    markers: [],
    mtimes: []
  });

});
```
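For the start/stop-with-audio-events part of your question, here is a rough sketch of how you could tie the speaking animation to Howler.js playback events. The `wireSpeakingAnimation` helper is just one possible approach, not something the library prescribes; `sound` is assumed to be a Howler.js `Howl` instance, which provides an `on(event, fn)` method for registering event handlers:

```javascript
// Sketch: start/stop the speaking animation in sync with a Howler.js sound.
// `head` is assumed to expose the startSpeaking()/stopSpeaking() methods
// mentioned above; `sound` is assumed to be a Howl-like object with an
// on(event, fn) method, as Howler.js provides.
function wireSpeakingAnimation(head, sound) {
  sound.on('play', () => head.startSpeaking()); // animation begins with playback
  sound.on('end', () => head.stopSpeaking());   // animation ends with playback
  sound.on('stop', () => head.stopSpeaking());  // also handle manual stops
}

// Usage with Howler.js (assumed setup):
// const sound = new Howl({ src: ['tts-reply.mp3'] });
// wireSpeakingAnimation(head, sound);
// sound.play();
```

With this wiring the avatar animates for exactly as long as the audio plays, regardless of what the audio contains.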
I hope this helps. Feel free to ask further questions, if any.
@met4citizen ♥️ Thanks for the help, now I have made progress and want to share it here. I'm just wondering why the `wordsToVisemes` method responds with 22 times & durations values? Shouldn't it be 7 for `wordsToVisemes('Hello, how are you today my darling?')`? And the values are much smaller. Can I use any other method instead of `wordsToVisemes`?
```javascript
const audioCtx = new AudioContext();
const audiobuffer = audioCtx.createBuffer(2, 22050, 22050);
var vv = new LipsyncEn();
var vis = vv.wordsToVisemes('Hello, how are you today my darling?');
var wo = vis.words.split(" ");
var wt = vis.times.map(i => i * 400);
var wd = vis.durations.map(i => i * 800);
head.speakAudio({
  audio: audiobuffer,
  words: wo,
  wtimes: wt,
  wdurations: wd,
  markers: [],
  mtimes: []
});
```
Thanks again...
The typical use case for `speakAudio` is that you get both the audio and word timestamps from some external TTS engine, such as ElevenLabs or Microsoft Speech SDK. In an alternative use case, you already have some audio recording and you call some kind of transcription service to get the word timestamps for that audio file. In both cases, you finally call `speakAudio`, and the class internally breaks each word in the array into visemes and viseme timestamps by using the lip-sync module's `wordsToVisemes` method.
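If your transcription service gives you word timings, the conversion to `speakAudio` arguments is mostly a units question. A sketch, assuming a hypothetical transcript shape `[{ word, start, end }]` with times in seconds (adapt it to whatever your STT backend actually returns):

```javascript
// Sketch: convert word timestamps from a transcription service into the
// words/wtimes/wdurations arrays that speakAudio expects (milliseconds).
// The input shape [{ word, start, end }], with times in seconds, is an
// assumption about your backend's response format.
function toSpeakAudioArgs(transcript) {
  return {
    words: transcript.map(w => w.word),
    wtimes: transcript.map(w => Math.round(w.start * 1000)),
    wdurations: transcript.map(w => Math.round((w.end - w.start) * 1000))
  };
}

// Example:
// toSpeakAudioArgs([{ word: "HELLO,", start: 0.02, end: 0.34 }])
// → { words: ["HELLO,"], wtimes: [20], wdurations: [320] }
```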
You don't need to call the `wordsToVisemes` method yourself. It doesn't return word timestamps; it gives visemes and viseme timestamps for words, and those timestamps are in relative units, not in seconds or milliseconds.
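If you ever did want to map those relative units onto a known audio duration, it would be a simple linear scaling. A sketch, assuming the `{ visemes, times, durations }` return shape described above (the `vtimes`/`vdurations` output names here are just illustrative, and again, you normally don't need this because `speakAudio` does the conversion internally):

```javascript
// Sketch: scale the relative-unit viseme timestamps returned by
// wordsToVisemes to real milliseconds, given the known total duration
// of the audio. Assumes the { visemes, times, durations } shape.
function scaleVisemesToMs(vis, totalDurationMs) {
  const n = vis.times.length;
  // Total span in relative units: start of the last viseme plus its duration
  const totalUnits = vis.times[n - 1] + vis.durations[n - 1];
  const scale = totalDurationMs / totalUnits;
  return {
    visemes: vis.visemes,
    vtimes: vis.times.map(t => Math.round(t * scale)),
    vdurations: vis.durations.map(d => Math.round(d * scale))
  };
}
```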
Now, if you don't plan to use any audio file(?) and only want to make the lips move, you can just estimate the full duration of the sentence and do the following:
```javascript
// Load button clicked
const nodeLoad = document.getElementById('load');
nodeLoad.addEventListener('click', function () {

  // Estimate duration of the sentence in seconds
  let duration = 1.0;

  // Create an empty audio buffer of appropriate length
  const audioCtx = new AudioContext();
  const audiobuffer = audioCtx.createBuffer(2, Math.round( duration * 22050 ), 22050);

  // Speak audio
  head.speakAudio({
    audio: audiobuffer,
    words: ["Hello, how are you today?"],
    wtimes: [0],
    wdurations: [duration * 1000],
    markers: [],
    mtimes: []
  });

});
```
So, instead of multiple words, you give the class the full sentence as a single multi-part word. - I was going to add that the resulting lip-sync will not be very accurate this way, but if you don't have any audio file then how can you tell... 🙂
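If you need a starting point for estimating the duration, a words-per-minute heuristic is one option. The ~150 wpm rate below is an assumption, not something measured from any particular TTS voice, so tune it to taste:

```javascript
// Sketch: estimate the spoken duration of a sentence in seconds, assuming
// an average speaking rate of ~150 words per minute (2.5 words/second).
// The rate is an assumption; adjust it for your voice and language.
function estimateDurationSec(text, wordsPerMinute = 150) {
  const words = text.trim().split(/\s+/).length;
  return words / (wordsPerMinute / 60);
}

// estimateDurationSec("Hello, how are you today?")  // 5 words → 2 seconds
```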
Hello, I am using audio (TTS) and also audio recording (STT). Maybe I just can't figure out how to do it with my limited frontend knowledge... It now works the way I want for now: a fake speak starts with setInterval when the audio plays, and clearInterval is called on its onended event...
Thanks everyone again...
https://github.com/met4citizen/TalkingHead/assets/634890/b0d60e8e-254e-43d2-bb0c-2332ea6f1545
Visually, that looks great - I would love to visit that place!
The lip movement is a bit extreme and the timing isn't right, but that can be fixed. - Maybe you can find some local JavaScript expert to help?
I will close this issue for now because we have drifted away from the original title/topic, but feel free to reopen or create a new issue if needed. Best of luck!
Hi, thanks for the great project first. I have a silly question: I just want lip-sync (actually I just need a speaking animation, no sync needed for now). I tried

```javascript
head.speakText("Hello how are you today");
```

but no luck. I also tried it like that. I am not a JS dev, is there any solution? Thanks