lucoiso / UEAzSpeech

This plugin integrates Azure Speech Cognitive Services in Unreal Engine.
https://forums.unrealengine.com/t/free-azspeech-plugin-async-text-to-voice-and-voice-to-text-with-microsoft-azure/495394
MIT License
198 stars 47 forks

Can I get viseme animation data? #235

Open metakkh opened 1 year ago

metakkh commented 1 year ago

Hi, can I get viseme data for 3D character facial animation?

https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-speech-synthesis-viseme?tabs=3dblendshapes&pivots=programming-language-cpp#3d-blend-shapes-animation

image

I checked the received viseme values here and confirmed that the other values were received correctly.

image

But the Viseme Data Animation value is empty. Are there any other settings needed to get that value?

Also, if you know how to connect the values to the MetaHuman's blendshapes, I would appreciate your help.

skysworder commented 1 year ago

You can use SSML to soundwave instead of text to soundwave, and set the viseme type to "FacialExpression" in your SSML string. Below is a valid SSML example that returns blendshape data.

Rainbow has seven colors: Red, orange, yellow, green, blue, indigo, and violet.
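For reference, here is a rough C++ sketch of how such an SSML string could be assembled with the mstts:viseme element set to "FacialExpression". The helper name BuildVisemeSsml and the default voice "en-US-JennyNeural" are assumptions for illustration only, so swap in whatever voice and language you actually use.

```cpp
#include <string>

// Hypothetical helper (not part of the plugin): wraps plain text in SSML that
// asks the service for blend-shape viseme data.
// Assumptions: the default voice and language are placeholders, and `text`
// contains no characters that need XML escaping (&, <, >).
std::string BuildVisemeSsml(const std::string& text,
                            const std::string& voice = "en-US-JennyNeural",
                            const std::string& lang = "en-US")
{
    std::string ssml;
    ssml += "<speak version=\"1.0\" xmlns=\"http://www.w3.org/2001/10/synthesis\" ";
    ssml += "xmlns:mstts=\"http://www.w3.org/2001/mstts\" xml:lang=\"" + lang + "\">";
    ssml += "<voice name=\"" + voice + "\">";
    // This element is what switches the viseme events to blend-shape output.
    ssml += "<mstts:viseme type=\"FacialExpression\"/>";
    ssml += text;
    ssml += "</voice></speak>";
    return ssml;
}
```

The resulting string is what you would pass to the SSML input instead of the plain text.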

Anyway, it might not be a good idea to drive lip-sync animation with 55 blendshapes unless your GPU is strong enough. I couldn't get acceptable performance on my laptop (RTX 3070, 8 GB), so I gave up and switched to using the viseme ID.
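If you still want to experiment with the blendshape route, the sketch below shows one way to pick the frame for a given playback time. It assumes what the Azure viseme documentation linked above describes: each animation chunk carries a FrameIndex (index of its first frame) and a list of frames with 55 values each, at 60 frames per second. The type and function names are hypothetical, and parsing the JSON into this structure is left out.

```cpp
#include <array>
#include <cstddef>
#include <vector>

constexpr std::size_t kBlendShapeCount = 55;  // values per frame (assumed, per the Azure docs)
constexpr double kFramesPerSecond = 60.0;     // frame rate (assumed, per the Azure docs)

using BlendShapeFrame = std::array<float, kBlendShapeCount>;

// Hypothetical container for one parsed animation chunk.
struct AnimationChunk {
    int frameIndex = 0;                   // global index of the first frame in this chunk
    std::vector<BlendShapeFrame> frames;  // consecutive frames
};

// Returns the frame covering `seconds` of audio playback, or nullptr if that
// time falls outside this chunk.
const BlendShapeFrame* FrameAt(const AnimationChunk& chunk, double seconds)
{
    if (seconds < 0.0) {
        return nullptr;
    }
    const long long globalFrame = static_cast<long long>(seconds * kFramesPerSecond);
    const long long localFrame = globalFrame - chunk.frameIndex;
    if (localFrame < 0 || localFrame >= static_cast<long long>(chunk.frames.size())) {
        return nullptr;
    }
    return &chunk.frames[static_cast<std::size_t>(localFrame)];
}
```

Each value of the returned frame would then be written to the matching curve or morph target on the face mesh, which is the expensive part mentioned above.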

lucoiso commented 1 year ago

As skysworder said, to get the blendshapes you'll need SSML input that includes the mstts:viseme element 😁

https://learn.microsoft.com/en-us/azure/ai-services/speech-service/speech-synthesis-markup-structure#viseme-element

JiangHaiWei commented 1 year ago

Following your guidance, I obtained the blendshape data, but how can I use this data to drive lip sync on a MetaHuman? I hope to get some guidance.

skysworder commented 1 year ago

I tried several times to drive lip sync with a multi blend pose node, but it didn't work at all, so I switched to using the viseme ID and offset time, which works well. Here's the main idea:

  1. Build a pose asset that matches the Azure viseme IDs (22 poses); let's call it "az_viseme_poseAsset". This is easy because MetaHuman has a PoseLibrary for visemes under the common/common folder, so you can carefully pick out the poses you need.
  2. Define an enumeration that lists those 22 visemes; let's call it azVisemeID.
  3. Add a blend poses(azVisemeID) node to the anim graph that drives the face animation, and don't forget to add all pins as blend channels.
  4. Add an Evaluate Pose az_viseme_poseAsset node to the same graph as in step 3 and convert it to Pose by Name. Duplicate this node 21 times to match the number of viseme IDs. Change the pose name on each copy, and make sure every name exists in az_viseme_poseAsset.
  5. Connect each viseme pose to the blend poses(azVisemeID) node in the same order.
  6. Update the Active Enum Value of the blend poses(azVisemeID) node in Event Tick of the level blueprint.
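The timing logic behind step 6 can be sketched in plain C++ as below. This is only an illustration of the idea, not the plugin's API: VisemeCue and ActiveVisemeAt are hypothetical names, and the offsets are assumed to be in milliseconds from the start of the audio. Viseme ID 0 is silence in Azure's numbering, so it is used as the fallback.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Hypothetical record for one received viseme event.
struct VisemeCue {
    int visemeId;     // Azure viseme ID, 0..21 (0 is silence)
    double offsetMs;  // assumed: offset from the start of the audio, in milliseconds
};

// Returns the viseme that should be active after `elapsedMs` of playback.
// `cues` must be sorted by offsetMs in ascending order.
int ActiveVisemeAt(const std::vector<VisemeCue>& cues, double elapsedMs)
{
    // Find the first cue that starts strictly after the current time.
    auto next = std::upper_bound(
        cues.begin(), cues.end(), elapsedMs,
        [](double t, const VisemeCue& cue) { return t < cue.offsetMs; });
    if (next == cues.begin()) {
        return 0;  // nothing has started yet: stay on silence
    }
    // The active cue is the one just before it.
    return std::prev(next)->visemeId;
}
```

On every tick you would add the delta time to an elapsed-time counter that starts when the audio starts playing, call the lookup, and feed the result into the Active Enum Value of the blend node.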

Ale3274 commented 6 months ago

Can you give a detailed explanation? I am also working on this.

skysworder commented 6 months ago

Here's an example Blueprint anim graph (in the face_animBP of your MetaHuman). Notice that I use a viseme pose asset (face_visemes_lib_PoseAsset) with the Oculus OVRlips naming convention instead of the Azure viseme ID numbers. For Blend Poses, you first need to create an enumeration that follows the Azure viseme ID naming convention (0-21) and name it azVisemeID or anything else; otherwise you won't find the 'blend poses(azVisemeID)' node.

image

Here's a fragment of the level blueprint for the begin event, which gets the viseme ID data and offset time.

image

Here's the graph for Event Tick in the level blueprint, which sets the active viseme ID at the offset time.

image