lucoiso / UEAzSpeech

This plugin integrates Azure Speech Cognitive Services in Unreal Engine.
https://forums.unrealengine.com/t/free-azspeech-plugin-async-text-to-voice-and-voice-to-text-with-microsoft-azure/495394
MIT License
194 stars, 44 forks

Fix Issue: "Sound Wave reported a duration of zero..." #148

Closed zixixr closed 1 year ago

zixixr commented 1 year ago

It seems that using SSML to Soundwav triggers this error very frequently. I've tried text-to-soundwav on the same network and did not see the issue. When the error occurs, the sound either does not play at all or only part of the speech plays.

LogAudio: Warning: Sound Wave reported a duration of zero. This will likely result in incorrect decoding.

zixixr commented 1 year ago

I have also tried using both text-to-soundwav and SSML to soundwav at the same time, to get the sound and the visemes separately. There are still many warnings like the one reported above.

lucoiso commented 1 year ago

@zixixr

Sometimes the tasks were sending a false result before the real completion. I made a lot of adjustments to separate the update and complete signals.

I will test for possible regressions in my implementation before merging it into the development branch, but you can already try the changes in the hotfix/SOUND-ZERO-DURAION-148 branch.
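
The idea of separating the update and complete signals can be sketched as follows. This is a minimal illustration in standard C++, not the plugin's actual code: `SynthesisTask`, `OnUpdated`, `OnCompleted`, `ReceiveChunk` and `Finish` are hypothetical names chosen for the example. The point is that partial progress and the final result travel through different callbacks, so a listener that only plays audio on completion never receives a half-filled buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical task type; the names are illustrative, not UEAzSpeech's API.
class SynthesisTask {
public:
    // Fired for every partial update, with the number of bytes received so far.
    std::function<void(std::size_t)> OnUpdated;
    // Fired once, with the full audio buffer, when synthesis has really finished.
    std::function<void(const std::vector<std::uint8_t>&)> OnCompleted;

    // Called for each chunk that arrives while synthesis is still running.
    void ReceiveChunk(const std::vector<std::uint8_t>& Chunk) {
        assert(!bCompleted && "chunks must not arrive after completion");
        Buffer.insert(Buffer.end(), Chunk.begin(), Chunk.end());
        if (OnUpdated) {
            OnUpdated(Buffer.size());
        }
    }

    // Called when the final result is reported; repeated calls are ignored.
    void Finish() {
        if (bCompleted) {
            return;
        }
        bCompleted = true;
        if (OnCompleted) {
            OnCompleted(Buffer);
        }
    }

private:
    std::vector<std::uint8_t> Buffer;
    bool bCompleted = false;
};
```

With this split, code that builds a sound from the buffer would bind only to `OnCompleted`, while progress indicators or viseme handling could bind to `OnUpdated`.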

zixixr commented 1 year ago

@lucoiso Thanks for the update. I've tried it, and there are no more zero-duration warnings. However, sometimes I get the visemes but no sound is played at all.

zixixr commented 1 year ago

I did more tests, and sound still fails to play quite often, almost as often as before, just without the warnings.

lucoiso commented 1 year ago

I was able to replicate the problem here. In the first tests, I was getting good results, but after a while, the audio was being generated without sound.

I am checking what could be causing it and I will send a correction soon on the same branch.

zixixr commented 1 year ago

Yes, that's exactly what I've experienced here. I also feel that SSML to Soundwav is generally slower than text to soundwav at generating sound.

lucoiso commented 1 year ago

@zixixr

hello! :)

I did some cleanup and made some changes. So far I haven't been able to replicate the issue anymore, but I will keep testing.

Regarding the difference in audio generation speed, I couldn't replicate it. At times the SSML was generated even faster, but there was no fixed rate. But I will perform more tests to try to locate the possible issue.

zixixr commented 1 year ago

Hi @lucoiso, I've tested the new plugin. If I use the SSML node alone, it works fine, with both sound and visemes generated. However, my goal is to parse the generated visemes and use the blendshape data to drive the facial expressions. I was able to drive the facial expressions, but usually no sound is played (I tried debugging the blueprint: the play audio node was triggered, there was just no sound). I'll send you a Google Drive link to my project. Could you please have a look? Thanks.

lucoiso commented 1 year ago

I have downloaded the project, took a quick look and was able to replicate the issue. However, I noticed that it only happens when the viseme is active. I will investigate what is happening and hope to bring a solution as soon as possible! :)

zixixr commented 1 year ago

Here's my finding regarding the SSML node: if you just print the visemes, it seems okay. But after some more complex handling of the strings (e.g. a for each loop), it stops playing sound.

lucoiso commented 1 year ago

hello @zixixr

i haven't yet figured out the exact cause of the problem, but i found a temporary solution that worked well:

instead of using the "Get Last Viseme Data" node in the output of "Viseme Received", use the "Get Viseme Data Array" node in "Synthesis Completed"

i'll continue investigating the cause of this issue 👀

lucoiso commented 1 year ago

@zixixr

I believe I made good progress now: I was able to reproduce the issue in the testing project I use here, and it stopped occurring after the last commit. I will test these changes again, but using your project. 👀

The issue was occurring frequently when trying to access viseme data in real time while audio was being played. When this happened, the audio buffer was being sent incomplete. I believe the cause is incorrect management of the runnable thread, specifically some missing mutex locks.

However, I noticed a performance loss on notifies, but I will continue working on it. :)
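
To illustrate the kind of missing lock described above, here is a minimal sketch in standard C++ (the plugin itself would use Unreal's own threading primitives, and its real classes are not shown in this thread). `GuardedAudioBuffer` and `FillConcurrently` are hypothetical names. One thread appends audio chunks while another reads; because both sides take the same mutex, a reader can never observe a half-written chunk.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical shared buffer; not the plugin's actual class.
class GuardedAudioBuffer {
public:
    // Writer side, e.g. the synthesis thread appending a received chunk.
    void Append(const std::vector<std::uint8_t>& Chunk) {
        std::lock_guard<std::mutex> Lock(Mutex);
        Data.insert(Data.end(), Chunk.begin(), Chunk.end());
    }

    // Reader side: returns a copy taken while holding the lock.
    std::vector<std::uint8_t> Snapshot() const {
        std::lock_guard<std::mutex> Lock(Mutex);
        return Data;
    }

private:
    mutable std::mutex Mutex;
    std::vector<std::uint8_t> Data;
};

// Appends ChunkCount chunks from a worker thread while the caller reads
// concurrently. ChunkSize must be greater than zero.
inline std::size_t FillConcurrently(GuardedAudioBuffer& Buffer,
                                    std::size_t ChunkCount,
                                    std::size_t ChunkSize) {
    std::thread Writer([&Buffer, ChunkCount, ChunkSize] {
        for (std::size_t Index = 0; Index < ChunkCount; ++Index) {
            Buffer.Append(std::vector<std::uint8_t>(ChunkSize, 0x7F));
        }
    });

    // Every concurrent read sees a whole number of chunks.
    for (int Read = 0; Read < 100; ++Read) {
        assert(Buffer.Snapshot().size() % ChunkSize == 0);
    }

    Writer.join();
    return Buffer.Snapshot().size();
}
```

Copying under the lock keeps the critical section short at the cost of an allocation per read, which may be one source of the kind of overhead mentioned above; that trade-off is an assumption of this sketch, not something measured in the plugin.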

zixixr commented 1 year ago

Awesome! Look forward to it!

lucoiso commented 1 year ago

@zixixr

I made some changes and added new functions to extract the animation data from the viseme data:

image

zixixr commented 1 year ago

@lucoiso Haha, cool, I'll give it a try now. You must have noticed how cumbersome my attempt to parse the animation data from the viseme data was. Unreal's built-in JSON tool doesn't support nested JSON, so I had to do it that way. So I should use the "Get Viseme Data Array" node in "Synthesis Completed" for now?

zixixr commented 1 year ago

@lucoiso Yes, there's a significant performance issue. While the viseme data is being processed, the whole project freezes, including the MetaHuman's animation. I tried the new node, and the facial expression could not be driven correctly: it played way too fast. I'm not sure whether that is related to the performance issue.
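
One possible explanation for animation that plays too fast, not confirmed anywhere in this thread, is that blend-shape frames are applied as quickly as they are read instead of being paced against elapsed playback time. The sketch below shows time-based frame selection in standard C++. `FrameIndexAt` is a hypothetical helper, and the default of 60 frames per second is an assumption based on Azure's documentation for blend-shape visemes; check it against the data you actually receive.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper: picks which animation frame to show after
// ElapsedSeconds of audio playback. The 60 FPS default is an assumption.
inline std::size_t FrameIndexAt(double ElapsedSeconds,
                                std::size_t FrameCount,
                                double FramesPerSecond = 60.0) {
    assert(FramesPerSecond > 0.0);
    if (FrameCount == 0 || ElapsedSeconds <= 0.0) {
        return 0;
    }
    // Frame that should be visible at this time, clamped to the last frame.
    const double Raw = std::floor(ElapsedSeconds * FramesPerSecond);
    const double Last = static_cast<double>(FrameCount - 1);
    return static_cast<std::size_t>(Raw < Last ? Raw : Last);
}
```

Calling this once per tick with the audio component's playback time, rather than advancing one frame per loop iteration, keeps the face in step with the sound regardless of the game's frame rate.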

lucoiso commented 1 year ago

Yes, use the "Get Viseme Data Array" node in "Synthesis Completed" for now! :)

I'll create another issue to work on the performance drop that occurs when trying to get the viseme data while the task is active.

I'll perform more tests and then merge the current changes to publish the 'duration of zero' fix.