lucoiso / UEAzSpeech

This plugin integrates Azure Speech Cognitive Services in Unreal Engine.
https://forums.unrealengine.com/t/free-azspeech-plugin-async-text-to-voice-and-voice-to-text-with-microsoft-azure/495394
MIT License
194 stars 44 forks

Visemes sometimes not received correctly #50

Closed eggcaker closed 1 year ago

eggcaker commented 1 year ago

With the latest version (1.2.5), I'm using SSML to Audio Data to receive viseme IDs and audio: [image]

The first time I press B to run it, the visemes are received correctly. On the 2nd and 3rd runs they do not look right.

All three runs use the same SSML to Audio Data input.

1. Looks good

LogBlueprintUserMessages: [SpeechNew_C_0] 1
LogBlueprintUserMessages: [SpeechNew_C_0] 14
LogBlueprintUserMessages: [SpeechNew_C_0] 14
LogBlueprintUserMessages: [SpeechNew_C_0] 14
LogBlueprintUserMessages: [SpeechNew_C_0] 14
LogBlueprintUserMessages: [SpeechNew_C_0] 14
LogBlueprintUserMessages: [SpeechNew_C_0] 14
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 4
LogBlueprintUserMessages: [SpeechNew_C_0] 4
LogBlueprintUserMessages: [SpeechNew_C_0] 4
LogBlueprintUserMessages: [SpeechNew_C_0] 4
LogBlueprintUserMessages: [SpeechNew_C_0] 4
LogBlueprintUserMessages: [SpeechNew_C_0] 4
LogBlueprintUserMessages: [SpeechNew_C_0] 4
LogBlueprintUserMessages: [SpeechNew_C_0] 4
LogBlueprintUserMessages: [SpeechNew_C_0] 4
LogBlueprintUserMessages: [SpeechNew_C_0] 4
LogBlueprintUserMessages: [SpeechNew_C_0] 20
LogBlueprintUserMessages: [SpeechNew_C_0] 20
LogBlueprintUserMessages: [SpeechNew_C_0] 20
LogBlueprintUserMessages: [SpeechNew_C_0] 20
LogBlueprintUserMessages: [SpeechNew_C_0] 1
LogBlueprintUserMessages: [SpeechNew_C_0] 1
LogBlueprintUserMessages: [SpeechNew_C_0] 1
LogBlueprintUserMessages: [SpeechNew_C_0] 1
LogBlueprintUserMessages: [SpeechNew_C_0] 20
LogBlueprintUserMessages: [SpeechNew_C_0] 20
LogBlueprintUserMessages: [SpeechNew_C_0] 20
LogBlueprintUserMessages: [SpeechNew_C_0] 19
LogBlueprintUserMessages: [SpeechNew_C_0] 6
LogBlueprintUserMessages: [SpeechNew_C_0] 16
LogBlueprintUserMessages: [SpeechNew_C_0] 1
LogBlueprintUserMessages: [SpeechNew_C_0] 19
LogBlueprintUserMessages: [SpeechNew_C_0] 1
LogBlueprintUserMessages: [SpeechNew_C_0] 19
LogBlueprintUserMessages: [SpeechNew_C_0] 19
LogBlueprintUserMessages: [SpeechNew_C_0] 20
LogBlueprintUserMessages: [SpeechNew_C_0] 1
LogBlueprintUserMessages: [SpeechNew_C_0] 19
LogBlueprintUserMessages: [SpeechNew_C_0] 19
LogBlueprintUserMessages: [SpeechNew_C_0] 6
LogBlueprintUserMessages: [SpeechNew_C_0] 19
LogBlueprintUserMessages: [SpeechNew_C_0] 6
LogBlueprintUserMessages: [SpeechNew_C_0] 7
LogBlueprintUserMessages: [SpeechNew_C_0] 1
LogBlueprintUserMessages: [SpeechNew_C_0] 15
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 4
LogBlueprintUserMessages: [SpeechNew_C_0] 20
LogBlueprintUserMessages: [SpeechNew_C_0] 20
LogBlueprintUserMessages: [SpeechNew_C_0] 20
LogBlueprintUserMessages: [SpeechNew_C_0] 19
LogBlueprintUserMessages: [SpeechNew_C_0] 6
LogBlueprintUserMessages: [SpeechNew_C_0] 16
LogBlueprintUserMessages: [SpeechNew_C_0] 1
LogBlueprintUserMessages: [SpeechNew_C_0] 19
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 21
LogBlueprintUserMessages: [SpeechNew_C_0] 11
LogBlueprintUserMessages: [SpeechNew_C_0] 19
LogBlueprintUserMessages: [SpeechNew_C_0] 16
LogBlueprintUserMessages: [SpeechNew_C_0] 4
LogBlueprintUserMessages: [SpeechNew_C_0] 20
LogBlueprintUserMessages: [SpeechNew_C_0] 6
LogBlueprintUserMessages: [SpeechNew_C_0] 20
LogBlueprintUserMessages: [SpeechNew_C_0] 1
LogBlueprintUserMessages: [SpeechNew_C_0] 21
LogBlueprintUserMessages: [SpeechNew_C_0] 7
LogBlueprintUserMessages: [SpeechNew_C_0] 14
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0

2. Not good

LogAzSpeech: Display: StartSynthesisWork: AzSpeech Task: SSMLToAudioData (147934); Starting synthesis
LogAzSpeech: Display: ProcessSynthesisResult: AzSpeech Task: SSMLToAudioData (147934); Task started. Reason: SynthesizingAudioStarted
LogBlueprintUserMessages: [SpeechNew_C_0] 20
LogBlueprintUserMessages: [SpeechNew_C_0] 20
LogBlueprintUserMessages: [SpeechNew_C_0] 20
LogBlueprintUserMessages: [SpeechNew_C_0] 20
LogBlueprintUserMessages: [SpeechNew_C_0] 20
LogBlueprintUserMessages: [SpeechNew_C_0] 20
LogBlueprintUserMessages: [SpeechNew_C_0] 20
LogBlueprintUserMessages: [SpeechNew_C_0] 20
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0

3. Also not good

LogAzSpeech: Display: ProcessSynthesisResult: AzSpeech Task: SSMLToAudioData (147934); Task completed. Reason: SynthesizingAudioCompleted
LogAzSpeech: Display: OutputSynthesisResult: AzSpeech Task: SSMLToAudioData (147934); Task completed with result: Success
LogAzSpeech: Display: Activate: AzSpeech Task: SSMLToAudioData (149553); Activating
LogAzSpeech: Display: StartAzureTaskWork_Internal: AzSpeech Task: SSMLToAudioData (149553); Starting Azure SDK task
LogAzSpeech: Display: InitializeSynthesizer: AzSpeech Task: SSMLToAudioData (149553); Initializing synthesizer object
LogAzSpeech: Display: CreateSpeechConfig: AzSpeech Task: SSMLToAudioData (149553); Creating Azure SDK speech config
LogAzSpeech: Display: ApplySDKSettings: AzSpeech Task: SSMLToAudioData (149553); Applying Azure SDK Settings
LogAzSpeech: Display: EnableLogInConfiguration: AzSpeech Task: SSMLToAudioData (149553); Enabling Azure SDK log
LogAzSpeech: Display: ApplySDKSettings: AzSpeech Task: SSMLToAudioData (149553); Using language: en-US
LogAzSpeech: Display: ApplySDKSettings: AzSpeech Task: SSMLToAudioData (149553); Using voice: JennyNeural
LogAzSpeech: Display: ApplyExtraSettings: AzSpeech Task: SSMLToAudioData (149553); Adding extra settings to existing synthesizer object
LogAzSpeech: Display: EnableVisemeOutput: AzSpeech Task: SSMLToAudioData (149553); Enabling Viseme
LogAzSpeech: Display: StartSynthesisWork: AzSpeech Task: SSMLToAudioData (149553); Starting synthesis
LogAzSpeech: Display: ProcessSynthesisResult: AzSpeech Task: SSMLToAudioData (149553); Task started. Reason: SynthesizingAudioStarted
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
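
The three runs above are easier to compare when consecutive identical IDs are collapsed into runs. Below is a minimal sketch, not part of the plugin, that assumes every `LogBlueprintUserMessages` line ends with one viseme ID as in the logs above. The helper name `viseme_runs` is hypothetical.

```python
import re
from itertools import groupby

# Matches Blueprint print lines such as
# "LogBlueprintUserMessages: [SpeechNew_C_0] 14" and captures the trailing integer.
# Assumption: that integer is the viseme ID printed by the Blueprint.
VISEME_LINE = re.compile(r"^LogBlueprintUserMessages: \[[^\]]+\] (\d+)\s*$")


def viseme_runs(log_text):
    """Return (viseme_id, repeat_count) pairs for consecutive identical IDs."""
    ids = []
    for line in log_text.splitlines():
        match = VISEME_LINE.match(line.strip())
        if match:  # skip LogAzSpeech and any other non-viseme lines
            ids.append(int(match.group(1)))
    return [(vid, len(list(group))) for vid, group in groupby(ids)]


sample = """\
LogAzSpeech: Display: StartSynthesisWork: AzSpeech Task: SSMLToAudioData (147934); Starting synthesis
LogBlueprintUserMessages: [SpeechNew_C_0] 20
LogBlueprintUserMessages: [SpeechNew_C_0] 20
LogBlueprintUserMessages: [SpeechNew_C_0] 13
LogBlueprintUserMessages: [SpeechNew_C_0] 0
LogBlueprintUserMessages: [SpeechNew_C_0] 0
"""

print(viseme_runs(sample))  # [(20, 2), (13, 1), (0, 2)]
```

Running it on each pasted run separately makes it easy to see which IDs from the first run are missing in the second and third.
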
eggcaker commented 1 year ago

I'm not sure whether this is related to the similar issue, but I tried the latest fix: the visemes are fixed, but two audio clips now play at once, as if half of the previous audio were mixed with the current audio.