Closed eggcaker closed 1 year ago
With the latest version, 1.2.5, I'm using SSML to Audio Data to receive viseme IDs and audio:
The first time I press B to run it, the visemes are received correctly; on the second and third runs they do not look right.
Same ssml to audio data 1. feels good LogBlueprintUserMessages: [SpeechNew_C_0] 1 LogBlueprintUserMessages: [SpeechNew_C_0] 14 LogBlueprintUserMessages: [SpeechNew_C_0] 14 LogBlueprintUserMessages: [SpeechNew_C_0] 14 LogBlueprintUserMessages: [SpeechNew_C_0] 14 LogBlueprintUserMessages: [SpeechNew_C_0] 14 LogBlueprintUserMessages: [SpeechNew_C_0] 14 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 4 LogBlueprintUserMessages: [SpeechNew_C_0] 4 LogBlueprintUserMessages: [SpeechNew_C_0] 4 LogBlueprintUserMessages: [SpeechNew_C_0] 4 LogBlueprintUserMessages: [SpeechNew_C_0] 4 LogBlueprintUserMessages: [SpeechNew_C_0] 4 LogBlueprintUserMessages: [SpeechNew_C_0] 4 LogBlueprintUserMessages: [SpeechNew_C_0] 4 LogBlueprintUserMessages: [SpeechNew_C_0] 4 LogBlueprintUserMessages: [SpeechNew_C_0] 4 LogBlueprintUserMessages: [SpeechNew_C_0] 20 LogBlueprintUserMessages: [SpeechNew_C_0] 20 LogBlueprintUserMessages: [SpeechNew_C_0] 20 LogBlueprintUserMessages: [SpeechNew_C_0] 20 LogBlueprintUserMessages: [SpeechNew_C_0] 1 LogBlueprintUserMessages: [SpeechNew_C_0] 1 LogBlueprintUserMessages: [SpeechNew_C_0] 1 LogBlueprintUserMessages: [SpeechNew_C_0] 1 LogBlueprintUserMessages: [SpeechNew_C_0] 20 LogBlueprintUserMessages: [SpeechNew_C_0] 20 LogBlueprintUserMessages: [SpeechNew_C_0] 20 LogBlueprintUserMessages: [SpeechNew_C_0] 19 LogBlueprintUserMessages: [SpeechNew_C_0] 6 LogBlueprintUserMessages: [SpeechNew_C_0] 16 LogBlueprintUserMessages: [SpeechNew_C_0] 1 LogBlueprintUserMessages: [SpeechNew_C_0] 19 LogBlueprintUserMessages: [SpeechNew_C_0] 1 LogBlueprintUserMessages: [SpeechNew_C_0] 19 LogBlueprintUserMessages: [SpeechNew_C_0] 19 LogBlueprintUserMessages: [SpeechNew_C_0] 20 
LogBlueprintUserMessages: [SpeechNew_C_0] 1 LogBlueprintUserMessages: [SpeechNew_C_0] 19 LogBlueprintUserMessages: [SpeechNew_C_0] 19 LogBlueprintUserMessages: [SpeechNew_C_0] 6 LogBlueprintUserMessages: [SpeechNew_C_0] 19 LogBlueprintUserMessages: [SpeechNew_C_0] 6 LogBlueprintUserMessages: [SpeechNew_C_0] 7 LogBlueprintUserMessages: [SpeechNew_C_0] 1 LogBlueprintUserMessages: [SpeechNew_C_0] 15 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 4 LogBlueprintUserMessages: [SpeechNew_C_0] 20 LogBlueprintUserMessages: [SpeechNew_C_0] 20 LogBlueprintUserMessages: [SpeechNew_C_0] 20 LogBlueprintUserMessages: [SpeechNew_C_0] 19 LogBlueprintUserMessages: [SpeechNew_C_0] 6 LogBlueprintUserMessages: [SpeechNew_C_0] 16 LogBlueprintUserMessages: [SpeechNew_C_0] 1 LogBlueprintUserMessages: [SpeechNew_C_0] 19 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 21 LogBlueprintUserMessages: [SpeechNew_C_0] 11 LogBlueprintUserMessages: [SpeechNew_C_0] 19 LogBlueprintUserMessages: [SpeechNew_C_0] 16 LogBlueprintUserMessages: [SpeechNew_C_0] 4 LogBlueprintUserMessages: [SpeechNew_C_0] 20 LogBlueprintUserMessages: [SpeechNew_C_0] 6 LogBlueprintUserMessages: [SpeechNew_C_0] 20 LogBlueprintUserMessages: [SpeechNew_C_0] 1 LogBlueprintUserMessages: [SpeechNew_C_0] 21 LogBlueprintUserMessages: [SpeechNew_C_0] 7 LogBlueprintUserMessages: [SpeechNew_C_0] 14 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 
LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 2. not good LogAzSpeech: Display: StartSynthesisWork: AzSpeech Task: SSMLToAudioData (147934); Starting synthesis LogAzSpeech: Display: ProcessSynthesisResult: AzSpeech Task: SSMLToAudioData (147934); Task started. Reason: SynthesizingAudioStarted LogBlueprintUserMessages: [SpeechNew_C_0] 20 LogBlueprintUserMessages: [SpeechNew_C_0] 20 LogBlueprintUserMessages: [SpeechNew_C_0] 20 LogBlueprintUserMessages: [SpeechNew_C_0] 20 LogBlueprintUserMessages: [SpeechNew_C_0] 20 LogBlueprintUserMessages: [SpeechNew_C_0] 20 LogBlueprintUserMessages: [SpeechNew_C_0] 20 LogBlueprintUserMessages: [SpeechNew_C_0] 20 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 
LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 
LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 3. not good too LogAzSpeech: Display: ProcessSynthesisResult: AzSpeech Task: SSMLToAudioData (147934); Task completed. 
Reason: SynthesizingAudioCompleted LogAzSpeech: Display: OutputSynthesisResult: AzSpeech Task: SSMLToAudioData (147934); Task completed with result: Success LogAzSpeech: Display: Activate: AzSpeech Task: SSMLToAudioData (149553); Activating LogAzSpeech: Display: StartAzureTaskWork_Internal: AzSpeech Task: SSMLToAudioData (149553); Starting Azure SDK task LogAzSpeech: Display: InitializeSynthesizer: AzSpeech Task: SSMLToAudioData (149553); Initializing synthesizer object LogAzSpeech: Display: CreateSpeechConfig: AzSpeech Task: SSMLToAudioData (149553); Creating Azure SDK speech config LogAzSpeech: Display: ApplySDKSettings: AzSpeech Task: SSMLToAudioData (149553); Applying Azure SDK Settings LogAzSpeech: Display: EnableLogInConfiguration: AzSpeech Task: SSMLToAudioData (149553); Enabling Azure SDK log LogAzSpeech: Display: ApplySDKSettings: AzSpeech Task: SSMLToAudioData (149553); Using language: en-US LogAzSpeech: Display: ApplySDKSettings: AzSpeech Task: SSMLToAudioData (149553); Using voice: JennyNeural LogAzSpeech: Display: ApplyExtraSettings: AzSpeech Task: SSMLToAudioData (149553); Adding extra settings to existing synthesizer object LogAzSpeech: Display: EnableVisemeOutput: AzSpeech Task: SSMLToAudioData (149553); Enabling Viseme LogAzSpeech: Display: StartSynthesisWork: AzSpeech Task: SSMLToAudioData (149553); Starting synthesis LogAzSpeech: Display: ProcessSynthesisResult: AzSpeech Task: SSMLToAudioData (149553); Task started. 
Reason: SynthesizingAudioStarted LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 
13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 
LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 13 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0 LogBlueprintUserMessages: [SpeechNew_C_0] 0
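To compare the three runs more easily, the pasted log can be collapsed into runs of identical viseme IDs. The following is a minimal sketch, not part of the plugin: it assumes each printed viseme ID appears as `LogBlueprintUserMessages: [<object>] <id>`, as in the output above, and the helper name `viseme_runs` is hypothetical.

```python
import re
from itertools import groupby

# Matches entries such as "LogBlueprintUserMessages: [SpeechNew_C_0] 14".
# \s+ tolerates a line break between the bracket and the ID.
VISEME_RE = re.compile(r"LogBlueprintUserMessages: \[[^\]]+\]\s+(\d+)")


def viseme_runs(log_text):
    """Return (viseme_id, repeat_count) pairs for consecutive identical IDs."""
    ids = [int(m) for m in VISEME_RE.findall(log_text)]
    return [(vid, sum(1 for _ in group)) for vid, group in groupby(ids)]


# Short excerpt in the same shape as the log above (not the full log).
sample = (
    "LogBlueprintUserMessages: [SpeechNew_C_0] 1 "
    "LogBlueprintUserMessages: [SpeechNew_C_0] 14 "
    "LogBlueprintUserMessages: [SpeechNew_C_0] 14 "
    "LogBlueprintUserMessages: [SpeechNew_C_0] 13"
)
print(viseme_runs(sample))
```

Running this over each run's section of the log separately should make the difference visible at a glance: a healthy run shows many short runs of varied IDs, while the problematic runs collapse into a few very long runs of 13 and 0.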
I'm not sure whether this is related to this similar issue, which is about audio. I tried the latest fix: the visemes are fixed, but the audio is now wrong. Two audio streams play at once, as if half of the previous audio were mixed with the current audio.