[text-to-speech] Get length of the audio when connecting via Websocket (WS)

watson-developer-cloud / unity-sdk

:video_game: Unity SDK to use the IBM Watson services.

Apache License 2.0

[text-to-speech] Get length of the audio when connecting via Websocket (WS) #652

Closed kpprt closed 3 years ago

kpprt commented 4 years ago

When streaming audio over a WebSocket connection, we currently have no way of knowing when the stream has ended. Is it possible to either

- get the full length of the audio at the start of the stream, or
- get a signal from the server (or an event from the SDK) when the stream has finished?

Thanks for the support! Chris

mamoonraja commented 4 years ago

@kpprt Thanks for opening the issue. Which version of the SDK are you using? Can you share a snippet showing how you implemented it, or are you using one of our example snippets?

kpprt commented 4 years ago

@mamoonraja Thanks for the reply! We are using the latest version at the moment: 4.8.0. I can provide a snippet, but it will take some time to strip out the unrelated parts.

I couldn't find an example for TTS with Websockets, but if there is one I'd be happy to have a look at it.

kpprt commented 4 years ago

Hi @mamoonraja, you can find a simplified example here: https://gist.github.com/kpprt/bba2b187e2cc3ac70014a61c72f3c83a

Simply drop the MonoBehaviour into a new Scene with a default Camera and an AudioListener. Also make sure to enter valid credentials in the Serialized Fields.

We normally use async await functions instead of Coroutines, but I rewrote it so it works without any additional tools.

As a first hint regarding the end being cut off (#635): The issue only seems to arise when the output text is long enough, e.g.: "Hello Mamoon, thanks for helping out and giving support!" is being cut off occasionally whereas "Hello Mamoon, thanks for helping out!" is not.

kpprt commented 4 years ago

@mamoonraja Hi Mamoon, is there any update on this matter? I saw that there is an OnListenClosed method in the TextToSpeechService.cs, but unfortunately that event is not available from outside the SDK.

mamoonraja commented 4 years ago

Hi, we still don't have any updates on #635. But you can use the onClose method to detect when the connection is closed. See https://github.com/watson-developer-cloud/unity-sdk/blob/master/Scripts/Services/TextToSpeech/V1/TextToSpeechServiceExtension.cs#L218
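
To illustrate the idea of treating the close event as the end-of-stream signal, here is a minimal, SDK-agnostic sketch in Python. The `StreamCollector` class and its `on_chunk` / `on_close` methods are hypothetical names invented for this example; they are not part of the Unity SDK. It also assumes the connection closes only after all audio has arrived, which #635 suggests may not always hold.

```python
from typing import Callable, List


class StreamCollector:
    """Hypothetical helper: buffers audio chunks until the stream closes."""

    def __init__(self, on_finished: Callable[[bytes], None]) -> None:
        self._chunks: List[bytes] = []
        self._on_finished = on_finished
        self._closed = False

    def on_chunk(self, chunk: bytes) -> None:
        # Called for every binary message received on the socket.
        if self._closed:
            raise RuntimeError("chunk received after the stream was closed")
        self._chunks.append(chunk)

    def on_close(self) -> None:
        # Called once when the connection closes; repeated calls are ignored.
        if self._closed:
            return
        self._closed = True
        self._on_finished(b"".join(self._chunks))
```

Wiring `on_chunk` and `on_close` to whatever callbacks the WebSocket client exposes would hand the complete audio to `on_finished` exactly once, at which point its length is known.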

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has had no recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

JPM87 commented 3 years ago

@mamoonraja Can you explain why the header does not contain the correct chunk length? Is it an issue in the unity-sdk? Instant audio streaming is the reason why I would prefer WebSockets to the REST service. The hack only applies if we have the full audio clip available.

The second issue (already reported) is that the audio is cut off because the socket closes before transmitting all data. I am trying to implement the socket in a business application and we really prefer Watson. But these issues are really disappointing and the fixes take way too long :(

RadekKazbundaAtMama commented 3 years ago

> Can you explain why the header does not contain the correct chunk length?

Streaming starts before the input text is fully processed, so the actual output length is not yet known. The TTS service is designed to process very long inputs in a streaming fashion: it starts streaming audio as soon as the first few hundred milliseconds are available. At that moment it does not know how much audio it will have to produce or how the rest of the input text will be processed.

The missing audio length in the header is not considered a "bug", but a part of the streaming feature.
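
Because the length is only known once the stream has ended, a client can compute it afterwards from the bytes it actually received. The sketch below is a minimal Python illustration, not SDK code: `patch_wav_sizes` is a hypothetical helper, and it assumes uncompressed PCM in a RIFF/WAVE container where the `data` chunk is the last chunk and everything after its header is audio.

```python
import struct
from typing import Tuple


def patch_wav_sizes(wav: bytes) -> Tuple[bytes, float]:
    """Return (patched_wav, duration_seconds) for a fully received WAV stream."""
    if wav[:4] != b"RIFF" or wav[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE stream")

    pos = 12  # first chunk starts right after the 12-byte RIFF header
    byte_rate = None
    while pos + 8 <= len(wav):
        chunk_id = wav[pos:pos + 4]
        (chunk_size,) = struct.unpack_from("<I", wav, pos + 4)
        if chunk_id == b"fmt ":
            # The byte rate is stored 8 bytes into the fmt payload.
            (byte_rate,) = struct.unpack_from("<I", wav, pos + 16)
        elif chunk_id == b"data":
            if not byte_rate:
                raise ValueError("missing or invalid fmt chunk before data")
            # Ignore the declared size: use what was actually received.
            data_len = len(wav) - (pos + 8)
            patched = bytearray(wav)
            struct.pack_into("<I", patched, 4, len(wav) - 8)   # RIFF chunk size
            struct.pack_into("<I", patched, pos + 4, data_len)  # data chunk size
            return bytes(patched), data_len / byte_rate
        # Chunks are word-aligned, so odd sizes carry one padding byte.
        pos += 8 + chunk_size + (chunk_size & 1)
    raise ValueError("no data chunk found")
```

This only works after the last chunk has arrived, so it does not help with playback that starts while the stream is still running; for that case the duration remains unknown until the end.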