[speech-to-text] Watson Ambient Noise & Children's Voices

watson-developer-cloud / unity-sdk

:video_game: Unity SDK to use the IBM Watson services.

Apache License 2.0


[speech-to-text] Watson Ambient Noise & Children's Voices #218

Closed: subvertio closed this issue 7 years ago

subvertio commented 7 years ago

I'm not sure whether this is a feature request or a bug, but Speech to Text has a very hard time with ambient noise and children's voices.

Any ambient noise produces erratic results, and results for young people's voices are very unreliable.

mediumTaj commented 7 years ago

There are different ways to handle ambient noise, but all of them should be implemented at the application level, not at the SDK level.

You can apply audio filters to the captured audio, or use a microphone whose gain you can control.
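As a rough illustration of application-level filtering, here is a minimal sketch of three generic preprocessing steps applied to float samples in the range [-1, 1] before they are sent for recognition: a one-pole high-pass filter to cut low-frequency hum, a frame-based noise gate, and a clipped gain stage. None of this is part of the Unity SDK or the Watson service; the function names, cutoff, threshold, and frame size are illustrative assumptions. In a Unity app the same logic would be written in C# over the sample buffer (for example, data obtained with `AudioClip.GetData`).

```python
import math
from typing import List


def high_pass(samples: List[float], sample_rate: int, cutoff_hz: float) -> List[float]:
    """One-pole high-pass filter: attenuates hum and rumble below roughly cutoff_hz."""
    if not samples:
        return []
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out = [samples[0]]
    for i in range(1, len(samples)):
        # y[i] = alpha * (y[i-1] + x[i] - x[i-1])
        out.append(alpha * (out[i - 1] + samples[i] - samples[i - 1]))
    return out


def noise_gate(samples: List[float], threshold: float, frame_size: int) -> List[float]:
    """Silence each frame whose RMS level falls below the threshold (quiet background)."""
    out: List[float] = []
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        out.extend(frame if rms >= threshold else [0.0] * len(frame))
    return out


def apply_gain(samples: List[float], gain: float) -> List[float]:
    """Scale samples by a fixed gain and clip to the [-1, 1] float sample range."""
    return [max(-1.0, min(1.0, s * gain)) for s in samples]


# Hypothetical pipeline for 16 kHz mono audio: filter, gate, then boost.
def preprocess(samples: List[float], sample_rate: int = 16000) -> List[float]:
    filtered = high_pass(samples, sample_rate, cutoff_hz=100.0)
    gated = noise_gate(filtered, threshold=0.02, frame_size=sample_rate // 100)
    return apply_gain(gated, gain=1.5)
```

A fixed threshold gate is the simplest choice; in noisy rooms an adaptive threshold based on a measured noise floor would likely work better, and a hardware microphone with adjustable gain avoids amplifying noise in software at all.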

However, if you come up with a cross-platform, Unity-based solution, we do accept pull requests!

mediumTaj commented 7 years ago

As far as children's voices are concerned, Speech to Text can be customized to specific language domains (healthcare, law, etc.). Words and their pronunciations can also be added to this customization. Look here and search for "Using the sounds_like field".
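To make the `sounds_like` idea concrete, here is a minimal sketch that builds the JSON body for adding custom words to a custom language model. The field names (`word`, `sounds_like`, `display_as`) and the `{"words": [...]}` wrapper follow the Speech to Text v1 custom-words API as we understand it; the five-pronunciation limit is an assumption to check against the current API reference, and the example word and pronunciations are hypothetical.

```python
import json
from typing import Dict, List, Optional

# Assumed per-word limit on sounds-like pronunciations; verify against the API reference.
MAX_SOUNDS_LIKE = 5


def custom_word(word: str, sounds_like: List[str], display_as: Optional[str] = None) -> Dict:
    """One entry for the custom model's words list."""
    if len(sounds_like) > MAX_SOUNDS_LIKE:
        raise ValueError(f"at most {MAX_SOUNDS_LIKE} sounds_like entries per word")
    entry: Dict = {"word": word}
    if sounds_like:
        # Omitted when empty, so the service can derive a pronunciation itself.
        entry["sounds_like"] = sounds_like
    if display_as is not None:
        entry["display_as"] = display_as
    return entry


def words_payload(entries: List[Dict]) -> str:
    """Serialize entries as the request body for adding multiple words at once."""
    return json.dumps({"words": entries})


# Hypothetical example: a child who says "pasketti" should be transcribed as "spaghetti".
body = words_payload([custom_word("spaghetti", ["pasketti", "spaghetti"])])
```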

In the Unity SDK you can add customizations by writing a C# script, or you can train them with another SDK, such as the Python SDK, or with a REST client such as Postman.
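For anyone training outside Unity without an SDK, here is a minimal sketch using only the Python standard library. It builds (but does not send) the REST calls to create a custom language model, add words, and start training. The paths `/v1/customizations`, `/v1/customizations/{id}/words`, and `/v1/customizations/{id}/train`, the `base_model_name` field, and `apikey` basic authentication reflect the Speech to Text v1 API as we understand it; `SERVICE_URL` and `API_KEY` are hypothetical placeholders.

```python
import base64
import json
import urllib.request
from typing import List, Optional

# Hypothetical placeholders: substitute your own service URL and credentials.
SERVICE_URL = "https://example.com/speech-to-text/api"
API_KEY = "your-api-key"


def _post(path: str, payload: Optional[dict] = None) -> urllib.request.Request:
    """Build an authenticated POST request; sending it is left to the caller."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    token = base64.b64encode(f"apikey:{API_KEY}".encode("utf-8")).decode("ascii")
    headers = {"Authorization": f"Basic {token}"}
    if data is not None:
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(SERVICE_URL + path, data=data, headers=headers, method="POST")


def create_model_request(name: str, base_model: str = "en-US_BroadbandModel") -> urllib.request.Request:
    """Create a custom language model on top of a base model."""
    return _post("/v1/customizations", {"name": name, "base_model_name": base_model})


def training_steps(customization_id: str, words: List[dict]) -> List[urllib.request.Request]:
    """Add custom words to an existing model, then start training it."""
    base = f"/v1/customizations/{customization_id}"
    return [_post(f"{base}/words", {"words": words}), _post(f"{base}/train")]
```

Each request can be sent with `urllib.request.urlopen(req)`. The create call's JSON response should contain the `customization_id` to pass to `training_steps`. Training runs asynchronously, so a client would typically poll the model's status before using it in recognition requests.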

mediumTaj commented 7 years ago

I'm striking through my comment above about the "sounds_like" field. This note came from the Speech to Text division:

Current STT customization adapts the language model. For children's voices, the more critical adaptation is in the acoustic space rather than the language domain. Language model customization might help a bit (if what the child says is out of domain, e.g. if they speak less grammatically). We plan to offer acoustic model customization later this year, which will be more helpful for children's voices.