Speech to Text Cutting Off

watson-developer-cloud / python-sdk

Client library to use the IBM Watson services in Python and available in pip as watson-developer-cloud

https://pypi.org/project/ibm-watson/

Apache License 2.0


Speech to Text Cutting Off #170

Closed belgort closed 7 years ago

belgort commented 7 years ago

Hello,

I'm using the Python Speech to Text API which is pretty awesome. I am however finding that it's cutting off after the first few words in this file:

https://www.dropbox.com/s/m3wngsqmidaj249/ibm.wav?dl=0

It cuts out after the first pause. Thank you as always for your awesome support.

Bruce

jsstylos commented 7 years ago

Hi Bruce, that sounds like an issue with the continuous parameter, which defaults to False. Setting it to True should give you the expected behavior. (This is really an API design flaw: continuous=True is the behavior most users want and expect, but changing the default is a breaking change, so we're waiting until the next major release to fix it.)

belgort commented 7 years ago

Here is the code I use:

data = json.dumps(speech_to_text.recognize(audio_file, content_type='audio/wav', timestamps=False, word_confidence=False, continuous=True), indent=2)

With continuous=True I’m still seeing the same behavior.

jsstylos commented 7 years ago

When I run the code:

import json
from os.path import join, dirname
from watson_developer_cloud import SpeechToTextV1

speech_to_text = SpeechToTextV1(
    username='xxx',
    password='xxx'
)

# Open the audio in binary mode and request continuous recognition,
# so transcription does not stop at the first pause.
with open(join(dirname(__file__), 'speech.wav'), 'rb') as audio_file:
    response = speech_to_text.recognize(
        audio_file,
        content_type='audio/wav',
        timestamps=False,
        word_confidence=False,
        continuous=True)
    data = json.dumps(response, indent=2)
    print(data)

I get the results:

{
  "results": [
    {
      "alternatives": [
        {
          "confidence": 0.893,
          "transcript": "today I have been playing with IBM Watson "
        }
      ],
      "final": true
    },
    {
      "alternatives": [
        {
          "confidence": 0.894,
          "transcript": "IBM Watson is a supercomputer me by the IBM corporation based in Armonk New York "
        }
      ],
      "final": true
    }
  ],
  "result_index": 0
}

This looks right to me (the service returns an array of results with one per utterance).
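If it helps, here is a minimal sketch of how the per-utterance results could be stitched back into one string. It assumes the response is the same dict shape shown above; `join_transcripts` is a hypothetical helper written for this illustration, not part of the SDK.

```python
def join_transcripts(response):
    """Join the top alternative of every result into a single string.

    Hypothetical helper: assumes `response` has the dict shape shown
    above ("results" -> "alternatives" -> "transcript").
    """
    parts = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if alternatives:
            # The first alternative is the one returned with a confidence score.
            parts.append(alternatives[0]["transcript"].strip())
    return " ".join(parts)


# Sample data copied from the response above.
sample = {
    "results": [
        {
            "alternatives": [
                {
                    "confidence": 0.893,
                    "transcript": "today I have been playing with IBM Watson ",
                }
            ],
            "final": True,
        },
        {
            "alternatives": [
                {
                    "confidence": 0.894,
                    "transcript": "IBM Watson is a supercomputer me by the IBM corporation based in Armonk New York ",
                }
            ],
            "final": True,
        },
    ],
    "result_index": 0,
}

print(join_transcripts(sample))
```

Reading only `results[0]` would return just the first sentence, which could look like the audio was cut off at the first pause.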

kognate commented 7 years ago

I posted the output I get here https://gist.github.com/kognate/d53ff23e1bf8885b536216e998925e02

I see both sentences in the file:

"transcript": "today I have been playing with IBM Watson ",
"transcript": "IBM Watson is a supercomputer me by the IBM corporation based in Armonk New York ",

Are you seeing a partial string, or one or more of these transcripts?

belgort commented 7 years ago

Thanks for this. I will move on to the next service and try this one again later. I do appreciate your assistance. Thank you again.

belgort commented 7 years ago

This issue can be closed.