watson-developer-cloud / node-sdk

:comet: Node.js library to access IBM Watson services.
https://www.npmjs.com/package/ibm-watson
Apache License 2.0

Large audio files in speech-to-text result in the end event never being hit #68

Closed radleymith closed 9 years ago

radleymith commented 9 years ago

If someone uses the speech-to-text recognizeLive function with a file that is nearing the 100 MB limit, the service never actually fires the end event, so the callback function is never called. It works fine for smaller files.

To be fair, this functionality isn't 100% necessary, since one can use the observeResult function, but it's part of the API.

germanattanasio commented 9 years ago

I need to investigate this.

radleymith commented 9 years ago

I think that it's an issue of the server timing out on the request... The result.on('data', fn) and result.on('end', fn) event handlers, inside of the recognizeLive function, are not called until after all of the audio is processed. My bet is that your server times out after 2 or 3 minutes, while this file takes in the neighborhood of 25 minutes to process, so by then the request has timed out and can't emit any events.

germanattanasio commented 9 years ago

@watson-developer-cloud/devs anyone from the speech team that can help with this?

germanattanasio commented 9 years ago

@daniel-bolanos ??

daniel-bolanos commented 9 years ago

@radleymith, when you say "it works fine for smaller files", do you mean smaller than 100MB? Because that is the maximum file limit at the moment. Can you please elaborate?

radleymith commented 9 years ago

When I say smaller files, I mean a file in the 250 KB range; I use it as a test file to check functionality. The large file I'm talking about is 80.1 MB. I thought about testing the 4 MB threshold because I know that is the one-shot limit, but I don't have time, and I found a workaround using the observeResult function, where I save the data in that function's callback.

I thought that the recognizeLive function would not act as a one-shot request, but if it does in fact act like one, similar to the regular recognize function, then the 4 MB threshold would make sense. Otherwise I would think that it is a timeout error, because none of the event handlers in the recognizeLive callback are hit until after the service has completely decoded the audio file from speech to text, which takes about 25 minutes.

result.on('data', function(chunk) {
  transcript += chunk;
});

result.on('end', function() {
  try {
    transcript = formatChunk(transcript);
  } catch (e) {
    callback(transcript);
    return;
  }
  callback(null, transcript);
});

^^ Neither of those handlers is hit until after the audio is fully decoded.

If recognizeLive uses recognize as its endpoint and has the same 4 MB limit for its callback function, then is the only difference the ability to attach observeResult to the session?

germanattanasio commented 9 years ago

@radleymith you can also try with the WebSocket URL. The wrapper is using a full-duplex HTTP request.

When you call recognizeLive, are you sending a session_id?

radleymith commented 9 years ago

@germanattanasio I am using a session id. Here is my code. Note that this does work through the use of observeResult:

speechToText.createSession({}, function(err, session){
  if (err){
    console.log('error:', err);
    return;
  }

  var request = speechToText.recognizeLive({
    content_type: 'audio/wav;rate=16000',
    continuous: true,
    word_confidence: true,
    timestamps: true,
    session_id: session.session_id,
    cookie_session: session.cookie_session }, noop);

  // call observeResult to get interim results
  speechToText.observeResult({
    interim_results: true,
    session_id: session.session_id,
    cookie_session: session.cookie_session }, observeResult);

  // pipe the audio to the request
  // once the stream is consumed it will call request.end()
  fs.createReadStream(audioDir + fileName + '.wav').pipe(request);
});

NOTE: noop is not a true no-op right now; it logs whatever is passed to it.
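The observeResult callback passed in the code above is not shown in the thread. A minimal sketch of what such a workaround could look like follows. It assumes each payload handed to the callback has already been parsed into an object of the form `{ results: [{ final, alternatives: [{ transcript }] }] }`; that shape, and the helper name `createTranscriptCollector`, are assumptions for illustration and not part of the SDK.

```javascript
// Hypothetical helper (not part of the SDK): collects finalized transcript
// segments from result payloads so nothing depends on the 'end' event.
// Assumed payload shape:
//   { results: [{ final: true, alternatives: [{ transcript: '...' }] }] }
function createTranscriptCollector() {
  var finals = [];

  return {
    // Call this with every payload received in the observeResult callback.
    add: function (payload) {
      var results = (payload && payload.results) || [];
      results.forEach(function (result) {
        var best = result.alternatives && result.alternatives[0];
        // Interim hypotheses are skipped; only finalized segments are kept.
        if (result.final && best) {
          finals.push(best.transcript.trim());
        }
      });
    },
    // Returns everything finalized so far as a single string.
    transcript: function () {
      return finals.join(' ');
    }
  };
}
```

Because the collector keeps its own state, the accumulated text can be read or saved at any point during the 25-minute decode instead of waiting for a final callback.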

Also, I would really like to use the WebSocket implementation, and I have gotten to the point where I receive a successful response to the start message. However, I am unable to figure out how to send the audio file, which is in .wav format, over the connection. Would I need to turn it into a buffer? If so, how?
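One possible approach to the question above, sketched under assumptions: `fs.createReadStream` already emits `Buffer` chunks, so each chunk can be passed to the socket's `send` method as it is read, with no separate conversion step. The sketch assumes the start message has already been acknowledged, that `socket` is any open WebSocket-like object exposing `send(data)`, and that the service expects a JSON `{"action": "stop"}` message after the last audio frame. The function name `sendAudioFile` is hypothetical.

```javascript
var fs = require('fs');

// Hypothetical helper: streams a file over an already-open WebSocket-like
// object. `socket` only needs a send(data) method.
// Assumption: the service treats Buffer frames as audio and expects a JSON
// "stop" message once all audio has been sent.
function sendAudioFile(socket, filePath, done) {
  // Without an encoding option, the stream emits Buffer chunks.
  var audio = fs.createReadStream(filePath);

  audio.on('data', function (chunk) {
    socket.send(chunk); // each Buffer chunk goes out as its own message
  });

  audio.on('end', function () {
    // Signal that no more audio is coming (assumed message format).
    socket.send(JSON.stringify({ action: 'stop' }));
    done(null);
  });

  // On a read error 'end' is not emitted, so done is called only once.
  audio.on('error', done);
}
```

This sketch does not handle backpressure: for a file near the size limit, a real client would likely need to pause the read stream while the socket's send buffer drains.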

germanattanasio commented 9 years ago

Since the problem is that the server times out before the response is returned, we will fix this by fixing #110. I think switching to WebSockets will fix the timeout issue, which in fact is not an issue in this library.

daniel-bolanos commented 9 years ago

German, please let's make the switch to WebSockets; that will promote their use. These timeouts are very annoying, and many users are experiencing them.

Thank you