watson-developer-cloud / node-sdk

:comet: Node.js library to access IBM Watson services.
https://www.npmjs.com/package/ibm-watson
Apache License 2.0

STT: Websocket Recognize stream - API Key times out after 1 hour & causes Connection Error #849

Closed RMichaelPickering closed 5 years ago

RMichaelPickering commented 5 years ago

Our application uses Node.js on the backend, with audio streamed from Unity3D. We're able to get the recognize method working by passing the audio through; we open a session as we detect each user, then close it when they're inactive.

Everything works well until our app has been running for about an hour, at which point it seems like the API key times out, and all our connections get an error. All we can do to work around this is restart the app.

Is there no way to manage the lifecycle of the API key? Or refresh it after a period of time?

emre93 commented 5 years ago

After about one hour, STT gives me this error:


{ WebSocket connection error: WebSocket connection error
    at W3CWebSocket.socket.onerror (C:\Users\emre_\Desktop\CC-Conference-master\node_modules\watson-developer-cloud\lib\recognize-stream.js:212:23)
    at W3CWebSocket._dispatchEvent [as dispatchEvent] (C:\Users\emre_\Desktop\CC-Conference-master\node_modules\yaeti\lib\EventTarget.js:107:17)
    at W3CWebSocket.onConnectFailed (C:\Users\emre_\Desktop\CC-Conference-master\node_modules\websocket\lib\W3CWebSocket.js:217:14)
    at WebSocketClient.<anonymous> (C:\Users\emre_\Desktop\CC-Conference-master\node_modules\websocket\lib\W3CWebSocket.js:59:25)
    at emitOne (events.js:116:13)
    at WebSocketClient.emit (events.js:211:7)
    at WebSocketClient.failHandshake (C:\Users\emre_\Desktop\CC-Conference-master\node_modules\websocket\lib\WebSocketClient.js:339:10)
    at ClientRequest.<anonymous> (C:\Users\emre_\Desktop\CC-Conference-master\node_modules\websocket\lib\WebSocketClient.js:278:18)
    at emitOne (events.js:116:13)
    at ClientRequest.emit (events.js:211:7)
  name: 'WebSocket connection error',
  event:
   _Event {
     type: 'error',
     isTrusted: false,
     _yaeti: true,
     target:
      W3CWebSocket {
        _listeners: {},
        addEventListener: [Function: _addEventListener],
        removeEventListener: [Function: _removeEventListener],
        dispatchEvent: [Function: _dispatchEvent],
        _url: 'wss://gateway-wdc.watsonplatform.net/speech-to-text/api/v1/recognize?model=en-US_BroadbandModel',
        _readyState: 3,
        _protocol: undefined,
        _extensions: '',
        _bufferedAmount: 0,
        _binaryType: 'arraybuffer',
        _connection: undefined,
        _client: [Object],
        onerror: [Function],
        onopen: [Function],
        onclose: [Function],
        onmessage: [Function] },
     cancelable: true,
     stopImmediatePropagation: [Function] } }
TypeError: Cannot read property '0' of undefined
    at onEvent (C:\Users\emre_\Desktop\CC-Conference-master\server\server.js:373:65)
    at RecognizeStream.<anonymous> (C:\Users\emre_\Desktop\CC-Conference-master\server\server.js:191:48)
    at emitOne (events.js:121:20)
    at RecognizeStream.emit (events.js:211:7)
    at W3CWebSocket.socket.onerror (C:\Users\emre_\Desktop\CC-Conference-master\node_modules\watson-developer-cloud\lib\recognize-stream.js:215:18)
    at W3CWebSocket._dispatchEvent [as dispatchEvent] (C:\Users\emre_\Desktop\CC-Conference-master\node_modules\yaeti\lib\EventTarget.js:107:17)
    at W3CWebSocket.onConnectFailed (C:\Users\emre_\Desktop\CC-Conference-master\node_modules\websocket\lib\W3CWebSocket.js:217:14)
    at WebSocketClient.<anonymous> (C:\Users\emre_\Desktop\CC-Conference-master\node_modules\websocket\lib\W3CWebSocket.js:59:25)
    at emitOne (events.js:116:13)
    at WebSocketClient.emit (events.js:211:7)
Erroorrr!!
1006

And I am calling STT with this method. This is just the part of the code that shows how I call STT: we have a main server in the cloud, our Node.js server receives the audio from the Unity app over WebSockets, and it passes the raw data to STT using Node.js streams.

var speechService = new SpeechToTextV1({
  iam_apikey: 'THE ACTUAL API KEY',
  url: serviceUrl,
  headers: {
    'X-Watson-Learning-Opt-Out': 'true'
  }
});

var params = {
  objectMode: true,
  content_type: 'audio/l16;rate=16000',
  max_alternatives: 3,
  model: 'en-US_BroadbandModel',
  inactivity_timeout: -1, // even though I set -1, STT gives a "session timed out" error after 30 sec
  keywords_threshold: 0.85,
  interim_results: true,
  word_confidence: true
};

wss.on('connection', function (client) {

  console.log('New client connected');

  if (client.readyState === 1)
    client.send('US-EAST Solution');

  var recognizeStream = speechService.recognizeUsingWebSocket(params);

  // 'stream' is the incoming audio stream from the client.
  stream.pipe(recognizeStream);

  // Listen for events.
  recognizeStream.on('data', function (event) { onEvent('Data:', event); });
  recognizeStream.on('error', function (event) { onEvent('Error:', event); });
  recognizeStream.on('close', function (event) { onEvent('Close:', event); });

  function onEvent(name, event) { console.log(name, JSON.stringify(event, null, 2)); }
});

dpopp07 commented 5 years ago

@RMichaelPickering Hmm, that's an interesting problem. There is a way to manage the token, and in fact the SDK can refresh tokens once they expire. That said, I believe there is only one authentication step here (when we first open the connection), so I'm not sure how changing the token would make a difference without re-establishing a connection. Perhaps there is a way to update the token in the state of the connection, but I do not know it; we send the token once as a header.

@emre93 It's interesting that the error message is Cannot read property '0' of undefined. It doesn't look like that's coming from your code, right?

emre93 commented 5 years ago

Thank you for the answer @dpopp07. You are right about the "Cannot read property '0' of undefined" error. There is a variable in my code that takes the first element of an array created with split(); in this case that's [0]. Because it tries to split a JSON object, it can't create the array, so it throws that error. I'm not worried about that error, but I am worried about the 1-hour cut-off problem.

I am trying to figure out both the token-expiry problem and the "session timed out" error that occurs when no audio data is sent for 30 seconds, or silence is detected for 30 seconds.

Any ideas on how to handle this?

RMichaelPickering commented 5 years ago

@dpopp07 Thanks Dustin! Perhaps you can explain how this is meant to work, in case we're missing something. From our perspective, we simply want to establish a WebSocket connection from our Node.js service to Watson STT and hold it open, so that as new voice activity from a user is detected, we can easily and quickly start a Recognize session. In concept, we should be able to do this for each client that connects to our Node.js server. I have to admit I'm not entirely sure whether we're currently creating a new token per client connection, or using the same token for all clients. Either way, at some point a token is going to expire, which causes the corresponding WebSocket connection to be terminated from the Watson STT side and leads to a messy situation on the Node.js side. We discussed setting a timer to trigger refreshing the token and then re-opening the WebSocket, but this could be problematic given that user audio could already be streaming through the connection at that time.

germanattanasio commented 5 years ago

We are investigating an issue where the service drops the connection after 1 hour. The token is only used to authenticate the request that opens the connection, so once the connection is open, the token's expiration date doesn't matter.

@RMichaelPickering there is a limit on the number of connections you can have open at the same time. I think it's 20, but I would have to check the documentation.

RMichaelPickering commented 5 years ago

@germanattanasio Thanks! That's an interesting answer that suggests our node.js app either wouldn't be able to support nearly as many clients as would otherwise be possible due to a limitation of the Watson STT API, or that we'd need to somehow pool Websocket connections to STT and dispatch multiple Recognize requests to STT in parallel. What is the IBM-recommended Watson STT connection architecture for node.js apps please?

germanattanasio commented 5 years ago

It depends on what plan you have. I think Free and Standard allow roughly 20 to 30 concurrent connections; Premium can handle more. You can also try using different service instances, since each will give you ~20 connections.

AFAIK, there isn't a recommended connection architecture. If you are planning on having hundreds of clients, I would contact sales https://ibm.biz/contact-wdc-premium

RMichaelPickering commented 5 years ago

We are planning on having hundreds of clients, and that's one of the reasons that we're using node.js and the Watson SDK for node.js! For now, I guess it would be helpful to know if we're on very much the wrong track. In our case, we weren't able to use the STT example from the SDK for node.js very much because our client isn't actually a browser. However, does that example use a Websocket and audio streaming? Does it stream audio up to Watson from browser directly or via node.js? If the latter, how are the connections managed for multiple simultaneous users? Finally, are there any significant updates related to this that will be coming in the new release of the node.js SDK next week?

germanattanasio commented 5 years ago

However, does that example use a WebSocket and audio streaming?

Yes

Does it stream audio up to Watson from the browser directly or via node.js?

Yes

How are the connections managed for multiple simultaneous users?

I assume you are asking for the example in which case we don't have multiple users. If your question is regarding the service, then I don't know that answer.

@daniel-bolanos what's the limit for concurrent WebSocket connections in lite or standard? How can someone increase that limit and finally, are there recommendations on how to handle multiple connections at the same time?

RMichaelPickering commented 5 years ago

Thanks, this is very helpful! Just to clarify: your second answer means that, yes, it streams audio up to Watson STT directly from the browser, correct? So in this case, no user's connection to Watson STT goes through Node.js? To be clear, this is more of a browser/JavaScript example of connecting to Watson STT, in that all the Node.js app does is push the code that creates the WebSocket and streams audio into Watson STT down to the browser, right?

Are you aware of any other users who stream audio through node.js as we're doing? We think this makes sense as it sets us up to run additional logic and use other Watson services such as Assistant from our node.js app running in IBM cloud without having to make additional hops back to client/browser.

dpopp07 commented 5 years ago

@RMichaelPickering I believe by "yes", he meant that you can stream audio up to Watson from the browser or from NodeJS. There are plenty of examples of users streaming audio from NodeJS and in fact, that is what this library does. You are able to interact with Watson without ever touching the browser. Does that answer your question?

RMichaelPickering commented 5 years ago

@dpopp07 Thanks Dustin! Could you please point us to one of the Watson STT audio streaming node.js examples that doesn't touch the browser?

dpopp07 commented 5 years ago

No problem! There are a couple examples in this repo. They are not super in-depth but they should give you an idea of what the basic usage is.

There's another example in the API docs that uses object mode and event listeners, which shows some other possibilities.

I hope that helps!

RMichaelPickering commented 5 years ago

@dpopp07 Thanks again Dustin! I see what you mean now, and we looked at these examples before. I'm not trying to be difficult, but these examples appear to be unusual edge cases rather than relevant real-world examples. I assume that, like us, most real-world Node.js use cases involve the app running on a large server (possibly in IBM Cloud or some other data center) that in turn supports multiple connected users. In these cases, it is simply not possible to have a bunch of user microphones (or even one) plugged into the Node.js server!

I can certainly see that one valid architecture pattern is for the Node.js app to support users connecting via web browsers; in that case, a valid pattern for audio capture is to use a microphone available to the user's browser and stream it directly to Watson STT without passing through Node.js. However, in this scenario, all the application logic, including interpreting the transcription results coming back from Watson STT, would also have to be implemented in the app running in the browser, since that's where the STT transcription results are returned. My point is that this isn't necessarily optimal, as in most cases one would want the app logic to run in Node.js itself!

Indeed, this is what we're attempting; in our app we pull in other services such as Watson Assistant directly from Node.js. To me, this is the better and much more interesting architectural pattern. We want to support many users streaming audio to our Node.js app in IBM Cloud, pass the audio over to Watson STT for transcription, receive back the results, and continue with more logic running in IBM Cloud to interpret them (such as running the transcription through Watson Assistant, etc.).

Of course, this requires the approach to support many clients all streaming audio to Node.js, and on to Watson STT from there, which I'm now concerned might not even work beyond a dozen or so simultaneous users. Is there something I'm missing?

dpopp07 commented 5 years ago

I see what you're saying. Yeah, the examples are not meant to be edge cases but just to show the fundamentals of interacting with the service. Then it's up to developers to extrapolate to their own applications. We would probably benefit from having examples demonstrating more complicated use cases, so that's something we could look into in the future.

I don't think you're missing anything - what you describe should be possible (although there may be a limit to the amount of connections based on the service, I don't actually know).

I'm not sure why you would need any code on the browser. Where are you getting your audio from for your application?

RMichaelPickering commented 5 years ago

@dpopp07 Thanks Dustin, this REALLY is helpful discussion!! Glad to hear that, other than the potential limitation on the number of connections, we really are probably on the right track! How can we get that question answered?

For our application, audio is being streamed from a Unity app running on a local (user) machine. The code for this was based on code in the Watson SDK for Unity, but it's been improved to reduce latency and modified to stream through our node.js middle tier via Websocket, rather than direct from Unity to the Watson STT service.

dpopp07 commented 5 years ago

Someone from the service would have to answer about the number of concurrent connections. @daniel-bolanos do you know? Or, maybe that info is in the documentation somewhere.

Other than that, you should definitely be on the right track. If you are receiving a NodeJS Stream type, you can "pipe" that right to the RecognizeStream. If you are receiving chunks of binary data (NodeJS type Buffer) you can send them to the service one at a time using the "write" function - recognizeStream.write(chunk).

emre93 commented 5 years ago

Hi @dpopp07, thanks for the answers! I was wondering whether there is any update on the 1-hour cut-off problem. Also, do you know whether this problem is fixed in the new version of the Watson API for Node.js?

daniel-bolanos commented 5 years ago
emre93 commented 5 years ago

Hi @daniel-bolanos, thank you for the answer! Could you show me an example of how to send ping-pong frames? I believe that may be causing the problem.

germanattanasio commented 5 years ago

@emre93 take a look at https://stackoverflow.com/questions/10585355/sending-websocket-ping-pong-frame-from-browser

We don't have an implementation or documentation on how to do that

daniel-bolanos commented 5 years ago

https://tools.ietf.org/html/rfc6455#section-5.5.2

emre93 commented 5 years ago

Thank you for the answers! I am no professional at this, which is why I need a bit more help. From my understanding of the links, the STT server is supposed to send a Ping to me, and I need to send a Pong back to tell the server that I am alive, right? So I wrote code like this, to see whether the server sends me any pings:

recognizeStream.on('ping', function (event) { onPing('Ping:', event); });

function onPing(name, event) { console.log(name + ' ' + event); }

However, it never gets triggered. Is this the wrong way to listen to the server? Or do you know how I should listen in order to see pings arriving at the client?

dpopp07 commented 5 years ago

You are actually supposed to ping the server and it will send a pong back. You do this to let the server know you are still listening and to keep the connection alive - it is proactive on the client's part. Some WebSocket libraries have functionality to do this built in, but we are using the W3CWebSocket protocol and it is not built in to the library we are using. I will look into adding something to the SDK that would allow you to do this but I am not sure how fast it will be so you may want to keep trying yourself. If you find a good way, feel free to open a PR!

dpopp07 commented 5 years ago

@emre93 Turns out, ping frames are not a part of the W3C WebSocket protocol and therefore a limitation of the SDK. For now, I don't think we'll be able to add functionality to do that. An idea for something to try next is to just send audio data over the connection like you already are. Something like a wav file containing a drum hit or something like that may work, or a placeholder word/phrase.

emre93 commented 5 years ago

@dpopp07 Thank you for looking into the solution in such detail. I will do that until another solution comes along!

RMichaelPickering commented 5 years ago

Can we get an update on this issue please? We're just now moving to the new SDK but so far don't see any evidence to suggest this will fix this issue. We are going to try to switch from using API key to access token at the same time, but again, it's not clear that this will fix the issue. This is becoming a potential production issue, and even for demos we're having to restart our node.js service before each demo to ensure we don't run into an issue during a demo. If this can't be made to work reliably -- and soon -- we'll need to look at other Cloud vendor solutions!

dpopp07 commented 5 years ago

@RMichaelPickering I have not been able to look at this issue in a bit. Since we currently have a limitation preventing us from using ping-pongs, the solution is probably to send interim data when there is no activity. See this documentation. It states that even silence will work to keep the connection alive. We can look into adding functionality in the SDK to send silence but it will be much faster if you can add this to your app in the meantime. Let me know if you can try that and what your results are if you do
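One way to implement the suggested workaround is to periodically write a short buffer of PCM silence while the client is idle. A minimal sketch, assuming the `audio/l16;rate=16000` content type used earlier in this thread; the interval and duration here are illustrative choices, not documented values.

```javascript
// 16 kHz, 16-bit mono PCM ("audio/l16;rate=16000"): each millisecond of
// silence is 16 samples * 2 bytes of zeros.
const SAMPLE_RATE = 16000;
const BYTES_PER_SAMPLE = 2;

function makeSilence(ms) {
  const samples = Math.round(SAMPLE_RATE * ms / 1000);
  return Buffer.alloc(samples * BYTES_PER_SAMPLE); // zero-filled by default
}

// Illustrative keep-alive: write 100 ms of silence every 20 s while idle;
// call the returned function to stop once real audio resumes.
function startKeepAlive(recognizeStream, intervalMs) {
  const timer = setInterval(function () {
    recognizeStream.write(makeSilence(100));
  }, intervalMs || 20000);
  return function stop() { clearInterval(timer); };
}
```

Note that this only makes sense together with `inactivity_timeout: -1` or a suitably long timeout, since silence keeps the WebSocket alive but still counts as inactivity for recognition purposes.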

RMichaelPickering commented 5 years ago

@dpopp07 Thanks for your response. As I mentioned before, we're switching from using an API key to an access token when we authenticate to the STT service and open the Websocket connection. We also plan to close it whenever the audio from the client has been suspended, then re-open the connection whenever new audio data is available. If this combination of changes doesn't resolve the issue then we'll look at other options. If it does, then we can close the issue with the resolution that using an access token instead of an API key is an acceptable workaround.
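The close-on-silence / reopen-on-audio lifecycle described here can be sketched as follows. This is a hedged sketch, not SDK code: `createRecognizeStream` stands in for a call like `speechService.recognizeUsingWebSocket(params)`, and the idle timeout is illustrative.

```javascript
// One recognize session per utterance: lazily open on first audio,
// close after a period of silence. All names are illustrative.
function makeSessionManager(createRecognizeStream, idleMs) {
  let stream = null;
  let idleTimer = null;

  function closeSession() {
    clearTimeout(idleTimer);
    if (stream) {
      stream.stop(); // close the WebSocket recognize session
      stream = null;
    }
  }

  return {
    onAudio: function (chunk) {
      if (!stream) {
        stream = createRecognizeStream(); // (re)open, authenticating afresh
      }
      stream.write(chunk);
      clearTimeout(idleTimer);
      idleTimer = setTimeout(closeSession, idleMs); // close when idle
    },
    close: closeSession
  };
}
```

Because each reopen re-authenticates, this pattern sidesteps the question of what happens to a long-lived connection when its token expires.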

dpopp07 commented 5 years ago

@RMichaelPickering Sounds good. Keep me posted. I will warn you that the access tokens do expire after an hour. Once the connection opens, the expiration time no longer matters. The connection should remain open indefinitely until there is a timeout as stated in the documentation I posted above. However - if you open a connection, then the token expires, then you close the connection and try to open a new one, that connection will fail due to the expired token. The benefit of using an API key is that the SDK will fetch a new token for you if the old one expired, before trying to establish the new connection. If using the token, you'll need to handle fetching new ones when they expire.
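The token-expiry caveat above can be handled with simple bookkeeping before each reconnect. A minimal sketch under stated assumptions: `fetchNewToken` is a hypothetical function standing in for however your app obtains IAM access tokens, and the ~1-hour TTL matches what dpopp07 describes.

```javascript
// Refresh the IAM token before reconnecting if it is close to expiry.
// TOKEN_TTL_MS (~1 hour) matches the behavior described above; the
// 5-minute refresh margin is an illustrative safety buffer.
const TOKEN_TTL_MS = 60 * 60 * 1000;
const REFRESH_MARGIN_MS = 5 * 60 * 1000;

let token = null;
let tokenExpiresAt = 0;

function tokenIsStale(now) {
  return !token || now >= tokenExpiresAt - REFRESH_MARGIN_MS;
}

async function getFreshToken(fetchNewToken, now) {
  if (tokenIsStale(now)) {
    token = await fetchNewToken(); // hypothetical: hits the IAM token endpoint
    tokenExpiresAt = now + TOKEN_TTL_MS;
  }
  return token;
}
```

Calling `getFreshToken` before each new WebSocket connection mimics what the SDK does automatically when it is configured with an API key instead of a raw token.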

RMichaelPickering commented 5 years ago

@dpopp07 In the documentation for the Watson STT Node.js SDK, there is no method listed for closing a WebSocket recognition session. How do we do this, please?

RMichaelPickering commented 5 years ago

@dpopp07 @germanattanasio @daniel-bolanos We've now moved to the latest Watson SDK for node.js and double checked our Websocket connection process. Unfortunately there's still an issue. I'm going to open a new Issue to make the details clear.

dpopp07 commented 5 years ago

@RMichaelPickering To answer your question from a couple of days ago, you can use the .stop() method to close the session. I'll look out for that other issue to hear more details.

RMichaelPickering commented 5 years ago

@dpopp07 Opened a new issue, please see #902

dpopp07 commented 5 years ago

Closing this issue in favor of #902

@RMichaelPickering Please re-open if you feel this issue is worth tracking as well

RMichaelPickering commented 5 years ago

@dpopp07 It is indeed the same issue. #902 hopefully provides more detailed information to aid with debugging, but this one has not yet been fixed and has now been open for 4 months! I don't really think it should be closed until the underlying issue is properly fixed!