watson-developer-cloud / speech-to-text-nodejs

:microphone: Sample Node.js Application for the IBM Watson Speech to Text Service
https://speech-to-text-demo.ng.bluemix.net
Apache License 2.0

WebSocket connection error #265

Closed leibaogit closed 3 years ago

leibaogit commented 3 years ago

Describe the bug I’m running the speech-to-text demo locally and hit a WebSocket connection issue: when I click the Record Audio button, it returns a WebSocket connection error.

I set the token in the .env file:

% cat .env
# Environment variables
SPEECH_TO_TEXT_IAM_APIKEY=lYzPliclaElz35zV1dHJw9......
SPEECH_TO_TEXT_URL=https://api.us-south.speech-to-text.watson.cloud.ibm.com/instances/ee92ee32-297b-41dd-9c5d-0f8eecd8a120

To Reproduce Steps to reproduce the behavior:

  1. Go to 'http://localhost:3000/'
  2. Click on 'Record Audio'
  3. Open the debug console
  4. See error

Expected behavior The WebSocket connection should succeed.

Screenshots image (34) image (35)

leibaogit commented 3 years ago

Root cause

The WebSocket request does not include the access_token query parameter:

wss://api.us-south.speech-to-text.watson.cloud.ibm.com/instances/ee92ee32-297b-41dd-9c5d-0f8eecd8a120/v1/recognize?model=en-US_BroadbandModel
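For comparison, a request that carries the token would have an access_token query parameter next to model. A minimal sketch of building such a URL, assuming the token travels as a query parameter as described above; buildRecognizeUrl and the PLACEHOLDER_TOKEN value are hypothetical and only for illustration, not part of the SDK:

```javascript
// Hypothetical helper (not part of watson-speech): builds the WebSocket URL
// for the /v1/recognize endpoint from the service URL and query parameters.
function buildRecognizeUrl(serviceUrl, queryParams) {
  // Turn https:// into wss:// and append the recognize path.
  const url = new URL(serviceUrl.replace(/^http/, 'ws') + '/v1/recognize');
  for (const [key, value] of Object.entries(queryParams)) {
    // Skip parameters that were never set.
    if (value !== undefined) {
      url.searchParams.set(key, value);
    }
  }
  return url.toString();
}

const recognizeUrl = buildRecognizeUrl(
  'https://api.us-south.speech-to-text.watson.cloud.ibm.com/instances/ee92ee32-297b-41dd-9c5d-0f8eecd8a120',
  { access_token: 'PLACEHOLDER_TOKEN', model: 'en-US_BroadbandModel' }
);
console.log(recognizeUrl);
```

If access_token is undefined in the parameters object, the resulting URL contains only model, which matches the failing request above.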
leibaogit commented 3 years ago

Analysis

The following code at node_modules/watson-speech/speech-to-text/recognize-stream.js:161 drops access_token from the options:

var queryParams = processUserParameters(options, queryParamsAllowed);

This is the processUserParameters code:

node_modules/watson-speech/util/process-user-parameters.js
module.exports = function processUserParameters(options, allowedParams) {
  var processedOptions = {};
  // look for the camelcase version of each parameter - that is what we expose to the user
  allowedParams.forEach(param => {
    var keyName = camelcase(param); // converts the param to camelCase: access_token --> accessToken
    if (options[keyName] !== undefined) {
      processedOptions[param] = options[keyName];
    }
  });
  return processedOptions;
};
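To see the effect in isolation, here is a minimal self-contained sketch. toCamelCase is a hypothetical stand-in for the camelcase package (it only handles _ and - separators), and the option values are placeholders:

```javascript
// Hypothetical stand-in for the `camelcase` package: handles only "_" and "-".
function toCamelCase(name) {
  return name.replace(/[-_]+(.)/g, (match, ch) => ch.toUpperCase());
}

// Same filtering logic as processUserParameters above, using the stand-in.
function processUserParameters(options, allowedParams) {
  const processedOptions = {};
  allowedParams.forEach(param => {
    // Only the camelCase spelling of each allowed name is looked up.
    const keyName = toCamelCase(param);
    if (options[keyName] !== undefined) {
      processedOptions[param] = options[keyName];
    }
  });
  return processedOptions;
}

const allowed = ['access_token', 'model'];

// snake_case key: never looked up, so it is silently dropped.
const fromSnake = processUserParameters(
  { access_token: 'PLACEHOLDER', model: 'en-US_BroadbandModel' },
  allowed
);

// camelCase key: found, and written back under its snake_case name.
const fromCamel = processUserParameters(
  { accessToken: 'PLACEHOLDER', model: 'en-US_BroadbandModel' },
  allowed
);

console.log(fromSnake);
console.log(fromCamel);
```

Only the second call keeps access_token, because the lookup uses options.accessToken and never options.access_token.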
leibaogit commented 3 years ago

In the node_modules/watson-speech/speech-to-text/recognize-stream.js: initialize function, I added:

...
console.log("RecognizeStream.option 1: " + JSON.stringify(options));
  // process query params
  var queryParamsAllowed = [
    'access_token',
    'watson-token',
    'model',
    'language_customization_id',
    'acoustic_customization_id',
    'base_model_version',
    'x-watson-learning-opt-out',
    'x-watson-metadata'
  ];
  var queryParams = processUserParameters(options, queryParamsAllowed);
  console.log("RecognizeStream.option 2: " + JSON.stringify(queryParams));
....

The log shows that access_token was removed: image

leibaogit commented 3 years ago

So we need somewhere to translate access_token to accessToken.

The initialize function in /Users/ibmuser/bali/IBM/CDLTechBuddy/speech-to-text-nodejs/node_modules/watson-speech/speech-to-text/recognize-stream.js looks like a good place. I added the following code:

  if (options.access_token && !options['accessToken']) {
    options['accessToken'] = options.access_token;
  }

This resolved the issue; the wss request returned 101 (Switching Protocols):

image

leibaogit commented 3 years ago

It seems that changing only access_token is not enough, because I found another issue with the same root cause: iShot2021-07-27 15 51 22

The keywords_threshold option was also skipped.

So the only workaround is to remove the camelcase() call in:

node_modules/watson-speech/util/process-user-parameters.js

module.exports = function processUserParameters(options, allowedParams) {
  var processedOptions = {};

  // look for the camelcase version of each parameter - that is what we expose to the user
  allowedParams.forEach(param => {
    // var keyName = camelcase(param);   // removed
    var keyName = param;   // added: look up the parameter name as-is
    if (options[keyName] !== undefined) {
      processedOptions[param] = options[keyName];
    }
  });
  return processedOptions;
};

leibaogit commented 3 years ago

Even with the above change, I hit another issue, which the change itself caused: Error: unable to transcode data stream application/octet-stream -> audio/l16 image (36)

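The root cause is that the content type was skipped after the above change: the options object uses the camelCase key contentType, while openingMessageParamsAllowed lists parameters in their non-camelCase form (for example keywords_threshold), so without camelcase() the contentType option no longer matches any allowed name.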

leibaogit commented 3 years ago

So now I convert all the keys in the options to camelCase at the very start of the initialize function:

RecognizeStream.prototype.initialize = function() {
  var options = {};
  var camelcase = require('camelcase');

  for (const [key, value] of Object.entries(this.options)) {
    var newKey = camelcase(key);
    options[newKey] = value;
  }
....

Now everything works great !!!!
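The key normalization can also be tried outside the SDK. A minimal standalone sketch, where toCamelCase is a hypothetical stand-in for the camelcase package (it only handles _ and - separators) and normalizeOptionKeys is an illustrative name, not an SDK function:

```javascript
// Hypothetical stand-in for the `camelcase` package: handles only "_" and "-".
function toCamelCase(name) {
  return name.replace(/[-_]+(.)/g, (match, ch) => ch.toUpperCase());
}

// Illustrative helper: returns a copy of `options` with every key in camelCase.
// Keys that are already camelCase pass through unchanged.
function normalizeOptionKeys(options) {
  const normalized = {};
  for (const [key, value] of Object.entries(options)) {
    normalized[toCamelCase(key)] = value;
  }
  return normalized;
}

const normalized = normalizeOptionKeys({
  access_token: 'PLACEHOLDER',   // snake_case, as the demo passed it
  keywords_threshold: 0.01,      // snake_case, as the demo passed it
  contentType: 'audio/l16'       // already camelCase (placeholder value)
});

console.log(normalized);
```

With this approach both spellings end up under the camelCase key that the parameter filter looks for. If an options object contains both spellings of the same parameter, the one that appears later overwrites the earlier one.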

leibaogit commented 3 years ago

It seems there is no need to change the SDK code; only the demo’s code needs updating:

% git diff views/demo.jsx
diff --git a/views/demo.jsx b/views/demo.jsx
index 05569e8..a7bbe50 100644
--- a/views/demo.jsx
+++ b/views/demo.jsx
@@ -105,22 +105,22 @@ export class Demo extends Component {
     const keywords = this.getKeywordsArrUnique();
     return Object.assign({
       // formats phone numbers, currency, etc. (server-side)
-      access_token: this.state.accessToken,
+      accessToken: this.state.accessToken,
       token: this.state.token,
-      smart_formatting: true,
+      smartFormatting: true,
       format: true, // adds capitals, periods, and a few other things (client-side)
       model: this.state.model,
       objectMode: true,
-      interim_results: true,
+      interimResults: true,
       // note: in normal usage, you'd probably set this a bit higher
-      word_alternatives_threshold: 0.01,
+      wordAlternativesThreshold: 0.01,
       keywords,
-      keywords_threshold: keywords.length
+      keywordsThreshold: keywords.length
         ? 0.01
         : undefined, // note: in normal usage, you'd probably set this a bit higher
       timestamps: true, // set timestamps for each word - automatically turned on by speaker_labels
       // includes the speaker_labels in separate objects unless resultsBySpeaker is enabled
-      speaker_labels: this.state.speakerLabels,
+      speakerLabels: this.state.speakerLabels,
       // combines speaker_labels and results together into single objects,
       // making for easier transcript outputting
       resultsBySpeaker: this.state.speakerLabels,

Created a PR with the change: https://github.com/watson-developer-cloud/speech-to-text-nodejs/pull/266

apaparazzi0329 commented 3 years ago

Closing as resolved from PR: https://github.com/watson-developer-cloud/speech-to-text-nodejs/pull/266