watson-developer-cloud / node-sdk

:comet: Node.js library to access IBM Watson services.
https://www.npmjs.com/package/ibm-watson
Apache License 2.0
1.48k stars 668 forks source link

[speech-to-text] Keywords array not working correctly in recognize() with sessions #261

Closed jeffpk62 closed 7 years ago

jeffpk62 commented 8 years ago

The following call to the recognize() method when using sessions is using only the first keyword of the array (I extracted this from a larger program because it uses sessions; I can pass that along if needed):

    var watson = require('watson-developer-cloud');
    var fs = require('fs');

    var speech_to_text = watson.speech_to_text({
        username: 'username',
        password: 'password',
        version: 'v1'
    });

    var params = {
        audio: fs.createReadStream('../resources/audio-file.flac'),
        content_type: 'audio/flac',
        max_alternatives: 3,
        word_confidence: true,
        keywords: ['colorado', 'tornado', 'tornadoes'],
        keywords_threshold: 0.5
    };

    speech_to_text.recognize(params, function(error, transcript) {
        if (error)
            console.log(error);
        else {
            console.log('Recognize result:', JSON.stringify(transcript, null, 2));
        }
    });

Either the recognize() method is not handling the keywords parameter correctly, or the keywords need to be specified in some different fashion. The function returns results for only the first keyword from the list. In the test file, both colorado and tornadoes are present, but tornado is not. When I run the function with the previous values, I get results for colorado but not for tornadoes. Different permutations of the list of keywords indicate that only the first element of the array is being considered.

jeffpk62 commented 7 years ago

I just tested this again both with and without sessions, and I still get only the first keyword from the array of three keywords that I specify. And again, swapping the order of the keywords confirms that only the first is being used. This is with the latest Node SDK (v2.8.1).

jeffpk62 commented 7 years ago

Note that keyword matching does work correctly with the WebSocket implementation, createRecognizeStream(), given identical input parameters.

germanattanasio commented 7 years ago

@jeffpk62 how do you specify the keyword in a curl command?
I assume is by doing something like keyword=k1,k2,k3

jeffpk62 commented 7 years ago

I specify them as shown above, @germanattanasio :

   var params = {
        audio: fs.createReadStream('../resources/audio-file.flac'),
        content_type: 'audio/flac',
        max_alternatives: 3,
        word_confidence: true,
        keywords: ['colorado', 'tornado', 'tornadoes'],
        keywords_threshold: 0.5
    };

When calling the WebSocket-based createRecognizeStream(), I specify them the same way, and that method returns the expected results:

var params = {
  content_type: 'audio/flac',
  continuous: true,
  interim_results: true,
  keywords: ['colorado', 'tornado', 'tornadoes'],
  keywords_threshold: 0.5
};

For a cURL command (from a bash test script), they are specified as a query parameter of the following form:

`"$URL/sessions/$SESSION_ID/recognize?max_alternatives=3&word_confidence=true&keywords=%22colorado%22%2C%22tornado%22%2C%22tornadoes%22&keywords_threshold=0.5"``

nfriedly commented 7 years ago

the request library turns arrays into multiple querystring params, so keywords: ['a','b','c'] becomes ?keywords=a&keywords=b&keywords=c - but I think the service wants to see ?keywords=a,b,c. I'll get a fix and a test in place shortly.

jeffpk62 commented 7 years ago

Thanks, Nathan!

nfriedly commented 7 years ago

BTW @jeffpk62 , we may want to tweak the wording for the curl docs at http://www.ibm.com/watson/developercloud/speech-to-text/api/v1/#recognize_sessionless_nonmp12

Parameter Type Description
keywords optional query string[ ] A list of keywords to spot in the audio. Each keyword string can include one or more tokens. Keywords are spotted only in the final hypothesis, not in interim results. Omit the parameter or specify an empty array if you do not need to spot keywords.

It should probably mention that multiple keywords need to be separated by commas.

jeffpk62 commented 7 years ago

@nfriedly given a type of string[], isn't a comma-separated list of elements kind of expected? We also have examples that show actual keyword input in a few places. So I'm not sure we need to say this, especially given how often I'm accused of being too verbose. :-)

nfriedly commented 7 years ago

Well, there's obviously at least one other interpretation - the default behavior for the Node.js request library that we use is to create multiple querystring params when serializing an array, not a comma-separated list. (jQuery has this behavior this also).

Here's my proposal for an improved (and actually shorter!) description:

Parameter Type Description
keywords optional query string[ ] A comma-separated list of keywords strings to spot in the audio. Each keyword string can include one or more words. Keywords are spotted only in the final hypothesis, not in interim results.
jeffpk62 commented 7 years ago

Hi, Nathan. Your suggested text is shorter because it omits the last sentence of the existing description; otherwise, it's longer. And if I add this level of detail here ("comma-separated"), then I should reasonably add the same level of detail for all arrays.

I'm not sure it's necessary to state that array elements are comma-separated, since that's pretty much how they're specified. I don't think the interpretation that specifies multiple instances of the query parameter for each keyword is one that's going to be a common problem, since the description is for a single query parameter, keywords, with an array of strings, not for a series of query parameters, one per keyword. We also have examples that show the correct usage for all languages, which should address the issue for users who find themselves confused.

I've honestly never seen this issue raised before in any context, so I'm not sure it's one that needs to be addressed in this case. Copying @bvonhag in case he has any comments to add.

bvonhag commented 7 years ago

Given that we have examples that show how to use this, and everyone knows how a list is represented, I think that adding "comma-separated" everywhere is probably overkill.