Closed colinskow closed 7 years ago
Hi @colinskow
Unfortunately, that's not the way the API works. You have to use word_timings to correlate the speaker_labels to the text.
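To make the correlation concrete, here is a rough sketch of matching speaker labels to words by their start and end times. The data shapes are assumptions, not something confirmed in this thread: word timings as [word, from, to] triples and speaker labels as { from, to, speaker } objects. labelWords is a hypothetical helper name and the sample data is made up.

```javascript
// Sketch: attach a speaker to each word by matching start/end times.
// Assumed shapes (not confirmed by this thread):
//   timestamps:    [[word, from, to], ...]
//   speakerLabels: [{ from, to, speaker }, ...]
function labelWords(timestamps, speakerLabels) {
  // Index the labels by their "from-to" time span for quick lookup.
  const bySpan = new Map();
  for (const label of speakerLabels) {
    bySpan.set(`${label.from}-${label.to}`, label.speaker);
  }
  return timestamps.map(([word, from, to]) => ({
    word,
    from,
    to,
    // Stays undefined when no label has arrived yet for this word.
    speaker: bySpan.get(`${from}-${to}`),
  }));
}

// Hypothetical sample data, for illustration only.
const words = [['hello', 0.0, 0.4], ['hi', 0.5, 0.8], ['there', 0.8, 1.1]];
const labels = [
  { from: 0.0, to: 0.4, speaker: 0 },
  { from: 0.5, to: 0.8, speaker: 1 },
  { from: 0.8, to: 1.1, speaker: 1 },
];
console.log(labelWords(words, labels));
```

Because the match is on times rather than on result_index, it doesn't matter which message a label arrives in, as long as you keep the word timings around until the labels show up.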
The Browser JS SDK has some helper code to correlate things for you. It won't currently work with the RecognizeStream here in the Node.js SDK, but I think you can use the RecognizeStream from the browser SDK in Node.js; you'll just need to encode the credentials header yourself:
const fs = require('fs');
const { RecognizeStream, SpeakerStream } = require('watson-speech/speech-to-text');

// Service credentials (placeholders)
const STT_USERNAME = '...';
const STT_PASSWORD = '...';

fs.createReadStream('some/audio/file.wav')
  .pipe(new RecognizeStream({
    headers: {
      // Encode the Basic auth credentials header by hand
      authorization: 'Basic ' + Buffer.from(STT_USERNAME + ':' + STT_PASSWORD).toString('base64')
    },
    // Emit result objects rather than plain transcript text
    objectMode: true
  }))
  .pipe(new SpeakerStream())
  .on('data', data => {
    console.log('data', data);
  });
Then it should break down the results by speaker and put a speaker field on each result. I haven't tried it, but I think it will work in Node.js. You can see how the browser SDK uses those two together here.
Note that each data object will include multiple results instead of the typical single result. If you don't enable interim_results there will be only a single data event; if you do, then only the last data object will have final results, because the speaker labels can change at any time before the final labels are emitted. (Text may jump from one result to another until then.)
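As a rough illustration of consuming those data objects, the sketch below turns one message into "Speaker N: text" lines and skips anything that isn't final yet. It assumes each result looks like { speaker, final, alternatives: [{ transcript }] }; only the speaker field is described above, the rest is my assumption, and toSpeakerLines is a hypothetical helper.

```javascript
// Sketch: format one `data` message as "Speaker N: text" lines.
// Assumed result shape: { speaker, final, alternatives: [{ transcript }] }.
function toSpeakerLines(data) {
  return data.results
    .filter(result => result.final) // interim results may still change speaker
    .map(result => `Speaker ${result.speaker}: ${result.alternatives[0].transcript.trim()}`);
}

// Hypothetical messages, for illustration only.
const interimMessage = {
  results: [
    { speaker: 0, final: false, alternatives: [{ transcript: 'hello there hi ' }] },
  ],
};
const finalMessage = {
  results: [
    { speaker: 0, final: true, alternatives: [{ transcript: 'hello there ' }] },
    { speaker: 1, final: true, alternatives: [{ transcript: 'hi ' }] },
  ],
};

console.log(toSpeakerLines(interimMessage));
console.log(toSpeakerLines(finalMessage));
```

In the stream example you would call something like this from the 'data' handler; with interim_results enabled it only produces lines once the last message arrives.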
(I have plans to share code between the two libs, but not sure when I'll be able to get to it...)
Using continuous recognition via speech_to_text.createRecognizeStream, speaker labels are showing up under the incorrect result_index.
Expected behavior: speaker labels should show up in the same object where results[0].final = true, and at least have the same result_index so they can be correlated to the correct word alternatives.
Actual behavior: speaker labels show up mostly where results[0].final = false and have a result_index 1 greater than the correct number. This makes it difficult to correlate the speaker labels with other properties.
These are the settings used: