Open pietrop opened 7 years ago
@pietrop I didn't know that Bing STT supported websockets. Do you have a link to the docs so I can evaluate how hard it would be to add it to this module?
Hey @palmerabollo, here is what I gathered while looking into this:
This is the Bing Speech API overview
- A WebSocket API, useful for apps that need an improved user experience by using the power of the full-duplex WebSocket connection. Apps using this API get access to advanced features like speech recognition hypotheses. This API choice is also better for apps that need to transcribe longer audio passages.
Bing Speech API using web socket - javascript SDK
And this is the github repo.
In the README you see an example, and in the sample.html you see how they'd use it in the browser.
From the way the code is written, I am not entirely clear on how the SDK is implemented. It seems to be tailored for use in the browser on the client side, with the microphone as the input (?).
I am not sure how, or if, this could be adapted to work with Node and take an audio file and/or a live stream as input. There is an issue discussing this.
This is the Microsoft Bing STT protocol which can be used to implement a socket based solution.
IBM Watson STT has a very good npm module and API reference. Perhaps this could be used as a starting point for the Bing one (?).
IBM Watson WebSocket interface documentation
Here is the section on how to use the socket service, and this is how I use it in my project (autoEdit), for example.
watson-developer-cloud, speech to text
The code for the module is inside the node-sdk for watson in the speech to text folder
Perhaps, if the protocol is similar, the IBM one could be adapted for the Microsoft one?
Let me know what you think and if there's anything I can help with. (I am learning sockets at the moment by going through this online course (in Spanish).)
Thanks @pietrop. I think it would be hard to reuse the Browser SDK, since it contains a lot of code that is tied to the browser. It would be easier to implement the Microsoft Bing STT protocol directly: open the websocket channel, start sending the audio file in chunks, and receive the responses from Bing.
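The "send the audio file in chunks" step above could look roughly like the sketch below. It is only an illustration: `sendInChunks`, the `send` callback and the 8192-byte chunk size are hypothetical names and values chosen for the example, and the message framing that the Bing protocol requires is not shown.

```javascript
// Hypothetical helper: split an audio buffer into fixed-size chunks and hand
// each one to a `send` callback (e.g. a thin wrapper around a websocket's send).
// The default chunk size is an assumption, not a value from the Bing docs.
function sendInChunks(audio, send, chunkSize = 8192) {
  let sent = 0;
  for (let offset = 0; offset < audio.length; offset += chunkSize) {
    // subarray() returns a view on the same memory, so nothing is copied here.
    send(audio.subarray(offset, offset + chunkSize));
    sent += 1;
  }
  return sent;
}

// Usage with a fake transport that just records what it is given.
const received = [];
const count = sendInChunks(Buffer.alloc(20000), (chunk) => received.push(chunk));
console.log(count, received.map((c) => c.length));
```

Passing the transport in as a callback keeps the chunking logic independent of whichever websocket library ends up being used.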
The current API could be modified to be event oriented. Something along these lines (pseudocode):
const { BingSpeechClient } = require('bingspeech-api-client');
const options = {
  websocket: true, // false would use the REST API alternative
  // ...any other config you need (e.g. format, mode, language)...
};
const client = new BingSpeechClient(subscriptionKey, options);
client.recognizeStream(audioStream);
// Use bing events or expose our own ones.
// It would be great if the events in both the REST and the WS approaches matched
client.on('RecognitionTriggeredEvent', (e) => { /* ... */ });
client.on('ListeningStartedEvent', (e) => { /* ... */ });
client.on('RecognitionStartedEvent', (e) => { /* ... */ });
client.on('error', (e) => { /* ... */ });
// ...
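One possible way to get this event-oriented shape in Node is to extend the built-in EventEmitter. The sketch below is a stand-in, not this module's API: `SketchSpeechClient` and the injected `transport` object with its `send`/`close` methods are hypothetical, and only the `RecognitionTriggeredEvent` and `error` event names come from the pseudocode above.

```javascript
const { EventEmitter } = require('events');

// Hypothetical client: forwards audio from a stream-like object to an injected
// transport and reports what happens through events.
class SketchSpeechClient extends EventEmitter {
  constructor(transport) {
    super();
    this.transport = transport; // assumed shape: { send(chunk), close() }
  }

  recognizeStream(audioStream) {
    this.emit('RecognitionTriggeredEvent', {});
    audioStream.on('data', (chunk) => this.transport.send(chunk));
    audioStream.on('end', () => {
      this.transport.close();
      this.emit('end');
    });
    audioStream.on('error', (err) => this.emit('error', err));
    return this;
  }
}

// Usage: register handlers first, then start streaming.
const sentChunks = [];
const events = [];
const sketchClient = new SketchSpeechClient({
  send: (chunk) => sentChunks.push(chunk),
  close: () => events.push('closed'),
});
sketchClient.on('RecognitionTriggeredEvent', () => events.push('triggered'));
sketchClient.on('error', (e) => events.push('error: ' + e.message));

// A bare EventEmitter stands in for a real audio stream in this example.
const fakeAudio = new EventEmitter();
sketchClient.recognizeStream(fakeAudio);
fakeAudio.emit('data', Buffer.from('abc'));
fakeAudio.emit('end');
console.log(events, sentChunks.length);
```

Injecting the transport would, in principle, let a REST-backed and a websocket-backed implementation expose the same events.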
It is not a trivial task and I'm not an expert in websockets either, so I don't think I'll have the time to implement it.
ok ok, cool, I'll see how it goes as my knowledge on sockets improves.
Hey @palmerabollo, have you thought about adding support for a socket connection in the underlying implementation?
Not sure where to start but I'd be happy to help with that!
One advantage is that, for the Bing STT service, it raises the audio duration limit from 15 seconds with the REST API to 10 minutes with the socket, according to Microsoft's documentation.
Let me know if this is on your roadmap.
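To make the duration limits mentioned above concrete, a client could pick the transport based on the length of the clip. This is a minimal sketch: `chooseTransport` and the constant names are hypothetical, and the two limits are simply the figures quoted in this thread.

```javascript
// Limits as quoted in this thread: 15 seconds for REST, 10 minutes for the socket.
const REST_LIMIT_SECONDS = 15;
const WEBSOCKET_LIMIT_SECONDS = 10 * 60;

// Hypothetical helper: pick a transport for a clip of the given duration.
function chooseTransport(durationSeconds) {
  if (durationSeconds <= REST_LIMIT_SECONDS) return 'rest';
  if (durationSeconds <= WEBSOCKET_LIMIT_SECONDS) return 'websocket';
  throw new RangeError(`Audio of ${durationSeconds}s exceeds both limits`);
}

console.log(chooseTransport(10), chooseTransport(120));
```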