palmerabollo / bingspeech-api-client

Microsoft Bing Speech API client in node.js

Add sockets support? #14

Open pietrop opened 7 years ago

pietrop commented 7 years ago

Hey @palmerabollo, have you thought about adding support for a socket connection in the underlying implementation?

Not sure where to start but I'd be happy to help with that!

The main advantage is that, for the Bing STT service, it raises the audio duration limit from 15 seconds (REST API) to 10 minutes (socket), according to Microsoft's documentation.

Let me know if this is on your roadmap.

palmerabollo commented 7 years ago

@pietrop I didn't know that Bing STT supported websockets. Do you have a link to the docs so I can evaluate how hard it would be to add it to this module?

pietrop commented 7 years ago

Hey @palmerabollo, here is what I gathered while looking into this:

Microsoft Bing STT Documentation

This is the Bing Speech API overview

  • WebSocket API, useful for apps that need an improved user experience by using the power of the full-duplex WebSocket connection. Apps using this API get access to advanced features like speech recognition hypotheses. This API choice is also better for apps that need to transcribe longer audio passages.

JS Browser SDK for web socket

Bing Speech API using web socket - javascript SDK

And this is the GitHub repo.

The README has an example, and sample.html shows how they'd use it in the browser.

From the way the code is written, I am not super clear on how the SDK is implemented. But it seems to be tailored for use in the browser, on the client side, with the microphone as the input. (?)

I am not sure how/if this could be used or re-adapted to work with node and take an audio file and/or a live stream as an input. An issue has been raised discussing this.

Socket Protocol

This is the Microsoft Bing STT protocol, which can be used to implement a socket-based solution.

IBM Example

IBM Watson STT has a very good npm module and API reference. Perhaps this can be used as a starting point to make the Bing one(?).

IBM Watson WebSocket interface documentation

API Reference

Here is the section on how to use the socket service, and this is how I use it in my project (autoEdit), for example.

NPM Module

watson-developer-cloud, speech to text

Repo for npm module

The code for the module is inside the node-sdk for Watson, in the speech to text folder.

Perhaps, if the protocol is similar, the IBM one could be re-adapted for the Microsoft one?


Let me know what you think and if there's anything I can help with. (I am learning sockets at the moment by going through this online course (Spanish).)

palmerabollo commented 7 years ago

Thanks @pietrop. I think it would be hard to reuse the Browser SDK since it contains a lot of code that is tied to the browser. It would be easier to implement the Microsoft Bing STT protocol directly: open the websocket channel, start sending the audio file in chunks, and receive the responses from Bing.
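
To make the "send the audio file in chunks" step concrete, here is a minimal sketch. It only covers the chunking and sending part. The names `chunkBuffer` and `sendAudioInChunks`, the 8192-byte default, and the `socket` object (anything with a `send(data)` method) are assumptions for illustration, and the message framing and headers that the Bing protocol requires are left out.

```javascript
// Sketch only: `socket` is any object with a send(data) method (for example
// a WebSocket instance). The message framing and headers required by the
// Bing Speech protocol are deliberately left out.
function* chunkBuffer(buffer, chunkSize) {
    if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
        throw new RangeError('chunkSize must be a positive integer');
    }
    for (let offset = 0; offset < buffer.length; offset += chunkSize) {
        // subarray returns a view on the same memory, so no bytes are copied
        yield buffer.subarray(offset, offset + chunkSize);
    }
}

// Sends every chunk through the socket and returns how many were sent.
// The default chunk size is an arbitrary assumption, not a protocol value.
function sendAudioInChunks(socket, audioBuffer, chunkSize = 8192) {
    let sent = 0;
    for (const chunk of chunkBuffer(audioBuffer, chunkSize)) {
        socket.send(chunk);
        sent += 1;
    }
    return sent;
}
```

Keeping the chunking separate from the socket would make it possible to test this part without a network connection, by passing in a fake object that records what it was asked to send.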

The current API could be modified to be event oriented. Something along these lines (pseudocode):

const { BingSpeechClient } = require('bingspeech-api-client');

// subscriptionKey and audioStream are assumed to be defined by the caller.
const options = {
    websocket: true, // false would use the REST API alternative
    // ...any other config you need (e.g. format, mode, language)...
};
const client = new BingSpeechClient(subscriptionKey, options);
client.recognizeStream(audioStream); // proposed method, not implemented yet

// Use Bing events or expose our own ones.
// It would be great if the events in both the REST and the WS approaches matched.
client.on('RecognitionTriggeredEvent', (e) => { /* ... */ });
client.on('ListeningStartedEvent', (e) => { /* ... */ });
client.on('RecognitionStartedEvent', (e) => { /* ... */ });
client.on('error', (e) => { /* ... */ });
// ...
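
One possible way to get that event-oriented shape is to build on Node's `EventEmitter`. The sketch below is hypothetical and is not the module's current API: the class name `StreamingSpeechClient`, the `transport` object (assumed to expose `send(chunk)` and `end()`), and the `RecognitionEndedEvent` name are all assumptions made for illustration.

```javascript
const { EventEmitter } = require('events');

// Hypothetical sketch, not the module's current API. `transport` is an
// assumed object with send(chunk) and end() methods, standing in for either
// a WebSocket connection or a REST upload.
class StreamingSpeechClient extends EventEmitter {
    constructor(transport) {
        super();
        this.transport = transport;
    }

    recognizeStream(audioStream) {
        let started = false;
        audioStream.on('data', (chunk) => {
            if (!started) {
                // Emitted on the first chunk, so listeners attached right
                // after recognizeStream() still receive it.
                started = true;
                this.emit('RecognitionStartedEvent', {});
            }
            this.transport.send(chunk);
        });
        audioStream.on('end', () => {
            this.transport.end();
            // Assumed event name, chosen to match the style used above.
            this.emit('RecognitionEndedEvent', {});
        });
        // Forward stream failures as a regular 'error' event.
        audioStream.on('error', (err) => this.emit('error', err));
        return this;
    }
}
```

Because the transport is injected, the same events could in principle be emitted whether the audio goes over a websocket or through the REST API, which is what would let both approaches share one set of event names.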

It is not a trivial task and I'm not an expert in websockets either, so I don't think I'll have the time to implement it.

pietrop commented 7 years ago

ok ok, cool, I'll see how it goes as my knowledge of sockets improves.