mozilla / DeepSpeech

DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.
Mozilla Public License 2.0

How to use DeepSpeech within Electron? #2569

Closed pietrop closed 4 years ago

pietrop commented 4 years ago

Hello, Thanks for the great project!

I was wondering how I would go about using DeepSpeech within Electron?

Use cases are applications like autoEdit and BBC Digital Paper Edit (Electron version), which work with transcriptions and do text-based video editing of interviews, etc.

At the moment these can integrate with pocketsphinx or Gentle for offline open source STT.

There seems to be a lot of interest in doing this type of local STT, and I've been looking for ways to increase the quality of the local STT.

Any pointers or help on how to get DeepSpeech working with Electron would be much appreciated. I'm happy to help test it out and/or write some docs for future reference.

cc @dsteinman https://github.com/mozilla/DeepSpeech/issues/2543

dsteinman commented 4 years ago

Yeah, DeepSpeech 0.5 works with Electron 5.1, and I've got it working in my main project here, which is an Electron desktop voice control app.

Getting DeepSpeech working in Electron is the same as for any other NPM library: npm install, import it, download the deepspeech models, and see the node_wav example (a minimal sketch is at the end of this comment). The two biggest challenges I had were in turning DeepSpeech into a live microphone voice control system. These problems probably won't matter for you if you're just doing speech-to-text; they aren't Electron specific but might be of interest:

1) DeepSpeech doesn't have built-in microphone recording

2) It doesn't have a continuous/streaming transcription mode that keeps recognizing speech (or at least it didn't in previous versions)

So I had to write my own microphone -> DeepSpeech library, and use a VAD library (voice activity detection) to automatically turn DeepSpeech on and off. I wrote a separate speech library that handles those 2 problems, and another one for Alexa-style hotword commands.

Another issue is processing speed. On my MacBook Pro, DeepSpeech takes about the same amount of time to process the speech as the length of the recording (e.g. 5 seconds of speech takes about 5 seconds to process). So in the VAD part of my recording library I cap the microphone recording at 15 seconds to prevent recording too long a clip.

The result is I can speak for a few seconds, and then have to wait for that clip to be processed before saying the next sentence. It's a limitation I was able to live with for now, because the kinds of things I wanted to do most were Star Trek style voice commands: short commands like "launch game", "move up/down", "lights on". But it might not be suitable right now for long-duration dictation, like transcribing an entire paragraph.
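
For reference, a bare-bones version of the node_wav approach looks roughly like this (a sketch only: model paths are placeholders, and the exact 0.6 signatures should be checked against the NodeJS API docs):

    // Sketch: transcribe a 16 kHz / 16-bit mono WAV with the DeepSpeech 0.6 NodeJS API.
    // The real node_wav example also checks the sample rate and converts with Sox.
    const Ds = require('deepspeech');
    const Fs = require('fs');

    const model = new Ds.Model('models/output_graph.pbmm', 500 /* beam width */);
    model.enableDecoderWithLM('models/lm.binary', 'models/trie', 0.75, 1.85);

    const wav = Fs.readFileSync('test.wav');
    const audio = wav.slice(44); // naive: skip the 44-byte WAV header, assume 16 kHz mono PCM

    // In 0.6 stt() still takes a sample count; later releases dropped that argument.
    console.log(model.stt(audio, audio.length / 2));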

reuben commented 4 years ago

DeepSpeech has had partial streaming support since v0.2.0 and full streaming support (including the decoder) since v0.5.0. I recommend trying out the latest v0.6.0 model, just released yesterday. And yes, you should be able to just do npm install. We have docs here: https://github.com/mozilla/DeepSpeech/blob/v0.6.0/USING.rst#using-a-pre-trained-model

And here: https://deepspeech.readthedocs.io/en/v0.6.0/NodeJS-API.html

And examples here: https://github.com/mozilla/DeepSpeech/tree/v0.6.0/examples/

Examples aren't currently updated to v0.6.0 but they should work by just changing the required version, as they were updated to the last few alpha builds.
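
That is, roughly just bumping the dependency in each example's package.json (version string illustrative):

    {
      "dependencies": {
        "deepspeech": "^0.6.0"
      }
    }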

In the future, please keep the GitHub issue tracker for bugs or feature requests. For discussion we have a Discourse forum.

dsteinman commented 4 years ago

There isn't an example of how to do streaming in NodeJS though, only the wav example. I see there is a Python streaming example here:

model.feedAudioContent(stream_context, np.frombuffer(frame, np.int16))

It looks like the WAV data isn't actually processed until the stream has closed:

  text = model.finishStream(stream_context)

So am I correct that you can't feed a stream of WAV data in and continuously get a stream of recognition results out? You have to segment the stream using VAD and process each chunk separately?

lissyx commented 4 years ago

So am I correct that you can't feed a stream of WAV data in and continuously get a stream of recognition results out? You have to segment the stream using VAD and process each chunk separately?

https://deepspeech.readthedocs.io/en/v0.6.0/NodeJS-API.html#Model.feedAudioContent https://deepspeech.readthedocs.io/en/v0.6.0/NodeJS-API.html#Model.intermediateDecode

I'm not sure what else you need?
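
The gist is roughly this (a sketch only; buffer handling and exact signatures should be checked against the docs above):

    // Streaming with the 0.6 NodeJS API: feed 16 kHz / 16-bit mono PCM chunks as they arrive.
    const stream = model.createStream();

    // called for each incoming audio chunk (e.g. from a microphone pipeline)
    function onAudioChunk(chunk) {
      model.feedAudioContent(stream, chunk, chunk.length / 2);
      console.log('partial:', model.intermediateDecode(stream));
    }

    // called when the utterance ends (e.g. when a VAD library reports silence)
    function onUtteranceEnd() {
      console.log('final:', model.finishStream(stream));
    }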

lissyx commented 4 years ago

There isn't an example of how to do streaming in NodeJS though, only the wav example. I see there is a Python streaming example here:

Isn't it this one? https://github.com/mozilla/DeepSpeech/blob/v0.6.0/examples/ffmpeg_vad_streaming/index.js

dsteinman commented 4 years ago

The API documentation isn't enough, because there's a lot more involved to get even a basic example working - it involves AudioContext buffers, a Node VAD library, and starting and stopping a stream. What would be useful is a fully working example of feedAudioContent() and intermediateDecode().

Unfortunately, none of those Python or ffmpeg examples can be used in Electron -- e.g. I don't want to compile, load, and distribute ffmpeg and Python inside an Electron app.

dsteinman commented 4 years ago

I finally got DeepSpeech working the way I wanted inside Electron. My latest example project is here and might be worth a look for anyone attempting to do this:

https://github.com/jaxcore/deepspeech-plugin/tree/master/examples/electron-example

There are a few of my own libraries used in this example.

With DeepSpeech 0.6 you can now use Electron 7.1.7. I was mostly interested in doing microphone recording in the browser (with visualization) and sending the data to DeepSpeech running in Electron/NodeJS. But there's an extra step needed there, because DeepSpeech inference is a CPU-heavy task - it'll lock up your whole application while processing. So that's what deepspeech-plugin does: it does a child_process.fork(), DeepSpeech runs in the forked process, and I send the audio stream there so the main Electron process isn't affected. It seems to be working fairly well so far.

If you're not using live microphone data but rather .wav files, you will probably want to do the same thing. You'd need to stream the .wav data into a forked process, and the code I'm using in this example can be largely reused for that task as well. Just be aware the WAV data must be downsampled to 16-bit/16 kHz before sending it to the process.

ipcMain.on('stream-data', (event, data) => {  // receive audio from browser window
    deepspeech.streamData(data);  // send to deepspeech-plugin forked process
});
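
For context, the forked process on the other end of that message would look roughly like this (a simplified sketch, not the actual deepspeech-plugin code; paths are placeholders):

    // deepspeech-fork.js (sketch): run DeepSpeech in a child process so inference
    // doesn't block the main Electron process.
    const Ds = require('deepspeech');
    const model = new Ds.Model('models/output_graph.pbmm', 500);
    const stream = model.createStream();

    process.on('message', (msg) => {
      const chunk = Buffer.from(msg.data);                   // raw 16-bit / 16 kHz PCM
      model.feedAudioContent(stream, chunk, chunk.length / 2);
      process.send({ partial: model.intermediateDecode(stream) });
    });

In the main process it would be started with child_process.fork('./deepspeech-fork.js').
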
lissyx commented 4 years ago

But there's an extra step needed there, because DeepSpeech inference is a CPU-heavy task - it'll lock up your whole application while processing. So that's what deepspeech-plugin does: it does a child_process.fork(), DeepSpeech runs in the forked process,

Yeah, that's normal. One question: can't you use a thread instead of forking a new process?

dsteinman commented 4 years ago

Yes, presumably this approach will work with the NodeJS worker_threads API, which might reduce the amount of RAM required. I've only ever used fork() before this, so that's what I tried first.
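
A rough sketch of what the worker_threads version would look like (illustrative only, not the actual plugin code):

    // main side: spawn a worker that owns the DeepSpeech model
    const { Worker } = require('worker_threads');
    const worker = new Worker('./deepspeech-worker.js');
    worker.on('message', (msg) => console.log('recognized:', msg.text));
    // worker.postMessage({ audio: chunk });  // send raw 16-bit PCM buffers

    // deepspeech-worker.js: load the model inside the worker thread
    const { parentPort } = require('worker_threads');
    const Ds = require('deepspeech');
    const model = new Ds.Model('models/output_graph.pbmm', 500);
    parentPort.on('message', (msg) => {
      const audio = Buffer.from(msg.audio);
      parentPort.postMessage({ text: model.stt(audio, audio.length / 2) });
    });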

lissyx commented 4 years ago

Yes, presumably this approach will work with the NodeJS worker_threads API, which might reduce the amount of RAM required. I've only ever used fork() before this, so that's what I tried first.

It'd be great if you could give us feedback on that 😊

dsteinman commented 4 years ago

I tried changing my code to use worker_threads, and it works fine outside of Electron. But in Electron the worker thread can't load the Electron version of the deepspeech npm module. It's looking for it in a non-existent node-v75 directory.

I'm not sure what's going on here; for some reason the worker thread is unaware of the Electron environment.

Uncaught Exception:
Error: Cannot find module '/Users/dstein/dev/jaxcore/deepspeech-plugin/examples/electron-example/node_modules/deepspeech/lib/binding/v0.6.0/darwin-x64/node-v75/deepspeech.node'
Require stack:
- /Users/dstein/dev/jaxcore/deepspeech-plugin/examples/electron-example/node_modules/deepspeech/index.js
- /Users/dstein/dev/jaxcore/deepspeech-plugin/lib/deepspeech-worker.js
    at Function.Module._resolveFilename (internal/modules/cjs/loader.js:717:15)
    at Function.Module._load (internal/modules/cjs/loader.js:622:27)
    at Module.require (internal/modules/cjs/loader.js:775:19)
    at require (internal/modules/cjs/helpers.js:68:18)
    at Object.<anonymous> (/Users/dstein/dev/jaxcore/deepspeech-plugin/examples/electron-example/node_modules/deepspeech/index.js:17:17)
    at Module._compile (internal/modules/cjs/loader.js:880:30)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:892:10)
    at Module.load (internal/modules/cjs/loader.js:735:32)
    at Function.Module._load (internal/modules/cjs/loader.js:648:12)
    at Module.require (internal/modules/cjs/loader.js:775:19)

lock[bot] commented 4 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.