Taking in from any audio stream?

TonstarHipHop commented 4 years ago

Hi, I'm very new to GitHub and programming in general, so sorry if any of my questions sound stupid.

I'm wondering if this can be used to detect wakewords from a specific stream in Node.js?

dsteinman commented 4 years ago

Yes -- actually it seems almost everyone who wants to use this module wants to use a stream of data instead of the built-in microphone. So I should maybe make it a little more clear how to do this.

bumblebee.processAudio(inputFrame, inputSampleRate)

This is the method you want. inputFrame is a Float32Array, an array of floating point numbers for the mono (1 channel) audio, and inputSampleRate is the sample rate of the incoming audio. From a microphone this might be 48000 or 44100, but if you have a way to record at 16000, use that, because that is the rate the audio has to be converted to.

Basically your code would look something like this:

mystream.on('data', function(data) {
    bumblebee.processAudio(data, 44100)
});

Depending on where you're getting the audio stream from, it might already be in Float32Array format, and this code will work as-is. But it might be a Buffer of floating point or integer numbers. This is where it gets a bit tough to figure out how to convert the data.
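A minimal sketch of one such conversion, assuming the stream delivers Node.js Buffers of signed 16-bit little-endian PCM (a common raw format, but not confirmed for every source) and that float samples should be scaled to the range -1 to 1. The helper name int16BufferToFloat32 is hypothetical and is not part of bumblebee-hotword-node.

```javascript
// Hypothetical helper: converts a Buffer of signed 16-bit little-endian
// PCM samples into a Float32Array scaled to the range [-1, 1).
function int16BufferToFloat32(buf) {
  // Two bytes per sample; a trailing odd byte is ignored.
  const sampleCount = Math.floor(buf.length / 2);
  const out = new Float32Array(sampleCount);
  for (let i = 0; i < sampleCount; i++) {
    // 32768 is the magnitude of the most negative 16-bit value.
    out[i] = buf.readInt16LE(i * 2) / 32768;
  }
  return out;
}
```

Under those assumptions, the handler would call bumblebee.processAudio(int16BufferToFloat32(data), 44100) instead of passing the Buffer directly.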

TonstarHipHop commented 4 years ago

Ah I see, thanks a lot!

dsteinman commented 4 years ago

No problem, let me know if you can't figure it out.

TonstarHipHop commented 4 years ago

I believe I have successfully converted an audio stream into 1 channel at a sample rate of 48000, as it correctly writes to WAV files at a 48000 sample rate. When I run the following:

The console logs this:

<Buffer 55 ef 10 f0 81 f0 c2 f0 75 f0 b0 ef 13 ef e8 ed 38 ec e8 ea 6d e9 52 e8 7b e7 9c e6 a0 e6 eb e6 85 e7 a9 e8 ea e9 85 eb 43 ed 1d ef d5 f0 39 f2 6d f3 ... 1870 more bytes>
<Buffer fe ff fb ff 02 00 01 00 fa ff f8 ff f8 ff f1 ff eb ff f1 ff fe ff 09 00 0f 00 0f 00 10 00 13 00 0a 00 05 00 06 00 02 00 ff ff 02 00 02 00 02 00 0a 00 ... 1870 more bytes>
<Buffer b3 db 1b dd f2 de 56 e1 b0 e3 21 e6 c3 e8 46 eb be ed 15 f0 04 f2 cc f3 53 f5 ae f6 f0 f7 ad f8 b0 f9 d7 fa cc fb 6d fd 04 ff de 00 a1 03 8b 06 cf 09 ... 1870 more bytes>
<Buffer 0e 00 0d 00 0b 00 12 00 08 00 00 00 fd ff f2 ff f1 ff f3 ff f5 ff 04 00 0e 00 15 00 20 00 1e 00 0c 00 fb ff f6 ff f9 ff f0 ff ed ff 0c 00 1e 00 09 00 ... 1870 more bytes>
<Buffer e7 fc 6d fc ed fb 2c fb 59 fa a6 f9 17 f9 88 f8 e0 f7 ec f7 d7 f7 82 f7 d2 f7 db f7 ef f7 1d f8 1d f8 93 f8 a8 f8 83 f8 a5 f8 51 f8 e4 f7 8f f7 13 f7 ... 1870 more bytes>

So I'm guessing you were correct in saying that it's a buffer of integer numbers, and yeah, I'm rather stuck on how to convert that to a Float32Array. I'll keep looking around for solutions.

I might also add that when I log data.sampleRate instead, it logs undefined a bunch of times. Not sure if that's an issue?
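One way to check the integer guess is to read a few of the logged bytes as signed 16-bit little-endian samples. The sketch below uses the first eight bytes of the second buffer in the log; the 16-bit little-endian layout is an assumption, and the helper name readInt16Samples is hypothetical. Separately, a Node.js Buffer has no sampleRate property, so data.sampleRate logging undefined is expected and not a sign of a problem by itself.

```javascript
// First eight bytes of the second buffer logged above.
const sample = Buffer.from([0xfe, 0xff, 0xfb, 0xff, 0x02, 0x00, 0x01, 0x00]);

// Hypothetical helper: reads each pair of bytes as one signed
// 16-bit little-endian integer.
function readInt16Samples(buf) {
  const values = [];
  for (let offset = 0; offset + 1 < buf.length; offset += 2) {
    values.push(buf.readInt16LE(offset));
  }
  return values;
}

console.log(readInt16Samples(sample)); // [ -2, -5, 2, 1 ]
```

Small values hovering around zero are what near-silent 16-bit PCM would look like, which is consistent with the integer-buffer guess.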

TonstarHipHop commented 4 years ago

After looking at several solutions, I think I've managed to convert the buffer to a Float32Array, although I'm not sure it was converted correctly. I know that it somewhat worked, as it managed to trigger the bumblebee.on("data") event.

  const convertTo1ChannelStream = new ConvertTo1ChannelStream();
  const bumblebee = new BumbleBee();

  bumblebee.setSensitivity(0.5);
  bumblebee.addHotword('bumblebee');

  const newAudioStream = audioStream.pipe(convertTo1ChannelStream);

  newAudioStream.on('data', function(data) {
    const arr = convertBlock(data.buffer);
    bumblebee.processAudio(arr, 48000);
  })

  bumblebee.on('data', (data) => {
    console.log(data);
  })
  bumblebee.on('hotword', (hotword) => {
    console.log(hotword);
  })
  bumblebee.start();
  audioStream.on('end', async () => {
    bumblebee.stop();
    console.log('audioStream end');
  })

This then triggered the following error:

<Buffer 73 6f 78 3a 20 20 20 20 20 20 53 6f 58 20 76 31 34 2e 34 2e 32 0a 0a 55 73 61 67 65 20 73 75 6d 6d 61 72 79 3a 20 5b 67 6f 70 74 73 5d 20 5b 5b 66 6f ... 5514 more bytes>
audioStream end
[ERROR] sensitivity should be within [0, 1]
/root/gassbot/node_modules/bumblebee-hotword-node/lib/porcupine-v1.8/porcupine.js:78
            throw new Error("failed to initialize porcupine.");
            ^

Error: failed to initialize porcupine.
    at Object.create (/root/gassbot/node_modules/bumblebee-hotword-node/lib/porcupine-v1.8/porcupine.js:78:19)
    at BumblebeeNode.initPorcupine (/root/gassbot/node_modules/bumblebee-hotword-node/lib/bumblebee-node.js:100:31)
    at BumblebeeNode._start (/root/gassbot/node_modules/bumblebee-hotword-node/lib/bumblebee-node.js:109:8)
    at BumblebeeNode.<anonymous> (/root/gassbot/node_modules/bumblebee-hotword-node/lib/bumblebee-node.js:82:10)
    at Object.onceWrapper (events.js:421:28)
    at BumblebeeNode.emit (events.js:315:20)
    at BumblebeeNode.connect (/root/gassbot/node_modules/bumblebee-hotword-node/lib/bumblebee-node.js:51:10)
    at BumblebeeNode.start (/root/gassbot/node_modules/bumblebee-hotword-node/lib/bumblebee-node.js:84:9)
    at Client.<anonymous> (/root/gassbot/gassbot.js:199:13)
    at Client.emit (events.js:315:20)

This is on Ubuntu 18
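The buffer logged just before the error can be decoded as text, which gives a hint about what was flowing through the stream. The sketch below copies the 50 visible bytes from that log line and decodes them as UTF-8; the remaining bytes are elided in the log and are not reconstructed here.

```javascript
// Hex bytes copied from the buffer logged just before the error above.
const hex =
  '73 6f 78 3a 20 20 20 20 20 20 53 6f 58 20 76 31 34 2e 34 2e 32 0a 0a ' +
  '55 73 61 67 65 20 73 75 6d 6d 61 72 79 3a 20 5b 67 6f 70 74 73 5d 20 5b 5b 66 6f';

// Strip the spaces and decode the bytes as UTF-8 text.
const decoded = Buffer.from(hex.replace(/ /g, ''), 'hex').toString('utf8');
console.log(decoded);
```

The decoded text begins with "sox:" followed by "SoX v14.4.2" and a "Usage summary" line, so this chunk looks like the sox command-line help text and not audio samples. That would be consistent with sox having been started with arguments it did not accept, although the thread does not confirm the cause.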

dsteinman commented 4 years ago

Yeah this is exactly where things start getting messy because the tooling for processing audio in NodeJS is not good. One package you should take a look at is sox-stream, it uses the command line sox program to convert audio files to different formats:

https://www.npmjs.com/package/sox-stream

First you'd probably want to learn how to take your incoming audio and write a 1 channel 16 kHz WAV file:

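var sox = require('sox-stream')
var fs = require('fs')

// Placeholder source ('input.wav' is a hypothetical name): replace with your incoming audio stream
var src = fs.createReadStream('input.wav')

// Transcode to 16-bit, 16000 Hz, 1 channel WAV
var transcode = sox({
    output: {
        bits: 16,
        rate: 16000,
        channels: 1,
        type: 'wav'
    }
})
var dest = fs.createWriteStream('output.wav')
src.pipe(transcode).pipe(dest)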

And load output.wav into an audio editing program to verify it's at 16000 and sounds correct.

Then make a separate script which streams the contents of that file into Porcupine.

SteTR commented 4 years ago

Hey, it's me again. The solution you provided @dsteinman works very well. Sadly, the sox-stream package does not offer a real-time read stream, so I can't keep piping a live audio source into the transcoder, where the source is a microphone from a different stream or something similar. Right now, the only difference in audio format is that my audio input is 2 channel, 48000 Hz, while the required format is 1 channel, 16000 Hz. Do you have any idea how to handle this?
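To make the size of that gap concrete, here is a rough sketch of what the conversion involves, assuming the input has already been read into an Int16Array of interleaved stereo samples. It averages the two channels and every three consecutive frames (48000 / 16000 = 3), which is a crude box-filter average and not a proper resampler, so a dedicated audio tool will give better quality. The function name stereo48kToMono16k is hypothetical.

```javascript
// interleaved: Int16Array of stereo samples [L0, R0, L1, R1, ...] at 48000 Hz.
// Returns a Float32Array of mono samples at 16000 Hz, scaled to [-1, 1).
function stereo48kToMono16k(interleaved) {
  const framesIn = Math.floor(interleaved.length / 2);
  const framesOut = Math.floor(framesIn / 3); // 48000 / 16000 = 3
  const out = new Float32Array(framesOut);
  for (let i = 0; i < framesOut; i++) {
    let sum = 0;
    // Average 3 consecutive stereo frames (6 samples) into one output sample.
    for (let j = 0; j < 6; j++) {
      sum += interleaved[i * 6 + j];
    }
    out[i] = sum / 6 / 32768;
  }
  return out;
}
```

Leftover frames that do not fill a group of three are dropped, so a streaming version would need to carry them over to the next chunk.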

dsteinman commented 4 years ago

Right now, the only difference in audio format is that my audio input is 2 channel, 48000Hz; while the required is 1 channel, 16000 Hz. Do you have any idea on how to handle this?

I'll have to get back to you on this. I just tried doing this by recording a wav file of myself saying "something bumblebee" and processing the file with both sox-stream and another npm library wav-decode, but in both cases bumblebee/porcupine didn't seem to respond with a hotword detection as I expected. So at the moment I also am stumped as to why a live microphone works but this doesn't.

SteTR commented 4 years ago

Hey, I managed to do live transcoding to meet the bumblebee-hotword-node audio format requirements by using the fluent-ffmpeg package. Here's my code in case anyone encounters this issue.

const ffmpeg = require('fluent-ffmpeg');
const Bumblebee = require('bumblebee-hotword-node');

// input_flags and output_flags: arrays of the flags you would normally pass to ffmpeg on the command line, where each flag is a string. Example: '-ar 48000' specifies the sample rate as 48000
// outputFormat: the format of the output. The output format is required. Example: 'wav' or 's16le'
const transcodedStream = new ffmpeg().input(inputStream).inputOptions(input_flags).outputOptions(output_flags).format(outputFormat).pipe({end: false});

const bumblebee = new Bumblebee();
bumblebee.addHotword('bumblebee');
bumblebee.on('hotword', hotword => console.log('hot word detected: ' + hotword));
bumblebee.start({stream: transcodedStream});

You can read more about fluent-ffmpeg in its repository on GitHub, which a quick search will find.

Edit: forgot to separate flags

dsteinman commented 4 years ago

Cool, what were the flags and outputFormat options you used?

var inputStream = fs.createReadStream('/path/to/file.avi');
var flags = ??
var outputFormat = ??

SteTR commented 4 years ago

For my case with s16le 48000 Hz 2 channel => s16le 16000 Hz 1 channel, I had this transcoded stream:

const inputFlags = ['-f s16le', '-ac 2', '-ar 48000'];
const outputFlags = ['-ac 1', '-ar 16000']
const outputFormat = 's16le'

const transcodedStream = new ffmpeg().input(inputStream)
                .inputOptions(inputFlags)
                .outputOptions(outputFlags).format(outputFormat).pipe({end: false});
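For comparison, the same flags can be passed to the ffmpeg command line directly through Node's child_process module. This is only a sketch: it assumes ffmpeg is installed and on the PATH, and the helper names buildFfmpegArgs and transcodeWithFfmpeg are hypothetical. It is not how fluent-ffmpeg is confirmed to invoke ffmpeg internally.

```javascript
const { spawn } = require('child_process');

// Arguments mirroring the flags above: raw s16le, 2 channels, 48000 Hz in;
// raw s16le, 1 channel, 16000 Hz out. 'pipe:0' and 'pipe:1' are ffmpeg's
// names for stdin and stdout.
function buildFfmpegArgs() {
  return [
    '-f', 's16le', '-ac', '2', '-ar', '48000', '-i', 'pipe:0',
    '-ac', '1', '-ar', '16000', '-f', 's16le', 'pipe:1',
  ];
}

// Hypothetical helper: pipes inputStream through an ffmpeg child process
// and returns the child's stdout as the transcoded stream.
function transcodeWithFfmpeg(inputStream) {
  const child = spawn('ffmpeg', buildFfmpegArgs(), {
    stdio: ['pipe', 'pipe', 'ignore'], // discard ffmpeg's stderr log output
  });
  inputStream.pipe(child.stdin);
  return child.stdout;
}
```

The input options come before '-i' and the output options after it, which matches the split between inputFlags and outputFlags in the code above.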

dsteinman commented 4 years ago

Yup, that's working for me too.

I've added a new example:

https://github.com/jaxcore/bumblebee-hotword-node/tree/master/examples/wav-example

Thanks for figuring this out!
