ricky0123 / vad

Voice activity detector (VAD) for the browser with a simple API
https://www.vad.ricky0123.com

RealTimeVAD implementation for NodeJS #125

Open ThEditor opened 2 weeks ago

ThEditor commented 2 weeks ago

So I tried using NonRealTimeVAD but my use case required a real-time version of it.

I've created a fork that adds this functionality, but I've never really worked with Playwright tests, so I wasn't able to open a pull request.

I've added a RealTimeVAD class that builds on top of NonRealTimeVAD. Let me know if this change is something that can be pulled in (also, I need help with the Playwright tests :sob:).

I've manually tested it using node-record-lpcm16.

MhandsomeM commented 2 weeks ago

@ThEditor I am developing an automatic speech recognition feature that needs Node to run VAD so the audio can be split into separate segments for translation. Can you tell me how to use it? Thank you very much.

ThEditor commented 2 weeks ago

@MhandsomeM The README.md of the fork shows how to use it, though the fork is not available as an npm package. Lemme know if I should publish one; until then, you can copy the RealTimeVAD class over to your source.

MhandsomeM commented 2 weeks ago

@ThEditor Thank you very much for your reply. I followed the usage shown in the README and tested it, but the printed output looks wrong: it should not consist only of 0 and 255. Could you take a look?


const options = {
  sampleRate: 16000, // Sample rate of input audio
  minBufferDuration: 1, // minimum audio buffer to store 
  maxBufferDuration: 5, // maximum audio buffer to store
  overlapDuration: 0.1,  // how much of the previous buffer exists in the new buffer
  silenceThreshold: 0.5, // threshold for ignoring pauses in speech
  frameSamples: 512, // frameSamples buffer
  positiveSpeechThreshold: 0.7,
  // negativeSpeechThreshold: 0.7,
  redemptionFrames: 10,
  preSpeechPadFrames: 5,
  minSpeechFrames: 30,
  submitUserSpeechOnPause: true,
};
const rtvad = new vad.RealTimeVAD(/** options */ options);

rtvad.on("data", (data) => {
  console.log("data", Buffer.from(data.audio).toJSON());
});

data {
  type: 'Buffer',
  data: [
      0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
      0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
      0,   0,   0,   0,   0,   0,   0,   0, 255, 255, 255, 255,
    255, 255, 255, 255,   0,   0,   0,   0,   0,   0,   0,   0,
      0,   0,   0,   0, 255, 255, 255, 255, 255, 255, 255, 255,
    255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
    255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
    255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
    255, 255, 255, 255,
    ... 27548 more items
  ]
}
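A note on reading this printout: Node's `Buffer.from()` copies a typed array element by element and coerces each value to a single byte, rather than exposing the array's underlying bytes. The sketch below assumes, purely for illustration, that `data.audio` is a `Float32Array` of samples (the thread does not confirm its type); the sample values are made up.

```javascript
// Assumption: data.audio is a Float32Array of samples in [-1, 1].
// These four values are invented for the demonstration.
const samples = new Float32Array([0.25, -0.5, 0.9, -1.0]);

// Buffer.from(typedArray) copies element VALUES, truncating each to one byte.
// Fractions collapse to 0 and -1 wraps around to 255.
const truncated = Buffer.from(samples);
console.log(Array.from(truncated)); // [ 0, 0, 0, 255 ]

// To get the raw bytes instead, wrap the underlying ArrayBuffer.
const rawBytes = Buffer.from(samples.buffer, samples.byteOffset, samples.byteLength);
console.log(rawBytes.length); // 16 (4 samples x 4 bytes each)

// To inspect the sample values themselves, skip Buffer entirely.
console.log(Array.from(samples));
```

If `data.audio` turns out to be a different type, the same distinction still applies to any typed array whose elements are wider than one byte.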

ThEditor commented 2 weeks ago

Can you show me exactly how you are passing data to RealTimeVAD (the part that calls the processAudio function)?

MhandsomeM commented 2 weeks ago

@ThEditor The chunk is the data coming from the microphone. Each chunk has a length of 256, and I do some processing on it before passing it on:

const BUFFER_SIZE = 1536;
let bufferArr = Buffer.alloc(0);
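// Each chunk is 256 bytes long; chunks are concatenated until at least 1536 bytes are buffered.
async function receiveAudioChunk(chunk) {
    bufferArr = Buffer.concat([bufferArr, chunk]);

    if (bufferArr.length >= BUFFER_SIZE) {
        // Swap the buffer out before awaiting, so chunks that arrive
        // while processAudio is running are not discarded.
        const pending = bufferArr;
        bufferArr = Buffer.alloc(0); // clear buffer
        await rtvad.processAudio(pending);
    }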
}
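
The buffering step above counts `bufferArr.length` in bytes. If the microphone delivers 16-bit PCM (an assumption; the thread does not state the sample format), 1536 bytes hold only 768 samples. The following is a minimal sketch of framing by sample count and converting to floats in [-1, 1), carrying leftover bytes into the next frame. `createFramer`, `onFrame`, and the constants are hypothetical names, and whether `processAudio` expects raw bytes or float samples is not stated here, so check the fork's README before adapting it.

```javascript
const FRAME_SAMPLES = 1536;  // frame size counted in samples, not bytes
const BYTES_PER_SAMPLE = 2;  // assumption: 16-bit signed little-endian PCM, mono
const FRAME_BYTES = FRAME_SAMPLES * BYTES_PER_SAMPLE;

// Returns a push(chunk) function that calls onFrame(Float32Array)
// once for every complete frame of FRAME_SAMPLES samples.
function createFramer(onFrame) {
  let pending = Buffer.alloc(0);
  return function push(chunk) {
    pending = Buffer.concat([pending, chunk]);
    while (pending.length >= FRAME_BYTES) {
      const frameBytes = pending.subarray(0, FRAME_BYTES);
      pending = pending.subarray(FRAME_BYTES); // keep the remainder for later
      const frame = new Float32Array(FRAME_SAMPLES);
      for (let i = 0; i < FRAME_SAMPLES; i++) {
        // Scale int16 [-32768, 32767] to float [-1, 1).
        frame[i] = frameBytes.readInt16LE(i * BYTES_PER_SAMPLE) / 32768;
      }
      onFrame(frame);
    }
  };
}

// Usage with synthetic data: 13 chunks of 256 bytes = 3328 bytes,
// which yields one 3072-byte frame and leaves 256 bytes pending.
const frames = [];
const push = createFramer((frame) => frames.push(frame));
for (let n = 0; n < 13; n++) {
  const chunk = Buffer.alloc(256);
  for (let i = 0; i < chunk.length; i += 2) chunk.writeInt16LE(16384, i);
  push(chunk);
}
console.log(frames.length, frames[0].length, frames[0][0]); // 1 1536 0.5
```

The callback here is synchronous to keep the sketch short; an asynchronous consumer would need its own queueing so frames are processed in order.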

What I want is access to my original audio data for the detected period:

rtvad.on("data", (data) => {
  console.log("data", Buffer.from(data.audio).toJSON());
});

ThEditor commented 2 weeks ago

I think it's better if we discuss this either in an issue on my fork or on Discord (id: theditor).