ricky0123 / vad

Voice activity detector (VAD) for the browser with a simple API
https://www.vad.ricky0123.com

“onSpeechStart” is triggered repeatedly or “onSpeechEnd” is not called. #153

Closed Payam09 closed 1 week ago

Payam09 commented 2 weeks ago

My code:

import { MicVAD } from "@ricky0123/vad-web";
import { onMounted, onUnmounted, ref } from "vue";

export const useMediaTest = () => {
  const vad = ref<MicVAD | null>(null);
  const streams = ref<MediaStream | null>(null);

  const initMedia = async () => {
    try {
      const stream = await navigator.mediaDevices.getUserMedia({
        audio: {
          channelCount: 1,
          echoCancellation: true,
          noiseSuppression: true,
          autoGainControl: true,
          sampleRate: 16000,
          sampleSize: 2
        }
      });
      streams.value = stream;
      vad.value = await MicVAD.new({
        stream,
        positiveSpeechThreshold: 0.85,
        // negativeSpeechThreshold: 0.7,
        minSpeechFrames: 10,
        preSpeechPadFrames: 5,
        redemptionFrames: 10,
        onSpeechStart: () => {
          console.log("Speech start detected");
        },
        onFrameProcessed: (probabilities: any, frame: Float32Array) => {
          console.log("Frame processed");
        },
        onSpeechEnd: (audio: Float32Array) => {
          console.log("Speech end detected");
        },
        ortConfig: (ort) => {
          ort.env.wasm.wasmPaths = "/";
        },
        workletURL: "./vad.worklet.bundle.min.js", // setting workletURL
        modelURL: "./silero_vad.onnx" // setting modelURL
      });
    } catch (error) {
      console.error("Error accessing media devices.", error);
    }
  };

  const startSpeech = async () => {
    if (!vad.value) {
      await initMedia();
    }
    console.log("Start speech");
    if (vad.value) {
      vad.value.start();
    }
  };

  const stopSpeech = () => {
    console.log("Pause speech");
    if (vad.value) {
      vad.value.pause();
      streams.value?.getTracks().forEach((track) => track.stop());
      vad.value.destroy();
      vad.value = null;
      streams.value = null;
      console.log("VAD destroyed");
    }
  };

  onMounted(() => {
    startSpeech();
  });

  onUnmounted(() => {
    stopSpeech();
  });
};

Result: [screenshot]

Payam09 commented 2 weeks ago

I'm not sure if there's an issue with my code.

I'm happy to help troubleshoot this.

ricky0123 commented 2 weeks ago

Hi @Payam09, I tested some of your settings (specifically the algorithmic ones, such as redemptionFrames) on this test site and had no issues. If you don't see onSpeechEnd firing, it's possible that onVadMisfire is firing instead. The cause might be related to your getUserMedia options, although I'm not sure. I can try to investigate further later on, but hopefully this is a useful data point.
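
To illustrate why onSpeechStart can fire without a matching onSpeechEnd, here is a rough sketch of the segmentation idea. It is a simplified model, not the library's actual code. The function name simulateVadEvents, the event names, and the per-frame probabilities are all hypothetical. The negativeSpeechThreshold of 0.7 is an assumption taken from the commented-out line in the code above. The remaining values mirror the settings in the issue.

```typescript
// Simplified, hypothetical model of VAD segmentation (not the library's code).
type VadEvent = "speechStart" | "speechEnd" | "misfire";

interface SegmenterOptions {
  positiveSpeechThreshold: number; // probability at or above this counts as speech
  negativeSpeechThreshold: number; // probability below this counts toward ending a segment
  redemptionFrames: number;        // non-speech frames needed to close a segment
  minSpeechFrames: number;         // speech frames needed for a segment to count
}

function simulateVadEvents(probs: number[], opts: SegmenterOptions): VadEvent[] {
  const events: VadEvent[] = [];
  let speaking = false;
  let speechFrames = 0;
  let redemptionCounter = 0;

  for (const p of probs) {
    if (p >= opts.positiveSpeechThreshold) {
      // A speech frame resets the redemption counter and may open a segment.
      redemptionCounter = 0;
      speechFrames += 1;
      if (!speaking) {
        speaking = true;
        events.push("speechStart");
      }
    } else if (speaking && p < opts.negativeSpeechThreshold) {
      redemptionCounter += 1;
      if (redemptionCounter >= opts.redemptionFrames) {
        // Segment closes: too few speech frames means a misfire, not a speech end.
        events.push(speechFrames >= opts.minSpeechFrames ? "speechEnd" : "misfire");
        speaking = false;
        speechFrames = 0;
        redemptionCounter = 0;
      }
    }
  }
  return events;
}

// Build an array of n identical probabilities.
const repeat = (value: number, n: number): number[] =>
  Array.from({ length: n }, () => value);

const opts: SegmenterOptions = {
  positiveSpeechThreshold: 0.85,
  negativeSpeechThreshold: 0.7, // assumed; commented out in the issue's code
  redemptionFrames: 10,
  minSpeechFrames: 10,
};

// 3 speech frames then silence: shorter than minSpeechFrames.
const shortBurst = simulateVadEvents([...repeat(0.9, 3), ...repeat(0.1, 10)], opts);
// 12 speech frames then silence: long enough to count as speech.
const longUtterance = simulateVadEvents([...repeat(0.9, 12), ...repeat(0.1, 10)], opts);

console.log(shortBurst);    // [ 'speechStart', 'misfire' ]
console.log(longUtterance); // [ 'speechStart', 'speechEnd' ]
```

Under this model, a series of short bursts would produce repeated speech starts with no speech end, which matches the symptom in the issue title. Logging from an onVadMisfire callback next to onSpeechEnd in the MicVAD.new options should show whether that is what is happening.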

Payam09 commented 2 weeks ago

Thank you very much for your response. I indeed overlooked the 'onVadMisfire' you mentioned, and I apologize for that. I appreciate you providing such a useful project.

ricky0123 commented 2 weeks ago

No problem! Let me know if you need anything else.