pyannote / pyannote-audio

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
http://pyannote.github.io
MIT License
6.38k stars 784 forks

VAD model #1754

Closed adriondragon closed 1 month ago

adriondragon commented 2 months ago

Tested versions

no

System information

no

Issue description

I want to download a VAD model to my local machine, but I can't find where to download it.

qalabeabbas49 commented 2 months ago

The segmentation model is what is used for VAD. You can find it here: https://huggingface.co/pyannote/segmentation-3.0
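For the "download to my local machine" part, a minimal sketch using `huggingface_hub` might look like the following. It assumes the checkpoint file in that repository is named `pytorch_model.bin`, that you have accepted the model's user conditions on Hugging Face, and that you have an access token. The helper names `download_checkpoint` and `load_local_model` are made up for illustration.

```python
from pathlib import Path
from typing import Optional

REPO_ID = "pyannote/segmentation-3.0"
# Assumption: the checkpoint in that repository is named "pytorch_model.bin".
CHECKPOINT_NAME = "pytorch_model.bin"


def download_checkpoint(token: Optional[str] = None, local_dir: str = "models") -> Path:
    """Download the checkpoint into `local_dir` and return its local path."""
    # Imported lazily so this sketch can be loaded without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    path = hf_hub_download(
        repo_id=REPO_ID,
        filename=CHECKPOINT_NAME,
        token=token,          # Hugging Face access token (the model is gated)
        local_dir=local_dir,  # hypothetical target folder
    )
    return Path(path)


def load_local_model(checkpoint: Path):
    """Load the downloaded checkpoint as a pyannote.audio model."""
    # Imported lazily for the same reason as above.
    from pyannote.audio import Model

    return Model.from_pretrained(str(checkpoint))
```

Calling `load_local_model(download_checkpoint(token="..."))` should then give you a model object that can be used offline once the file is on disk.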

clement-pages commented 1 month ago

Hey @adriondragon, did qalabeabbas49 answer your question?

coreeey commented 1 month ago

Can I use segmentation-3.0 to fine-tune a VAD model?

clement-pages commented 1 month ago

You can fine-tune pyannote/segmentation-3.0 on your own data, then use it for VAD like this:

from pyannote.audio.pipelines import VoiceActivityDetection
pipeline = VoiceActivityDetection(segmentation=model)
HYPER_PARAMETERS = {
  # remove speech regions shorter than that many seconds.
  "min_duration_on": 0.0,
  # fill non-speech regions shorter than that many seconds.
  "min_duration_off": 0.0
}
pipeline.instantiate(HYPER_PARAMETERS)
vad = pipeline("audio.wav")
# `vad` is a pyannote.core.Annotation instance containing speech regions

Here, `model` is your fine-tuned pyannote/segmentation-3.0 model.
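To make the two hyper-parameters more concrete, here is a simplified, pure-Python illustration of what they do to a list of detected speech regions. This is not pyannote's actual implementation: the `postprocess` function and the `(start, end)` tuple representation are hypothetical, and the order of operations (fill short gaps first, then drop short regions) is my understanding of the behavior rather than something stated in this thread.

```python
from typing import List, Tuple

Region = Tuple[float, float]  # (start, end) in seconds


def postprocess(
    regions: List[Region],
    min_duration_on: float = 0.0,
    min_duration_off: float = 0.0,
) -> List[Region]:
    """Fill short non-speech gaps, then remove short speech regions."""
    merged: List[Region] = []
    for start, end in sorted(regions):
        if merged:
            gap = start - merged[-1][1]
            # Merge overlapping/touching regions and gaps shorter than min_duration_off.
            if gap <= 0 or gap < min_duration_off:
                prev_start, prev_end = merged[-1]
                merged[-1] = (prev_start, max(prev_end, end))
                continue
        merged.append((start, end))
    # Keep only speech regions that last at least min_duration_on seconds.
    return [(s, e) for s, e in merged if e - s >= min_duration_on]


regions = [(0.0, 1.0), (1.2, 2.0), (5.0, 5.1)]
print(postprocess(regions, min_duration_on=0.5, min_duration_off=0.3))
# [(0.0, 2.0)]
```

With both values left at 0.0, as in the snippet above, the regions are returned unchanged; raising them trades short false alarms and brief pauses for smoother output.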

coreeey commented 1 month ago

Thanks.

clement-pages commented 1 month ago

Since the questions asked in this issue have been answered, I am closing it. Feel free to open a new one if you have more questions about VAD models.