pyannote / pyannote-audio

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
http://pyannote.github.io
MIT License

What is the difference between pyannote/voice-activity-detection and pyannote/segmentation-3.0? #1498

Closed: leviethung2103 closed this issue 8 months ago

leviethung2103 commented 9 months ago

Hello,

I am wondering what the difference is between pyannote/voice-activity-detection and pyannote/segmentation-3.0:

pyannote/voice-activity-detection: https://huggingface.co/pyannote/voice-activity-detection
pyannote/segmentation-3.0: https://huggingface.co/pyannote/segmentation-3.0

On the segmentation-3.0 model card, there is a voice activity detection example:

from pyannote.audio import Model
from pyannote.audio.pipelines import VoiceActivityDetection

# load the segmentation-3.0 model first (requires accepting its user
# conditions on Hugging Face and an access token)
model = Model.from_pretrained("pyannote/segmentation-3.0",
                              use_auth_token="ACCESS_TOKEN_GOES_HERE")

pipeline = VoiceActivityDetection(segmentation=model)
HYPER_PARAMETERS = {
  # remove speech regions shorter than that many seconds.
  "min_duration_on": 0.0,
  # fill non-speech regions shorter than that many seconds.
  "min_duration_off": 0.0
}
pipeline.instantiate(HYPER_PARAMETERS)
vad = pipeline("audio.wav")
# `vad` is a pyannote.core.Annotation instance containing speech regions

And on the pyannote/voice-activity-detection model card:

# 1. visit hf.co/pyannote/segmentation and accept user conditions
# 2. visit hf.co/settings/tokens to create an access token
# 3. instantiate pretrained voice activity detection pipeline

from pyannote.audio import Pipeline
pipeline = Pipeline.from_pretrained("pyannote/voice-activity-detection",
                                    use_auth_token="ACCESS_TOKEN_GOES_HERE")
output = pipeline("audio.wav")

for speech in output.get_timeline().support():
    # active speech between speech.start and speech.end
    ...

I've tested these two methods and got different results.
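
For example, comparing the total amount of detected speech from the two runs (a minimal sketch, assuming the vad and output variables from the two snippets above were computed on the same audio.wav):

speech_seg3 = vad.get_timeline().support()
speech_pretrained = output.get_timeline().support()
print(f"segmentation-3.0 VAD: {speech_seg3.duration():.1f}s of speech")
print(f"pretrained pipeline : {speech_pretrained.duration():.1f}s of speech")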

Could you please explain why? Are they using the same VAD model?

Thank you

github-actions[bot] commented 9 months ago

Thank you for your issue. You might want to check the FAQ if you haven't done so already.

Feel free to close this issue if you found an answer in the FAQ.

If your issue is a feature request, please read this first and update your request accordingly, if needed.

If your issue is a bug report, please provide a minimum reproducible example (MRE) as a link to a self-contained Google Colab notebook containing everything needed to reproduce the bug.

Providing an MRE will increase your chance of getting an answer from the community (either maintainers or other power users).

Companies relying on pyannote.audio in production may contact me via email.

This is an automated reply, generated by FAQtory

clement-pages commented 8 months ago

Hello,

The pyannote/voice-activity-detection pipeline uses pyannote/segmentation (2.1), as you can see in its config.yaml on Hugging Face:

pipeline:
  name: pyannote.audio.pipelines.VoiceActivityDetection
  params:
    segmentation: pyannote/segmentation@Interspeech2021

So this is not the same segmentation model as in pyannote/segmentation-3.0, which explains why you get different results.
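
If you want to reproduce the pretrained pipeline's behavior yourself, you can plug the older checkpoint into the same pipeline class. A minimal sketch (assuming you have accepted the user conditions for pyannote/segmentation; the hyper-parameter values below are placeholders, the tuned values are listed in the pipeline's config.yaml):

from pyannote.audio import Model
from pyannote.audio.pipelines import VoiceActivityDetection

# load the older (2.1) segmentation checkpoint used by the pretrained pipeline
model = Model.from_pretrained("pyannote/segmentation@Interspeech2021",
                              use_auth_token="ACCESS_TOKEN_GOES_HERE")

pipeline = VoiceActivityDetection(segmentation=model)
pipeline.instantiate({
    # placeholder values: the tuned values live in the pretrained
    # pipeline's config.yaml on Hugging Face
    "onset": 0.5, "offset": 0.5,
    "min_duration_on": 0.0, "min_duration_off": 0.0,
})
vad = pipeline("audio.wav")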

Have a nice day!

leviethung2103 commented 8 months ago

Thank you for your help.