snakers4 / silero-vad

Silero VAD: pre-trained enterprise-grade Voice Activity Detector
MIT License
3.8k stars, 379 forks

Changelog - V5 just released! #2

Open snakers4 opened 3 years ago

snakers4 commented 3 years ago

Just a handy issue to be notified of the latest changes and micro-releases (we will mostly be changing the models)

snakers4 commented 3 years ago

Uploaded initial models, examples, and utils for VAD only (no number detector or language classifier yet)

snakers4 commented 3 years ago

First readable public release

snakers4 commented 3 years ago

Added VAD latency and throughput metrics

snakers4 commented 3 years ago

Updated VAD quality. Before / after (precision / recall): image

adamnsandle commented 3 years ago

Added < 250ms compatibility image

Sontref commented 3 years ago

Added number detector

snakers4 commented 3 years ago

Language detector example, readme update + FAQ

snakers4 commented 3 years ago

Audiotok benchmarks added. Looks like all energy-based solutions are kind of similar.

snakers4 commented 3 years ago

Added a utility to tune the VAD params properly for a domain

snakers4 commented 3 years ago

Some final benchmarks are posted here: https://github.com/pyannote/pyannote-audio/issues/604#issue-798003383

We are probably done with benchmarks for now.

snakers4 commented 3 years ago

Added micro (10k params, 100x smaller) VAD models

snakers4 commented 3 years ago

Added micro (10k params, 100x smaller) VAD models for 8 kHz audio

snakers4 commented 3 years ago

https://github.com/snakers4/silero-vad/pull/54

snakers4 commented 3 years ago

Improved language classifier

snakers4 commented 2 years ago

Updated further reading section

snakers4 commented 2 years ago

New V3 Silero VAD is Already Here

Main changes

Migration

Please see the new examples.

The new get_speech_timestamps is a simplified, unified replacement for the deprecated get_speech_ts and get_speech_ts_adaptive methods.

speech_timestamps = get_speech_timestamps(wav, model, sampling_rate=16000)
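As a rough illustration of how the result might be consumed: the sketch below assumes that get_speech_timestamps returns a list of dicts with 'start' and 'end' keys measured in samples. The helper name timestamps_to_seconds and the sample values are hypothetical and not part of the library.

```python
from typing import Dict, List


def timestamps_to_seconds(
    speech_timestamps: List[Dict[str, int]],
    sampling_rate: int = 16000,
) -> List[Dict[str, float]]:
    """Convert sample-based speech segments to seconds (hypothetical helper)."""
    return [
        {
            "start": round(ts["start"] / sampling_rate, 3),
            "end": round(ts["end"] / sampling_rate, 3),
        }
        for ts in speech_timestamps
    ]


# Made-up data in the assumed output shape: offsets are in samples at 16 kHz.
example_timestamps = [{"start": 8000, "end": 24000}, {"start": 40000, "end": 56000}]
example_seconds = timestamps_to_seconds(example_timestamps, sampling_rate=16000)
print(example_seconds)
```

Keeping the conversion outside the VAD call leaves the original sample offsets available for slicing the waveform.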

The new VADIterator class serves as an example for streaming tasks, replacing the deprecated VADiterator and VADiteratorAdaptive.

vad_iterator = VADIterator(model)
window_size_samples = 1536  # number of samples in a single audio chunk

for i in range(0, len(wav), window_size_samples):
    speech_dict = vad_iterator(wav[i: i + window_size_samples], return_seconds=True)
    if speech_dict:
        print(speech_dict, end=' ')
vad_iterator.reset_states()  # reset model states after each audio

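
The loop above prints boundary events one at a time. A minimal sketch of pairing them into segments follows, assuming each call yields either nothing, a dict with a 'start' key, or a dict with an 'end' key. The helper collect_segments and the example events are hypothetical and only illustrate the idea.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def collect_segments(
    events: Iterable[Optional[Dict[str, float]]],
) -> List[Tuple[float, Optional[float]]]:
    """Pair 'start' / 'end' events into (start, end) segments (hypothetical helper)."""
    segments: List[Tuple[float, Optional[float]]] = []
    current_start: Optional[float] = None
    for event in events:
        if not event:
            continue  # no boundary detected in this window
        if "start" in event:
            current_start = event["start"]
        elif "end" in event and current_start is not None:
            segments.append((current_start, event["end"]))
            current_start = None
    if current_start is not None:
        # The audio ended while speech was still active.
        segments.append((current_start, None))
    return segments


# Made-up per-window results in the assumed shape.
example_events = [None, {"start": 0.5}, None, {"end": 1.9}, {"start": 2.4}, None]
example_segments = collect_segments(example_events)
print(example_segments)
```

A segment whose end is None marks speech that was still ongoing when the stream stopped, which a caller may want to close at the audio length.
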
snakers4 commented 2 years ago

Even Better V3 Silero VAD

snakers4 commented 2 years ago

This summarises new progress well

image

snakers4 commented 2 years ago

New V3 ONNX VAD Released

We were finally able to port a model to ONNX:

snakers4 commented 2 years ago

Support For Sampling Rates Higher Than 16 kHz

snakers4 commented 2 years ago

⚠️ Important Information for VAD Python Users ⚠️

If you are using the VAD in a:

Do not forget to disable gradients in EACH process and / or thread. Otherwise memory may leak noticeably.
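
One way this advice might be followed, sketched with plain PyTorch calls: gradient mode is thread-local in PyTorch, so disabling it in the main thread does not carry over to worker threads. The worker function and the toy tensors below are hypothetical stand-ins for actual VAD inference.

```python
import threading

import torch


def worker(results: dict, name: str, disable_grad: bool) -> None:
    """Toy stand-in for per-thread inference (hypothetical)."""
    if disable_grad:
        # Grad mode is thread-local, so it is switched off inside the worker.
        torch.set_grad_enabled(False)
    x = torch.ones(4, requires_grad=True)
    y = (x * 2).sum()
    # y.requires_grad is True only if autograd recorded the operation.
    results[name] = y.requires_grad


torch.set_grad_enabled(False)  # affects only the main thread

results: dict = {}
threads = [
    threading.Thread(target=worker, args=(results, "unprotected", False)),
    threading.Thread(target=worker, args=(results, "protected", True)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)
```

The same call would go at the top of each process entry point when using multiprocessing; wrapping inference in torch.no_grad() inside the worker is an alternative with the same thread-local scope.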

snakers4 commented 2 years ago

image

image

adamnsandle commented 1 year ago

New V4 VAD Released

Changes:

snakers4 commented 1 year ago

It is worth posting this chart:

image

snakers4 commented 1 month ago

Finally, V5 is here: 3x faster and supporting 6000+ languages!

image

Performance and Model Size

Quality

Changes and deprecations

snakers4 commented 1 month ago

V5.1 - Experimental PyPI Package Release

What's Changed

New Contributors

Full Changelog: https://github.com/snakers4/silero-vad/compare/v5.0...v5.1