m-bain / whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
BSD 4-Clause "Original" or "Old" License

More VAD options & flexibility (suppress VAD, custom segments) #338

Open exactstat opened 1 year ago

exactstat commented 1 year ago

REQUEST:

  1. Add a vad_segments parameter to the .transcribe() method; when external segments are supplied, skip the internal VAD.
  2. Add an option to disable VAD entirely.

REASON:

  1. I want to use my custom vad_segments.
  2. Also, I want to use WhisperX without VAD (for a single audio clip).

Thanks!
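
To make the request concrete, here is a minimal sketch of the segment-selection logic it implies. This is not WhisperX code: resolve_segments, use_vad, internal_vad and chunk_size are hypothetical names made up for illustration, and only the vad_segments name comes from the request above. The sketch assumes segments are dicts with "start" and "end" in seconds, and that with VAD disabled the audio is simply cut into fixed windows of at most 30 s (the window length Whisper processes at a time).

```python
from __future__ import annotations

from typing import Callable, Dict, List, Optional

Segment = Dict[str, float]  # assumed shape: {"start": seconds, "end": seconds}


def resolve_segments(
    audio_duration: float,
    vad_segments: Optional[List[Segment]] = None,
    use_vad: bool = True,
    internal_vad: Optional[Callable[[float], List[Segment]]] = None,
    chunk_size: float = 30.0,
) -> List[Segment]:
    """Hypothetical helper: decide which segments get transcribed."""
    # Case 1: caller supplied segments, so the internal VAD is never run.
    if vad_segments is not None:
        cleaned = []
        for seg in vad_segments:
            start, end = float(seg["start"]), float(seg["end"])
            if not 0.0 <= start < end <= audio_duration:
                raise ValueError(f"invalid segment: {seg}")
            cleaned.append({"start": start, "end": end})
        return sorted(cleaned, key=lambda s: s["start"])

    # Case 2: VAD disabled, so cover the whole audio with fixed windows.
    if not use_vad:
        segments = []
        start = 0.0
        while start < audio_duration:
            end = min(start + chunk_size, audio_duration)
            segments.append({"start": start, "end": end})
            start = end
        return segments

    # Case 3: default behaviour, delegate to whatever VAD is built in.
    if internal_vad is None:
        raise ValueError("use_vad=True requires an internal_vad callable")
    return internal_vad(audio_duration)
```

With this shape, a clip that is already shorter than 30 s and is passed with use_vad=False comes back as one segment spanning the whole clip, so no speech detection runs at all.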

pprobst commented 9 months ago

I agree! In my use case, I already have multiple < 30s audio segments and I'd like the option to skip VAD when transcribing.