Open johnchienbronci opened 1 year ago
--vad_filter True, right?
"--vad_filter" can only be used in the CLI. I want to use WhisperX with VAD enabled through Python, not through CLI operations.
I will add this to the documentation, but it goes approximately like this (assuming your audio file is in .wav format):
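import gc
import torch
from whisper import load_model
from whisperx import load_align_model, load_vad_model, transcribe_with_vad, align
device = "cuda"
audio_path = "/path/to/your/audio.wav"
# Placeholder settings (assumed values, not from the original snippet; adjust for your setup)
model_name = "large-v2"
vad_onset, vad_offset = 0.500, 0.363
temperature = 0
args = {}  # extra options forwarded to transcribe_with_vad
align_model_name = None  # assumed: None picks the default alignment model for the language
# Load the VAD and Whisper models, then transcribe with VAD
vad_model = load_vad_model(torch.device(device), vad_onset, vad_offset)
model = load_model(model_name, device=device)
result = transcribe_with_vad(model, audio_path, vad_model, temperature=temperature, **args)
# Unload Whisper and VAD to free GPU memory before alignment
del model
del vad_model
gc.collect()
torch.cuda.empty_cache()
# Align the transcript using the detected language (falls back to "en")
align_language = result.get("language", "en")
align_model, align_metadata = load_align_model(align_language, device, model_name=align_model_name)
result_aligned = align(result["segments"], align_model, align_metadata, audio_path, device)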
Thank you for your reply. Does the audio parameter accept a waveform, or only an audio_path?
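In case the audio argument turns out to accept only a file path (not confirmed in this thread), one possible workaround is to write the in-memory waveform to a temporary .wav file and pass that path instead. The sketch below uses only the Python standard library. The helper name `waveform_to_temp_wav`, the 16 kHz mono format, and the assumption that samples are floats in [-1.0, 1.0] are illustrative choices, not part of WhisperX.

```python
import array
import math
import os
import sys
import tempfile
import wave


def waveform_to_temp_wav(samples, sample_rate=16000):
    """Write mono float samples in [-1.0, 1.0] to a temporary 16-bit PCM WAV.

    Returns the file path; the caller is responsible for deleting the file.
    (Hypothetical helper for illustration only.)
    """
    # Clip each sample to [-1, 1] and scale it to a signed 16-bit integer.
    pcm = array.array(
        "h", (int(max(-1.0, min(1.0, s)) * 32767) for s in samples)
    )
    # WAV data is little-endian; swap bytes on big-endian machines.
    if sys.byteorder == "big":
        pcm.byteswap()

    fd, path = tempfile.mkstemp(suffix=".wav")
    os.close(fd)  # reopen through the wave module below
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)        # mono
        wav_file.setsampwidth(2)        # 16-bit samples
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm.tobytes())
    return path


# Example: one second of a 440 Hz tone standing in for a real waveform.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
audio_path = waveform_to_temp_wav(tone)
```

The resulting `audio_path` can then be used wherever a path is expected, and removed with `os.remove(audio_path)` once transcription and alignment are done.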
The sample code doesn't seem to use VAD. Is that correct? If so, how can I enable it, please?