m-bain / whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
BSD 2-Clause "Simplified" License

Silero VAD support #888

Open 3manifold opened 2 months ago

3manifold commented 2 months ago

Description

Implementation includes:

The implementation aims to respect the current structure as well as keep the existing functionality intact. It is worth mentioning that the manually-assigned vad_model still works as expected (see load_model for details).

See the relevant issue for further details. Resolves https://github.com/m-bain/whisperX/issues/889

Tests

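Example command line (applies also for --vad_method pyannote):

  • GPU: python3 -m whisperx.transcribe audio.wav --language en --device cuda --diarize --hf_token xxx --vad_method silero
  • CPU: python3 -m whisperx.transcribe audio.wav --language en --device cpu --diarize --hf_token xxx --compute_type int8 --vad_method silero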

Example Python script usage:

import whisperx
import gc

device = "cpu"
audio_file = "audio.wav"
batch_size = 16 # reduce if low on GPU mem
compute_type = "int8" # "int8" suits CPU or low GPU mem (may reduce accuracy); "float16" is the usual GPU choice

# 1. Transcribe with original whisper (batched)
model = whisperx.load_model("small", device, vad_method="silero", compute_type=compute_type)

# save model to local path (optional)
# model_dir = "/path/"
# model = whisperx.load_model("large-v2", device, compute_type=compute_type, download_root=model_dir)

audio = whisperx.load_audio(audio_file)
result = model.transcribe(audio, batch_size=batch_size)
print(result["segments"]) # before alignment

# delete model if low on GPU resources
# import torch; del model; gc.collect(); torch.cuda.empty_cache()

# 2. Align whisper output
model_a, metadata = whisperx.load_align_model(language_code=result["language"], device=device)
result = whisperx.align(result["segments"], model_a, metadata, audio, device, return_char_alignments=False)

print(result["segments"]) # after alignment

# delete model if low on GPU resources
# import torch; del model_a; gc.collect(); torch.cuda.empty_cache()

# 3. Assign speaker labels
diarize_model = whisperx.DiarizationPipeline(use_auth_token="xxx", device=device)

# add min/max number of speakers if known
diarize_segments = diarize_model(audio)
# diarize_model(audio, min_speakers=min_speakers, max_speakers=max_speakers)

result = whisperx.assign_word_speakers(diarize_segments, result)
print(diarize_segments)
print(result["segments"]) # segments are now assigned speaker IDs

Output:

```
python3 whisperx/example.py
torchvision is not available - cannot save figures
No language specified, language will be first be detected for each audio file (increases inference time).
>>Performing voice activity detection using Silero...
Using cache found in /home/xxx/.cache/torch/hub/snakers4_silero-vad_master
Detected language: en (0.99) in first 30s of audio...
[{'text': ' Birch canoes slid on the smooth planks. Glued the sheet to the dark blue background. It is easy to tell the depth of a well. These days a chicken leg is a rare dish. Rice is often served in round bowls. The juice of lemons makes fine punch. The box was thrown beside the parked truck. The hogs were fed chopped corn and garbage. Four hours of study work faced us.', 'start': 0.674, 'end': 28.83}, {'text': ' A large size in stockings is hard to sell.', 'start': 30.05, 'end': 32.254}]
[{'start': 0.694, 'end': 2.995, 'text': ' Birch canoes slid on the smooth planks.', 'words': [{'word': 'Birch', 'start': 0.694, 'end': 1.034, 'score': 0.854}, {'word': 'canoes', 'start': 1.114, 'end': 1.555, 'score': 0.763}, {'word': 'slid', 'start': 1.595, 'end': 1.915, 'score': 0.881}, {'word': 'on', 'start': 2.015, 'end': 2.095, 'score': 0.909}, {'word': 'the', 'start': 2.115, 'end': 2.195, 'score': 0.789}, {'word': 'smooth', 'start': 2.255, 'end': 2.615, 'score': 0.828}, {'word': 'planks.', 'start': 2.695, 'end': 2.995, 'score': 0.861}]}, {'start': 4.296, 'end': 6.357, 'text': 'Glued the sheet to the dark blue background.', 'words': [{'word': 'Glued', 'start': 4.296, 'end': 4.616, 'score': 0.474}, {'word': 'the', 'start': 4.676, 'end': 4.756, 'score': 0.968}, {'word': 'sheet', 'start': 4.796, 'end': 5.016, 'score': 0.933}, {'word': 'to', 'start': 5.056, 'end': 5.157, 'score': 0.776}, {'word': 'the', 'start': 5.177, 'end': 5.237, 'score': 0.952}, {'word': 'dark', 'start': 5.277, 'end': 5.517, 'score': 0.99}, {'word': 'blue', 'start': 5.577, 'end': 5.777, 'score': 0.844}, {'word': 'background.',
'start': 5.837, 'end': 6.357, 'score': 0.93}]}, {'start': 7.838, 'end': 9.659, 'text': 'It is easy to tell the depth of a well.', 'words': [{'word': 'It', 'start': 7.838, 'end': 7.918, 'score': 0.932}, {'word': 'is', 'start': 7.978, 'end': 8.058, 'score': 0.724}, {'word': 'easy', 'start': 8.118, 'end': 8.318, 'score': 0.958}, {'word': 'to', 'start': 8.358, 'end': 8.438, 'score': 0.88}, {'word': 'tell', 'start': 8.498, 'end': 8.699, 'score': 0.712}, {'word': 'the', 'start': 8.739, 'end': 8.819, 'score': 0.828}, {'word': 'depth', 'start': 8.859, 'end': 9.119, 'score': 0.859}, {'word': 'of', 'start': 9.179, 'end': 9.279, 'score': 0.796}, {'word': 'a', 'start': 9.319, 'end': 9.339, 'score': 0.767}, {'word': 'well.', 'start': 9.399, 'end': 9.659, 'score': 0.933}]}, {'start': 10.9, 'end': 12.841, 'text': 'These days a chicken leg is a rare dish.', 'words': [{'word': 'These', 'start': 10.9, 'end': 11.12, 'score': 0.856}, {'word': 'days', 'start': 11.16, 'end': 11.36, 'score': 0.87}, {'word': 'a', 'start': 11.4, 'end': 11.44, 'score': 0.515}, {'word': 'chicken', 'start': 11.48, 'end': 11.78, 'score': 0.932}, {'word': 'leg', 'start': 11.82, 'end': 12.0, 'score': 0.993}, {'word': 'is', 'start': 12.04, 'end': 12.121, 'score': 0.76}, {'word': 'a', 'start': 12.181, 'end': 12.221, 'score': 0.499}, {'word': 'rare', 'start': 12.281, 'end': 12.501, 'score': 0.776}, {'word': 'dish.', 'start': 12.581, 'end': 12.841, 'score': 0.878}]}, {'start': 14.282, 'end': 16.123, 'text': 'Rice is often served in round bowls.', 'words': [{'word': 'Rice', 'start': 14.282, 'end': 14.522, 'score': 0.867}, {'word': 'is', 'start': 14.582, 'end': 14.662, 'score': 0.638}, {'word': 'often', 'start': 14.722, 'end': 15.022, 'score': 0.922}, {'word': 'served', 'start': 15.082, 'end': 15.362, 'score': 0.848}, {'word': 'in', 'start': 15.422, 'end': 15.502, 'score': 0.85}, {'word': 'round', 'start': 15.562, 'end': 15.783, 'score': 0.912}, {'word': 'bowls.', 'start': 15.823, 'end': 16.123, 'score': 0.647}]}, 
{'start': 17.343, 'end': 19.265, 'text': 'The juice of lemons makes fine punch.', 'words': [{'word': 'The', 'start': 17.343, 'end': 17.464, 'score': 0.796}, {'word': 'juice', 'start': 17.504, 'end': 17.764, 'score': 0.976}, {'word': 'of', 'start': 17.804, 'end': 17.884, 'score': 0.83}, {'word': 'lemons', 'start': 17.944, 'end': 18.264, 'score': 0.914}, {'word': 'makes', 'start': 18.344, 'end': 18.564, 'score': 0.866}, {'word': 'fine', 'start': 18.644, 'end': 18.904, 'score': 0.914}, {'word': 'punch.', 'start': 18.964, 'end': 19.265, 'score': 0.888}]}, {'start': 20.445, 'end': 22.406, 'text': 'The box was thrown beside the parked truck.', 'words': [{'word': 'The', 'start': 20.445, 'end': 20.565, 'score': 0.89}, {'word': 'box', 'start': 20.605, 'end': 20.885, 'score': 0.956}, {'word': 'was', 'start': 20.926, 'end': 21.046, 'score': 0.907}, {'word': 'thrown', 'start': 21.106, 'end': 21.346, 'score': 0.621}, {'word': 'beside', 'start': 21.386, 'end': 21.706, 'score': 0.901}, {'word': 'the', 'start': 21.746, 'end': 21.806, 'score': 0.977}, {'word': 'parked', 'start': 21.866, 'end': 22.086, 'score': 0.65}, {'word': 'truck.', 'start': 22.126, 'end': 22.406, 'score': 0.859}]}, {'start': 23.767, 'end': 25.748, 'text': 'The hogs were fed chopped corn and garbage.', 'words': [{'word': 'The', 'start': 23.767, 'end': 23.867, 'score': 0.997}, {'word': 'hogs', 'start': 23.907, 'end': 24.147, 'score': 0.873}, {'word': 'were', 'start': 24.167, 'end': 24.287, 'score': 0.874}, {'word': 'fed', 'start': 24.347, 'end': 24.588, 'score': 0.763}, {'word': 'chopped', 'start': 24.628, 'end': 24.928, 'score': 0.671}, {'word': 'corn', 'start': 24.968, 'end': 25.208, 'score': 0.843}, {'word': 'and', 'start': 25.248, 'end': 25.328, 'score': 0.923}, {'word': 'garbage.', 'start': 25.348, 'end': 25.748, 'score': 0.902}]}, {'start': 27.129, 'end': 28.73, 'text': 'Four hours of study work faced us.', 'words': [{'word': 'Four', 'start': 27.129, 'end': 27.329, 'score': 0.819}, {'word': 'hours', 
'start': 27.369, 'end': 27.629, 'score': 0.805}, {'word': 'of', 'start': 27.669, 'end': 27.709, 'score': 0.735}, {'word': 'study', 'start': 27.749, 'end': 28.01, 'score': 0.873}, {'word': 'work', 'start': 28.05, 'end': 28.25, 'score': 0.885}, {'word': 'faced', 'start': 28.29, 'end': 28.57, 'score': 0.97}, {'word': 'us.', 'start': 28.67, 'end': 28.73, 'score': 0.99}]}, {'start': 30.111, 'end': 32.092, 'text': ' A large size in stockings is hard to sell.', 'words': [{'word': 'A', 'start': 30.111, 'end': 30.171, 'score': 0.927}, {'word': 'large', 'start': 30.212, 'end': 30.454, 'score': 0.968}, {'word': 'size', 'start': 30.515, 'end': 30.758, 'score': 0.982}, {'word': 'in', 'start': 30.798, 'end': 30.879, 'score': 0.691}, {'word': 'stockings', 'start': 30.919, 'end': 31.344, 'score': 0.923}, {'word': 'is', 'start': 31.405, 'end': 31.486, 'score': 0.816}, {'word': 'hard', 'start': 31.526, 'end': 31.708, 'score': 0.834}, {'word': 'to', 'start': 31.748, 'end': 31.85, 'score': 0.938}, {'word': 'sell.', 'start': 31.89, 'end': 32.092, 'score': 0.954}]}] segment label ... intersection union 0 [ 00:00:00.486 --> 00:00:03.000] A ... -28.889031 31.605406 1 [ 00:00:04.266 --> 00:00:06.392] B ... -25.497156 27.825406 2 [ 00:00:07.776 --> 00:00:09.683] C ... -22.206531 24.315406 3 [ 00:00:10.847 --> 00:00:12.923] D ... -18.966531 21.244156 4 [ 00:00:14.205 --> 00:00:16.163] E ... -15.726531 17.886031 5 [ 00:00:17.294 --> 00:00:19.319] F ... -12.570906 14.797906 6 [ 00:00:20.399 --> 00:00:22.390] G ... -9.499656 11.692906 7 [ 00:00:23.723 --> 00:00:25.849] H ... -6.040281 8.368531 8 [ 00:00:27.064 --> 00:00:28.769] I ... -3.120906 5.027281 9 [ 00:00:30.017 --> 00:00:32.194] J ... 
0.202000 2.176875 [10 rows x 7 columns] [{'start': 0.694, 'end': 2.995, 'text': ' Birch canoes slid on the smooth planks.', 'words': [{'word': 'Birch', 'start': 0.694, 'end': 1.034, 'score': 0.854, 'speaker': 'SPEAKER_00'}, {'word': 'canoes', 'start': 1.114, 'end': 1.555, 'score': 0.763, 'speaker': 'SPEAKER_00'}, {'word': 'slid', 'start': 1.595, 'end': 1.915, 'score': 0.881, 'speaker': 'SPEAKER_00'}, {'word': 'on', 'start': 2.015, 'end': 2.095, 'score': 0.909, 'speaker': 'SPEAKER_00'}, {'word': 'the', 'start': 2.115, 'end': 2.195, 'score': 0.789, 'speaker': 'SPEAKER_00'}, {'word': 'smooth', 'start': 2.255, 'end': 2.615, 'score': 0.828, 'speaker': 'SPEAKER_00'}, {'word': 'planks.', 'start': 2.695, 'end': 2.995, 'score': 0.861, 'speaker': 'SPEAKER_00'}], 'speaker': 'SPEAKER_00'}, {'start': 4.296, 'end': 6.357, 'text': 'Glued the sheet to the dark blue background.', 'words': [{'word': 'Glued', 'start': 4.296, 'end': 4.616, 'score': 0.474, 'speaker': 'SPEAKER_00'}, {'word': 'the', 'start': 4.676, 'end': 4.756, 'score': 0.968, 'speaker': 'SPEAKER_00'}, {'word': 'sheet', 'start': 4.796, 'end': 5.016, 'score': 0.933, 'speaker': 'SPEAKER_00'}, {'word': 'to', 'start': 5.056, 'end': 5.157, 'score': 0.776, 'speaker': 'SPEAKER_00'}, {'word': 'the', 'start': 5.177, 'end': 5.237, 'score': 0.952, 'speaker': 'SPEAKER_00'}, {'word': 'dark', 'start': 5.277, 'end': 5.517, 'score': 0.99, 'speaker': 'SPEAKER_00'}, {'word': 'blue', 'start': 5.577, 'end': 5.777, 'score': 0.844, 'speaker': 'SPEAKER_00'}, {'word': 'background.', 'start': 5.837, 'end': 6.357, 'score': 0.93, 'speaker': 'SPEAKER_00'}], 'speaker': 'SPEAKER_00'}, {'start': 7.838, 'end': 9.659, 'text': 'It is easy to tell the depth of a well.', 'words': [{'word': 'It', 'start': 7.838, 'end': 7.918, 'score': 0.932, 'speaker': 'SPEAKER_00'}, {'word': 'is', 'start': 7.978, 'end': 8.058, 'score': 0.724, 'speaker': 'SPEAKER_00'}, {'word': 'easy', 'start': 8.118, 'end': 8.318, 'score': 0.958, 'speaker': 'SPEAKER_00'}, {'word': 'to', 
'start': 8.358, 'end': 8.438, 'score': 0.88, 'speaker': 'SPEAKER_00'}, {'word': 'tell', 'start': 8.498, 'end': 8.699, 'score': 0.712, 'speaker': 'SPEAKER_00'}, {'word': 'the', 'start': 8.739, 'end': 8.819, 'score': 0.828, 'speaker': 'SPEAKER_00'}, {'word': 'depth', 'start': 8.859, 'end': 9.119, 'score': 0.859, 'speaker': 'SPEAKER_00'}, {'word': 'of', 'start': 9.179, 'end': 9.279, 'score': 0.796, 'speaker': 'SPEAKER_00'}, {'word': 'a', 'start': 9.319, 'end': 9.339, 'score': 0.767, 'speaker': 'SPEAKER_00'}, {'word': 'well.', 'start': 9.399, 'end': 9.659, 'score': 0.933, 'speaker': 'SPEAKER_00'}], 'speaker': 'SPEAKER_00'}, {'start': 10.9, 'end': 12.841, 'text': 'These days a chicken leg is a rare dish.', 'words': [{'word': 'These', 'start': 10.9, 'end': 11.12, 'score': 0.856, 'speaker': 'SPEAKER_00'}, {'word': 'days', 'start': 11.16, 'end': 11.36, 'score': 0.87, 'speaker': 'SPEAKER_00'}, {'word': 'a', 'start': 11.4, 'end': 11.44, 'score': 0.515, 'speaker': 'SPEAKER_00'}, {'word': 'chicken', 'start': 11.48, 'end': 11.78, 'score': 0.932, 'speaker': 'SPEAKER_00'}, {'word': 'leg', 'start': 11.82, 'end': 12.0, 'score': 0.993, 'speaker': 'SPEAKER_00'}, {'word': 'is', 'start': 12.04, 'end': 12.121, 'score': 0.76, 'speaker': 'SPEAKER_00'}, {'word': 'a', 'start': 12.181, 'end': 12.221, 'score': 0.499, 'speaker': 'SPEAKER_00'}, {'word': 'rare', 'start': 12.281, 'end': 12.501, 'score': 0.776, 'speaker': 'SPEAKER_00'}, {'word': 'dish.', 'start': 12.581, 'end': 12.841, 'score': 0.878, 'speaker': 'SPEAKER_00'}], 'speaker': 'SPEAKER_00'}, {'start': 14.282, 'end': 16.123, 'text': 'Rice is often served in round bowls.', 'words': [{'word': 'Rice', 'start': 14.282, 'end': 14.522, 'score': 0.867, 'speaker': 'SPEAKER_00'}, {'word': 'is', 'start': 14.582, 'end': 14.662, 'score': 0.638, 'speaker': 'SPEAKER_00'}, {'word': 'often', 'start': 14.722, 'end': 15.022, 'score': 0.922, 'speaker': 'SPEAKER_00'}, {'word': 'served', 'start': 15.082, 'end': 15.362, 'score': 0.848, 'speaker': 
'SPEAKER_00'}, {'word': 'in', 'start': 15.422, 'end': 15.502, 'score': 0.85, 'speaker': 'SPEAKER_00'}, {'word': 'round', 'start': 15.562, 'end': 15.783, 'score': 0.912, 'speaker': 'SPEAKER_00'}, {'word': 'bowls.', 'start': 15.823, 'end': 16.123, 'score': 0.647, 'speaker': 'SPEAKER_00'}], 'speaker': 'SPEAKER_00'}, {'start': 17.343, 'end': 19.265, 'text': 'The juice of lemons makes fine punch.', 'words': [{'word': 'The', 'start': 17.343, 'end': 17.464, 'score': 0.796, 'speaker': 'SPEAKER_00'}, {'word': 'juice', 'start': 17.504, 'end': 17.764, 'score': 0.976, 'speaker': 'SPEAKER_00'}, {'word': 'of', 'start': 17.804, 'end': 17.884, 'score': 0.83, 'speaker': 'SPEAKER_00'}, {'word': 'lemons', 'start': 17.944, 'end': 18.264, 'score': 0.914, 'speaker': 'SPEAKER_00'}, {'word': 'makes', 'start': 18.344, 'end': 18.564, 'score': 0.866, 'speaker': 'SPEAKER_00'}, {'word': 'fine', 'start': 18.644, 'end': 18.904, 'score': 0.914, 'speaker': 'SPEAKER_00'}, {'word': 'punch.', 'start': 18.964, 'end': 19.265, 'score': 0.888, 'speaker': 'SPEAKER_00'}], 'speaker': 'SPEAKER_00'}, {'start': 20.445, 'end': 22.406, 'text': 'The box was thrown beside the parked truck.', 'words': [{'word': 'The', 'start': 20.445, 'end': 20.565, 'score': 0.89, 'speaker': 'SPEAKER_00'}, {'word': 'box', 'start': 20.605, 'end': 20.885, 'score': 0.956, 'speaker': 'SPEAKER_00'}, {'word': 'was', 'start': 20.926, 'end': 21.046, 'score': 0.907, 'speaker': 'SPEAKER_00'}, {'word': 'thrown', 'start': 21.106, 'end': 21.346, 'score': 0.621, 'speaker': 'SPEAKER_00'}, {'word': 'beside', 'start': 21.386, 'end': 21.706, 'score': 0.901, 'speaker': 'SPEAKER_00'}, {'word': 'the', 'start': 21.746, 'end': 21.806, 'score': 0.977, 'speaker': 'SPEAKER_00'}, {'word': 'parked', 'start': 21.866, 'end': 22.086, 'score': 0.65, 'speaker': 'SPEAKER_00'}, {'word': 'truck.', 'start': 22.126, 'end': 22.406, 'score': 0.859, 'speaker': 'SPEAKER_00'}], 'speaker': 'SPEAKER_00'}, {'start': 23.767, 'end': 25.748, 'text': 'The hogs were fed chopped 
corn and garbage.', 'words': [{'word': 'The', 'start': 23.767, 'end': 23.867, 'score': 0.997, 'speaker': 'SPEAKER_00'}, {'word': 'hogs', 'start': 23.907, 'end': 24.147, 'score': 0.873, 'speaker': 'SPEAKER_00'}, {'word': 'were', 'start': 24.167, 'end': 24.287, 'score': 0.874, 'speaker': 'SPEAKER_00'}, {'word': 'fed', 'start': 24.347, 'end': 24.588, 'score': 0.763, 'speaker': 'SPEAKER_00'}, {'word': 'chopped', 'start': 24.628, 'end': 24.928, 'score': 0.671, 'speaker': 'SPEAKER_00'}, {'word': 'corn', 'start': 24.968, 'end': 25.208, 'score': 0.843, 'speaker': 'SPEAKER_00'}, {'word': 'and', 'start': 25.248, 'end': 25.328, 'score': 0.923, 'speaker': 'SPEAKER_00'}, {'word': 'garbage.', 'start': 25.348, 'end': 25.748, 'score': 0.902, 'speaker': 'SPEAKER_00'}], 'speaker': 'SPEAKER_00'}, {'start': 27.129, 'end': 28.73, 'text': 'Four hours of study work faced us.', 'words': [{'word': 'Four', 'start': 27.129, 'end': 27.329, 'score': 0.819, 'speaker': 'SPEAKER_00'}, {'word': 'hours', 'start': 27.369, 'end': 27.629, 'score': 0.805, 'speaker': 'SPEAKER_00'}, {'word': 'of', 'start': 27.669, 'end': 27.709, 'score': 0.735, 'speaker': 'SPEAKER_00'}, {'word': 'study', 'start': 27.749, 'end': 28.01, 'score': 0.873, 'speaker': 'SPEAKER_00'}, {'word': 'work', 'start': 28.05, 'end': 28.25, 'score': 0.885, 'speaker': 'SPEAKER_00'}, {'word': 'faced', 'start': 28.29, 'end': 28.57, 'score': 0.97, 'speaker': 'SPEAKER_00'}, {'word': 'us.', 'start': 28.67, 'end': 28.73, 'score': 0.99, 'speaker': 'SPEAKER_00'}], 'speaker': 'SPEAKER_00'}, {'start': 30.111, 'end': 32.092, 'text': ' A large size in stockings is hard to sell.', 'words': [{'word': 'A', 'start': 30.111, 'end': 30.171, 'score': 0.927, 'speaker': 'SPEAKER_00'}, {'word': 'large', 'start': 30.212, 'end': 30.454, 'score': 0.968, 'speaker': 'SPEAKER_00'}, {'word': 'size', 'start': 30.515, 'end': 30.758, 'score': 0.982, 'speaker': 'SPEAKER_00'}, {'word': 'in', 'start': 30.798, 'end': 30.879, 'score': 0.691, 'speaker': 'SPEAKER_00'}, {'word': 
'stockings', 'start': 30.919, 'end': 31.344, 'score': 0.923, 'speaker': 'SPEAKER_00'}, {'word': 'is', 'start': 31.405, 'end': 31.486, 'score': 0.816, 'speaker': 'SPEAKER_00'}, {'word': 'hard', 'start': 31.526, 'end': 31.708, 'score': 0.834, 'speaker': 'SPEAKER_00'}, {'word': 'to', 'start': 31.748, 'end': 31.85, 'score': 0.938, 'speaker': 'SPEAKER_00'}, {'word': 'sell.', 'start': 31.89, 'end': 32.092, 'score': 0.954, 'speaker': 'SPEAKER_00'}], 'speaker': 'SPEAKER_00'}] Process finished with exit code 0 ```
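The aligned output above carries a per-word `score`, which makes it easy to spot words the aligner was unsure about. The following is a minimal sketch, assuming the segment layout printed above (`words` entries with `word`, `start`, `score`); the helper name `low_confidence_words` and the 0.5 cutoff are illustrative choices, not part of WhisperX.

```python
from typing import Iterator, Optional, Tuple


def low_confidence_words(
    segments: list, min_score: float = 0.5
) -> Iterator[Tuple[str, Optional[float], float]]:
    """Yield (word, start, score) for every aligned word scoring below min_score."""
    for segment in segments:
        for word in segment.get("words", []):
            score = word.get("score")
            # Skip entries without a score instead of failing on them.
            if score is not None and score < min_score:
                yield word["word"], word.get("start"), score


# A few words copied from the aligned output above (not the full segments).
sample = [
    {"start": 4.296, "end": 6.357, "text": "Glued the sheet to the dark blue background.",
     "words": [
         {"word": "Glued", "start": 4.296, "end": 4.616, "score": 0.474},
         {"word": "the", "start": 4.676, "end": 4.756, "score": 0.968},
     ]},
    {"start": 10.9, "end": 12.841, "text": "These days a chicken leg is a rare dish.",
     "words": [
         {"word": "a", "start": 11.4, "end": 11.44, "score": 0.515},
         {"word": "a", "start": 12.181, "end": 12.221, "score": 0.499},
     ]},
]

flagged = list(low_confidence_words(sample))
print(flagged)
```

With the real pipeline, the same helper could be called on `result["segments"]` after the alignment step.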

Future work

sulutian commented 2 months ago

How do I use Silero VAD with WhisperX!!

3manifold commented 2 months ago

From the pull request description:

Example command line (applies also for --vad_method pyannote):

  • GPU: python3 -m whisperx.transcribe audio.wav --language en --device cuda --diarize --hf_token xxx --vad_method silero
  • CPU: python3 -m whisperx.transcribe audio.wav --language en --device cpu --diarize --hf_token xxx --compute_type int8 --vad_method silero
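For scripted runs, the two command lines above can be assembled programmatically. This is a rough sketch: the helper `build_whisperx_command` is hypothetical, it only uses the flags quoted in this thread, and `--vad_method` is available only on the silero-vad branch of this pull request.

```python
import shlex


def build_whisperx_command(audio_path, device="cpu", vad_method="silero",
                           hf_token="xxx", language="en"):
    """Return the argument list for the CLI invocation quoted in this thread."""
    cmd = [
        "python3", "-m", "whisperx.transcribe", audio_path,
        "--language", language,
        "--device", device,
        "--diarize", "--hf_token", hf_token,
    ]
    if device == "cpu":
        # The CPU example in the thread adds int8 quantization.
        cmd += ["--compute_type", "int8"]
    cmd += ["--vad_method", vad_method]
    return cmd


cpu_cmd = build_whisperx_command("audio.wav", device="cpu")
gpu_cmd = build_whisperx_command("audio.wav", device="cuda")
print(shlex.join(cpu_cmd))
print(shlex.join(gpu_cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once the branch is checked out and the placeholder token is replaced with a real one.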
sulutian commented 1 month ago

An error occurred: whisperx: error: unrecognized arguments: --vad_method silero

3manifold commented 1 month ago

You have to check out the silero-vad branch.
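From Python, one way to tell whether the installed code comes from the silero-vad branch is to inspect the signature of `load_model` before passing `vad_method`. The sketch below uses two stand-in functions rather than the real package, so the signatures shown are assumptions for illustration only.

```python
import inspect


def accepts_keyword(func, name):
    """Return True if func can be called with the keyword argument `name`."""
    params = inspect.signature(func).parameters
    if name in params:
        return True
    # A **kwargs parameter accepts any keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


# Hypothetical stand-ins for load_model on main and on the silero-vad branch.
def load_model_main(whisper_arch, device, compute_type="float16"):
    return None


def load_model_branch(whisper_arch, device, compute_type="float16", vad_method=None):
    return None


print(accepts_keyword(load_model_main, "vad_method"))
print(accepts_keyword(load_model_branch, "vad_method"))
```

With the package installed, the equivalent check would be `accepts_keyword(whisperx.load_model, "vad_method")`.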

sulutian commented 1 month ago

I have:

* main
  remotes/origin/HEAD -> origin/main
  remotes/origin/main
  remotes/origin/silero-vad

3manifold commented 1 month ago

You can run git checkout -t origin/silero-vad to check out the remote branch.

sulutian commented 1 month ago

This showed up!!

whisperX-silero-vad>git checkout -t origin/silero-vad
fatal: a branch named 'silero-vad' already exists
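That fatal message means a local branch named silero-vad already exists, so `git checkout -t` cannot create it again; a plain `git checkout silero-vad` switches to it. The choice between the two commands can be sketched from the `git branch -a` listing. The helper `checkout_command` is hypothetical and only parses text; it does not run git.

```python
def checkout_command(branch_listing, branch, remote="origin"):
    """Pick a git command for `branch` based on `git branch -a` output."""
    names = []
    for line in branch_listing.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("* "):
            # The asterisk marks the currently checked-out branch.
            line = line[2:]
        # Drop the target of symbolic refs such as "HEAD -> origin/main".
        names.append(line.split(" -> ")[0].strip())
    if branch in names:
        # A local branch already exists, so just switch to it.
        return ["git", "checkout", branch]
    if f"remotes/{remote}/{branch}" in names:
        # Only the remote branch exists: create a local tracking branch.
        return ["git", "checkout", "-t", f"{remote}/{branch}"]
    raise ValueError(f"branch {branch!r} not found locally or on {remote!r}")


listing_before = """* main
  remotes/origin/HEAD -> origin/main
  remotes/origin/main
  remotes/origin/silero-vad"""
listing_after = "* main\n  silero-vad\n" + listing_before.split("\n", 1)[1]

print(checkout_command(listing_before, "silero-vad"))
print(checkout_command(listing_after, "silero-vad"))
```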

sulutian commented 1 month ago

When will a parameter for threshold adjustment be added?
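For context on what such a parameter would control, here is a simplified illustration of thresholding per-frame speech probabilities into segments. It is not Silero's actual algorithm and not code from this pull request (real VADs typically add minimum-duration and padding rules); the function name and the sample probabilities are made up for the example.

```python
def speech_segments(probs, threshold=0.5):
    """Group consecutive frames with probability >= threshold.

    Returns (start_frame, end_frame) pairs, with end_frame exclusive.
    """
    segments = []
    start = None
    for i, p in enumerate(probs):
        if p >= threshold and start is None:
            start = i                      # speech begins
        elif p < threshold and start is not None:
            segments.append((start, i))    # speech ends
            start = None
    if start is not None:
        # Speech runs to the end of the input.
        segments.append((start, len(probs)))
    return segments


probs = [0.1, 0.6, 0.9, 0.4, 0.45, 0.8, 0.2]
print(speech_segments(probs, threshold=0.5))
print(speech_segments(probs, threshold=0.4))
```

Lowering the threshold from 0.5 to 0.4 merges the two short segments into one, which is the kind of trade-off a user-facing threshold option would expose.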