Closed wzqww23 closed 2 months ago
Stable-ts does not interfere with the Faster Whisper functions. What lines did you run and what were the errors you got?
It appears that with the latest commits of Faster Whisper (>= version 1.0.3), stable-ts sometimes throws errors when the model outputs undesirable transcriptions, perhaps due to missing punctuation?
Detected Language: english
Transcribe: 38%|█████████████████████████████████████████████████████████▍ | 76.84/201.97 [00:04<00:07, 15.83sec/s]
Traceback (most recent call last):
  File "/home/voila/code/runpod/stable-ts/src/temp.py", line 5, in <module>
    result = model.transcribe_stable("/home/voila/code/runpod/stable-ts/enml.mp3")
  File "/home/voila/code/runpod/stable-ts/src/stable_whisper/whisper_word_level/faster_whisper.py", line 150, in faster_transcribe
    return transcribe_any(
  File "/home/voila/code/runpod/stable-ts/src/stable_whisper/non_whisper.py", line 343, in transcribe_any
    result = inference_func(**inference_kwargs)
  File "/home/voila/code/runpod/stable-ts/src/stable_whisper/whisper_word_level/faster_whisper.py", line 199, in _inner_transcribe
    for segment in segments:
  File "/home/voila/code/runpod/stable-ts/venv/lib/python3.10/site-packages/faster_whisper/transcribe.py", line 1309, in generate_segments
    self.add_word_timestamps(
  File "/home/voila/code/runpod/stable-ts/venv/lib/python3.10/site-packages/faster_whisper/transcribe.py", line 1648, in add_word_timestamps
    median_duration, max_duration = median_max_durations[segment_idx]
IndexError: list index out of range
It does not consistently return this error, even when transcribing the same audio. The error appears to be triggered more often when using the distil-large-v2 model or when condition_on_previous_text=True.
Many thanks!
This appears to be an issue with Faster-Whisper at this line. I'd suggest submitting the issue on Faster-Whisper's repo.
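Since the error is intermittent, an upstream report is easier to act on if it says how often the failure occurs without stable-ts in the loop. Below is a minimal sketch of such a check, not a definitive reproduction: the helper name `count_index_errors` is hypothetical, and the commented usage assumes Faster-Whisper's `WhisperModel.transcribe` returns a lazy segment iterator plus an info object, which matches the traceback above, where the error is raised while iterating over the segments.

```python
from typing import Any, Callable, Iterable, Tuple


def count_index_errors(
    transcribe: Callable[..., Tuple[Iterable[Any], Any]],
    audio_path: str,
    runs: int = 5,
    **transcribe_kwargs: Any,
) -> int:
    """Call `transcribe` `runs` times and count the runs that raise IndexError."""
    failures = 0
    for _ in range(runs):
        try:
            segments, _info = transcribe(audio_path, **transcribe_kwargs)
            # Segments are produced lazily, so the error only surfaces
            # while the iterator is being consumed.
            for _segment in segments:
                pass
        except IndexError:
            failures += 1
    return failures


# Hypothetical usage (model name and audio path are placeholders):
# from faster_whisper import WhisperModel
# model = WhisperModel("distil-large-v2")
# failures = count_index_errors(
#     model.transcribe,
#     "audio.mp3",
#     runs=10,
#     word_timestamps=True,
#     condition_on_previous_text=True,
# )
# print(f"{failures} of 10 runs raised IndexError")
```

If the count is non-zero when calling Faster-Whisper directly, that supports the view that the problem is upstream rather than in stable-ts.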
Hi, would it be possible to support Faster Whisper 1.0.3? They added a detect language function that I would like to use. I have tried to use Faster Whisper 1.0.3, but it sometimes raises errors.