jianfch / stable-ts

Transcription, forced alignment, and audio indexing with OpenAI's Whisper
MIT License

Support for newest Faster Whisper #387

Closed. wzqww23 closed this issue 2 months ago.

wzqww23 commented 3 months ago

Hi, would it be possible to support Faster Whisper 1.0.3? It adds a detect language function that I would like to use. I have tried using Faster Whisper 1.0.3, but it sometimes raises errors.

jianfch commented 3 months ago

Stable-ts does not interfere with the Faster Whisper functions. What lines did you run and what were the errors you got?

wzqww23 commented 3 months ago

It appears that when using the latest commits of Faster Whisper (>= version 1.0.3), stable-ts sometimes throws errors when the model outputs undesirable transcriptions, perhaps due to missing punctuation?

Detected Language: english
Transcribe:  38%|█████████████████████████████████████████████████████████▍                                                                                             | 76.84/201.97 [00:04<00:07, 15.83sec/s]
Traceback (most recent call last):
  File "/home/voila/code/runpod/stable-ts/src/temp.py", line 5, in <module>
    result = model.transcribe_stable("/home/voila/code/runpod/stable-ts/enml.mp3")
  File "/home/voila/code/runpod/stable-ts/src/stable_whisper/whisper_word_level/faster_whisper.py", line 150, in faster_transcribe
    return transcribe_any(
  File "/home/voila/code/runpod/stable-ts/src/stable_whisper/non_whisper.py", line 343, in transcribe_any
    result = inference_func(**inference_kwargs)
  File "/home/voila/code/runpod/stable-ts/src/stable_whisper/whisper_word_level/faster_whisper.py", line 199, in _inner_transcribe
    for segment in segments:
  File "/home/voila/code/runpod/stable-ts/venv/lib/python3.10/site-packages/faster_whisper/transcribe.py", line 1309, in generate_segments
    self.add_word_timestamps(
  File "/home/voila/code/runpod/stable-ts/venv/lib/python3.10/site-packages/faster_whisper/transcribe.py", line 1648, in add_word_timestamps
    median_duration, max_duration = median_max_durations[segment_idx]
IndexError: list index out of range
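The last frame unpacks a `(median, max)` pair from `median_max_durations[segment_idx]`, so the `IndexError` means that list held fewer entries than the index of the segment being processed. The sketch below is a hypothetical illustration of that mismatch and of a defensive lookup; the helper `lookup_durations` and its fallback policy are assumptions for illustration, not Faster-Whisper's code or its actual fix.

```python
from typing import List, Optional, Tuple

Durations = Tuple[float, float]  # (median word duration, max word duration)


def lookup_durations(
    median_max_durations: List[Durations],
    segment_idx: int,
    default: Optional[Durations] = None,
) -> Durations:
    """Return the (median, max) pair for a segment without raising on a short list.

    Hypothetical helper: when the list is shorter than expected (the condition
    behind the traceback), use `default` if given, otherwise reuse the last
    available entry. The fallback choice is an assumption.
    """
    if 0 <= segment_idx < len(median_max_durations):
        return median_max_durations[segment_idx]
    if default is not None:
        return default
    if median_max_durations:
        return median_max_durations[-1]
    raise IndexError(f"no duration entry for segment {segment_idx}")


durations = [(0.3, 0.6), (0.25, 0.5)]  # entries for only two segments

# Plain indexing past the end reproduces the error type from the traceback.
try:
    median_duration, max_duration = durations[2]
except IndexError as exc:
    print(exc)  # list index out of range

print(lookup_durations(durations, 2))  # (0.25, 0.5), reused from the last entry
```

Reusing the last entry only hides the symptom; the underlying question of why the list comes up short still has to be answered in Faster-Whisper itself.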

The error does not occur consistently, even when transcribing the same audio. It appears to be triggered more often when using the distil-large-v2 model or when condition_on_previous_text=True.
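Because the failure is intermittent and seems more frequent with conditioning on previous text, one possible stopgap is to retry and, on the last attempt, turn that conditioning off. This is a hypothetical sketch, not a stable-ts or Faster-Whisper feature: `transcribe_with_fallback` is an illustrative name, and it assumes the wrapped callable accepts a `condition_on_previous_text` keyword (Faster-Whisper's `WhisperModel.transcribe` does; whether `transcribe_stable` forwards it is an assumption to verify).

```python
from typing import Any, Callable, Dict, Optional


def transcribe_with_fallback(
    transcribe: Callable[..., Any],
    audio: Any,
    max_attempts: int = 3,
    **options: Any,
) -> Any:
    """Call `transcribe(audio, **options)`, retrying when it raises IndexError.

    Hypothetical workaround: retry because the error is intermittent, and
    force condition_on_previous_text=False on the final attempt, since the
    report suggests that setting makes the error more likely.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error: Optional[IndexError] = None
    for attempt in range(max_attempts):
        kwargs: Dict[str, Any] = dict(options)  # fresh copy for each attempt
        if attempt == max_attempts - 1:
            kwargs["condition_on_previous_text"] = False
        try:
            return transcribe(audio, **kwargs)
        except IndexError as exc:  # the error type seen in the traceback
            last_error = exc
    raise RuntimeError(
        f"transcription failed after {max_attempts} attempts"
    ) from last_error
```

Usage would look like `transcribe_with_fallback(model.transcribe_stable, "enml.mp3", condition_on_previous_text=True)`, under the forwarding assumption above. Retrying trades extra compute for robustness and does not address the root cause.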

Many thanks!

jianfch commented 3 months ago

This appears to be an issue in Faster-Whisper itself, at this line. I'd suggest opening an issue on Faster-Whisper's repo.