flesnuk closed this issue 1 year ago.
Passing segment_level=True and word_level=False to to_srt_vtt, instead of using word_timestamps=False, produces SRT output at the segment level only:
result.to_srt_vtt(output_path, segment_level=True, word_level=False)
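Here "segment level only" means that each subtitle cue spans a whole segment, with no per-word timing. The sketch below is not stable-ts code. It is a minimal standalone illustration that assumes segments are dicts with start and end (in seconds) and text keys, as in the segments list of a Whisper transcribe result. The helper names format_timestamp and segments_to_srt, and the sample data, are hypothetical.

```python
from typing import Iterable, Mapping


def format_timestamp(seconds: float) -> str:
    """Convert seconds to an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: Iterable[Mapping]) -> str:
    """Build segment-level SRT text: one numbered cue per segment, no word timing."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        text = seg["text"].strip()  # Whisper segment text usually has a leading space
        cues.append(f"{index}\n{start} --> {end}\n{text}\n")
    # A blank line separates consecutive cues.
    return "\n".join(cues)


if __name__ == "__main__":
    # Hypothetical sample data shaped like Whisper's result["segments"].
    sample = [
        {"start": 0.0, "end": 2.5, "text": " First segment."},
        {"start": 2.5, "end": 5.04, "text": " Second segment."},
    ]
    print(segments_to_srt(sample))
```

Because only segment start and end times are used, nothing in this path needs word-level alignment.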
Yes, but that makes the script run Whisper's timing.py for word-timing processing, which, oddly, increases VRAM usage when using a fine-tuned model: a fine-tuned medium model exceeds 8 GB of VRAM, while the standard medium model works fine. The OOM error is raised here: https://github.com/openai/whisper/blob/main/whisper/timing.py#L49
That's why I use the word_timestamps=False option for now.
A quick temporary fix is to use regroup=False, because there appears to be a bug in the regrouping logic.
result = model.transcribe('audio.mp3', regroup=False)
Should be fixed in the latest commit.
Thanks, it works with the latest commit.
I don't need the word-timing functionality, so I pass word_timestamps=False to the transcribe function.
When I try to save the results using result.to_srt_vtt, I get an empty file, so I have to resort to using the Whisper writer like so: