jianfch / stable-ts

Transcription, forced alignment, and audio indexing with OpenAI's Whisper
MIT License

No word timestamps but accuracy like with them? #131

Closed akmere closed 1 year ago

akmere commented 1 year ago

Hi. Is there a quick way to keep the output file clean, without word timestamps, while still getting accurate segment times?

Example without word timestamps

00:00:09,000 --> 00:00:10,000 Narobiłeś w portki ze strachu?

When I do the same with word timestamps, the timing is much more accurate

00:00:09,260 --> 00:00:09,720 <font color="#00ff00">Narobiłeś</font> w portki ze strachu? (...)

Sometimes it also works well without word timestamps, but not in this particular example. I do not need word timestamps in my output file. Is there a simple way to achieve this with the current capabilities of stable-ts? Thanks!

Yunesss commented 1 year ago

This was answered in #111: https://github.com/jianfch/stable-ts/issues/111#issuecomment-1480595162

Keep word_timestamps=True, but export with result.to_srt_vtt('subs.srt', word_level=False)
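To illustrate why this works, here is a minimal sketch of the idea in plain Python. It is not stable-ts's actual implementation: `srt_time` and `segment_cue` are hypothetical helpers, and only the first word's timing comes from the example above (the other word timings are made up). The point is that a segment-level cue can take its start from the first word and its end from the last word, so the output stays free of word tags while the times stay tight.

```python
def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def segment_cue(words):
    """Build one segment-level SRT cue from (text, start, end) word tuples."""
    start = words[0][1]   # start of the first word
    end = words[-1][2]    # end of the last word
    text = " ".join(w[0] for w in words)
    return f"{srt_time(start)} --> {srt_time(end)}\n{text}"


# The first word's timing is from the thread; the rest are assumed values.
words = [
    ("Narobiłeś", 9.26, 9.72),
    ("w", 9.72, 9.80),
    ("portki", 9.80, 10.20),
    ("ze", 10.20, 10.32),
    ("strachu?", 10.32, 10.90),
]

cue = segment_cue(words)
print(cue)
```

With these assumed timings the cue starts at 00:00:09,260 instead of the rounded 00:00:09,000 from the export without word timestamps.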

QuesoDad commented 1 year ago

@echo off
for /r %%X in (*.mp4 *.mp3 *.wav *.aif *.aiff *.avi *.mpg *.mkv *.mov) do (
    if exist "%%X.srt" (
        echo "%%X already transcribed"
    ) else (
        echo "%%X.srt file doesn't exist. Creating Now"
        stable-ts "%%X" --model base.en --task transcribe --language English --threads 48 --output_dir "%%~pX." -o "%%X.json"
        stable-ts "%%X" --word_level False -o "%%X.srt"
    )
)

That is from a DOS batch file I wrote.

In this case, the first pass creates the JSON and the second pass creates an SRT without the font tags. The benefit is that you can always go back to the JSON for another export with different settings. The SRT from the second pass still benefits from the word-level timestamps and the regrouping of words into sentences. If you go straight to SRT with timestamps turned off, you get the same basic output as the original Whisper, with no breaks or sentence structure.
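The same two-pass idea can be sketched in plain Python, assuming the stored result is simply a list of word entries. This is not the stable-ts JSON schema, `export_srt` is a hypothetical helper, and the word timings after the first are made up. The word-level data is written once, and each later export only decides how to render it.

```python
import json
import os
import tempfile


def srt_time(seconds):
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def export_srt(words, word_level):
    """Return cues: one highlighted cue per word, or a single plain cue."""
    if not word_level:
        text = " ".join(w["word"] for w in words)
        span = f"{srt_time(words[0]['start'])} --> {srt_time(words[-1]['end'])}"
        return [f"{span}\n{text}"]
    cues = []
    for i, w in enumerate(words):
        # Wrap only the currently spoken word in a font tag.
        parts = [
            f'<font color="#00ff00">{x["word"]}</font>' if j == i else x["word"]
            for j, x in enumerate(words)
        ]
        span = f"{srt_time(w['start'])} --> {srt_time(w['end'])}"
        cues.append(f"{span}\n{' '.join(parts)}")
    return cues


# Assumed word timings; only the first word's times come from the thread.
words = [
    {"word": "Narobiłeś", "start": 9.26, "end": 9.72},
    {"word": "w", "start": 9.72, "end": 9.80},
    {"word": "portki", "start": 9.80, "end": 10.20},
]

# Pass 1: store the word-level result once.
path = os.path.join(tempfile.mkdtemp(), "result.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(words, f, ensure_ascii=False)

# Pass 2 (repeatable): reload and export with whichever setting is wanted.
with open(path, encoding="utf-8") as f:
    stored = json.load(f)

plain = export_srt(stored, word_level=False)
highlighted = export_srt(stored, word_level=True)
print(plain[0])
print(highlighted[0])
```

Because the expensive step only happens in pass 1, switching between the plain and highlighted output is just another call on the stored data.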