linto-ai / whisper-timestamped

Multilingual Automatic Speech Recognition with word-level timestamps and confidence
GNU Affero General Public License v3.0

How to write SRT file? Are models the same as whisper? #42

Closed Adsc58 closed 1 year ago

Adsc58 commented 1 year ago

Thank you for the code. I want to write the word-level timestamp results in SRT format. How can I do that? Also, are the model files you are using (tiny, large-v2, large, etc.) the same as the model files in the original code at "https://github.com/openai/whisper"?

import whisper_timestamped as whisper

audio = whisper.load_audio("audio.mp3")

model = whisper.load_model("tiny", device="cuda")

result = whisper.transcribe_timestamped(model, audio, language="en")

Jeronymous commented 1 year ago

Yes, the models are exactly the same. whisper_timestamped simply imports the load_audio and load_model functions from whisper, so they behave identically.

To write an SRT file, you can do the following (if you are using the latest version of whisper_timestamped):

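import whisper_timestamped as whisper
from whisper_timestamped.make_subtitles import write_srt

result = whisper.transcribe_timestamped(...)

# Write one subtitle entry per segment; the "with" block closes the file afterwards
with open("file.srt", "w", encoding = "utf8") as f:
    write_srt(result["segments"], f)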
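
For reference, an SRT file is a sequence of numbered entries, each with a time range written as HH:MM:SS,mmm --> HH:MM:SS,mmm followed by the text. The sketch below is not the code of write_srt. It is a minimal standalone illustration of that layout, assuming each item is a dict with "start" and "end" (in seconds) and "text" keys, as in result["segments"]. The names format_srt_time and to_srt and the sample data are made up for this example.

```python
def format_srt_time(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(items):
    """Build SRT text from dicts that have "start", "end" and "text" keys."""
    blocks = []
    for index, item in enumerate(items, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_srt_time(item['start'])} --> {format_srt_time(item['end'])}\n"
            f"{item['text'].strip()}\n"
        )
    # Entries are separated by a blank line
    return "\n".join(blocks)


# Toy data mimicking the shape of result["segments"] (values are made up)
segments = [
    {"start": 0.0, "end": 1.5, "text": " Hello world."},
    {"start": 1.5, "end": 3.25, "text": " This is a test."},
]
print(to_srt(segments))
```

In practice write_srt is the simpler choice; a hand-rolled version like this would mainly be useful if you need to customize the text of each entry.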

Jeronymous commented 1 year ago

My previous answer writes an SRT file with one entry per segment. If you want one entry per word instead, you can do:

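def flatten(list_of_lists, key = None):
    # Yield every item of every sublist. If key is given, each sublist is a dict
    # and the items are read from sublist[key] (e.g. the "words" of a segment).
    for sublist in list_of_lists:
        for item in sublist.get(key, []) if key else sublist:
            yield item

# Write one subtitle entry per word; the "with" block closes the file afterwards
with open("file.words.srt", "w", encoding = "utf8") as f:
    write_srt(flatten(result["segments"], "words"), f)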
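
To see what flatten produces, here is a small standalone sketch run on made-up data. It assumes the result has the shape used above, that is, a "segments" list in which each segment carries a "words" list of dicts with "text", "start" and "end" keys. The sample values are invented for the illustration.

```python
def flatten(list_of_lists, key=None):
    # Same helper as above: chain the items of every sublist,
    # or of sublist[key] when a key is given
    for sublist in list_of_lists:
        for item in sublist.get(key, []) if key else sublist:
            yield item


# Toy result with two segments (values are made up)
result = {
    "segments": [
        {
            "text": " Hello world.", "start": 0.0, "end": 1.0,
            "words": [
                {"text": "Hello", "start": 0.0, "end": 0.4},
                {"text": "world.", "start": 0.5, "end": 1.0},
            ],
        },
        {
            "text": " Bye.", "start": 1.2, "end": 1.6,
            "words": [{"text": "Bye.", "start": 1.2, "end": 1.6}],
        },
    ]
}

# The words of all segments come out as a single flat sequence
words = list(flatten(result["segments"], "words"))
print([w["text"] for w in words])
```

Because each word has the same "text", "start" and "end" keys as a segment, the flattened words can be passed to write_srt in place of the segments.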

Jeronymous commented 1 year ago

Note that you can also write both SRT files using the CLI:

python whisper_timestamped/transcribe.py file.wav --model tiny --output_dir . --output_format srt

(The files will be named ./file.wav.srt and ./file.wav.words.srt.)

Adsc58 commented 1 year ago

Thank you for the detailed information.