jianfch / stable-ts

Transcription, forced alignment, and audio indexing with OpenAI's Whisper
MIT License

Expected SRT output? #161

Closed FromCollin closed 1 year ago

FromCollin commented 1 year ago

I loaded this up in Python and got it to produce output, but it's adding extra things to my SRT file rather than individual words. The expected behavior is one word (or group of words) per timecode, but the output repeats the full sentence with one word's color changed in each cue. Is there a setting to avoid this and instead get just groups of words or individual words? Pretty cool project regardless!

0
00:00:00,720 --> 00:00:00,940
Because that is very in character for my character.

1
00:00:00,940 --> 00:00:01,400
Because that is very in character for my character.

2
00:00:01,400 --> 00:00:01,600
Because that is very in character for my character.
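(Editor's note: each cue above lasts only as long as a single word even though the full sentence is repeated, which is what karaoke-style highlighting looks like once the color tags are stripped. A minimal sketch, using only the timecodes quoted above, that measures each cue's duration:)

```python
import re

# The first two cues from the SRT output quoted above.
srt = """0
00:00:00,720 --> 00:00:00,940
Because that is very in character for my character.

1
00:00:00,940 --> 00:00:01,400
Because that is very in character for my character.
"""

def to_ms(ts):
    # Convert an SRT timestamp "HH:MM:SS,mmm" to milliseconds.
    h, m, rest = ts.split(":")
    s, ms = rest.split(",")
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)

cues = re.findall(r"(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})", srt)
durations = [to_ms(end) - to_ms(start) for start, end in cues]
print(durations)  # [220, 460]: each cue spans only one highlighted word
```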

ryanheise commented 1 year ago

Not sure if it's stuck in a loop or if it's using the karaoke option, but the highlight tags may have been lost in the copy and paste. What options did you use?

FromCollin commented 1 year ago

Hello!

This is my entire code:

import stable_whisper
model = stable_whisper.load_model('base')
result0 = model.transcribe('audio/Ah_yes.mp3')
result0.to_srt_vtt('audio_new.srt')

jianfch commented 1 year ago

> Is there a setting to not get this output and instead just groups of words or individual words?

Groups of words:

result0.to_srt_vtt('audio_new.srt', word_level=False)

Individual words:

result0.to_srt_vtt('audio_new.srt', segment_level=False)

See examples.
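(Editor's note: to make the difference between the two options concrete, here is a self-contained sketch with made-up word timings. It imitates the two output shapes only; the `words`, `word_srt`, and `segment_srt` names are illustrative, not stable-ts internals:)

```python
# Hypothetical (word, start_sec, end_sec) timings for illustration.
words = [("Because", 0.72, 0.94), ("that", 0.94, 1.40), ("is", 1.40, 1.60)]

def fmt(t):
    # Format seconds as an SRT timestamp "HH:MM:SS,mmm".
    ms = round(t * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"

def word_srt(words):
    # One cue per word: the shape segment_level=False produces.
    blocks = [f"{i}\n{fmt(a)} --> {fmt(b)}\n{w}"
              for i, (w, a, b) in enumerate(words, 1)]
    return "\n\n".join(blocks)

def segment_srt(words):
    # One cue for the whole group: the shape word_level=False produces.
    text = " ".join(w for w, _, _ in words)
    return f"1\n{fmt(words[0][1])} --> {fmt(words[-1][2])}\n{text}"

print(word_srt(words))     # three cues, one word each
print(segment_srt(words))  # one cue spanning 00:00:00,720 --> 00:00:01,600
```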

FromCollin commented 1 year ago

Okay, yes. This is my fault for not reading the docs and examples closer. Thanks for helping out. It's working as expected now.