jianfch / stable-ts

Transcription, forced alignment, and audio indexing with OpenAI's Whisper
MIT License
1.61k stars 178 forks

Convert from word-by-word subtitle to karaoke style .ass sub #147

Closed: moisestohias closed this issue 1 year ago

moisestohias commented 1 year ago

Could you provide a standalone script that converts a word-by-word .srt or .ass subtitle into a karaoke-style .ass subtitle? It would be nice to have a single script that takes the word-by-word subtitle and generates the karaoke-style .ass file. I tried to implement this myself but ran into issues. Since you have already implemented most of the tooling, I think it would not take you much time.

jianfch commented 1 year ago

This is supported in the latest version: use --karaoke true on the CLI, or results.to_ass('output.ass', karaoke=True) in Python.

moisestohias commented 1 year ago

My request is to take an existing word-by-word transcript and convert it to karaoke style, not to generate one from scratch, since I already have the word-by-word transcripts. Also, I am using whisper.cpp because I don't have a powerful graphics card on my laptop, so I don't want to install pytorch ... Currently, to generate karaoke subtitles in .ass format, I have to perform two separate passes. The first pass obtains word-by-word timestamps, and the second pass identifies sentence boundaries. Once I have the sentence boundaries, I use my custom script to insert the timing tags (\k tags) into the subtitle file. For some reason, the resulting timing is off.
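For illustration, the conversion described above can be sketched in plain Python with no pytorch or stable-ts dependency. This is a minimal sketch, not the stable-ts implementation: every function name here is hypothetical, sentence boundaries are simply assumed to fall on words ending in ".", "?" or "!", and the style line is a placeholder. It relies on two properties of the ASS format: \k durations are in centiseconds, and they are counted from the start of the Dialogue line. Silent gaps between words therefore need their own empty \k tag, and each duration is taken from a running cursor so rounding does not accumulate. Whether this addresses the timing problem mentioned above is not something the thread confirms.

```python
import re

TIME_RE = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")

# Placeholder header and style; adjust fonts and colours as needed.
ASS_HEADER = """[Script Info]
ScriptType: v4.00+

[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
Style: Default,Arial,48,&H00FFFFFF,&H000000FF,&H00000000,&H00000000,0,0,0,0,100,100,0,0,1,2,2,2,10,10,10,1

[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
"""


def srt_time_to_cs(stamp):
    """Convert an SRT timestamp such as 00:00:01,250 to centiseconds."""
    h, m, s, ms = TIME_RE.search(stamp).groups()
    millis = ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)
    return (millis + 5) // 10  # round half up to the nearest centisecond


def cs_to_ass_time(cs):
    """Format centiseconds as an ASS timestamp (H:MM:SS.cc)."""
    h, rem = divmod(cs, 360000)
    m, rem = divmod(rem, 6000)
    s, c = divmod(rem, 100)
    return f"{h}:{m:02d}:{s:02d}.{c:02d}"


def parse_word_srt(text):
    """Return [(start_cs, end_cs, word), ...] from a word-by-word SRT string."""
    words = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            if "-->" in line:  # timing line; the index line above it is optional
                start, end = line.split("-->")
                word = " ".join(lines[i + 1:]).strip()
                if word:
                    words.append((srt_time_to_cs(start), srt_time_to_cs(end), word))
                break
    return words


def group_sentences(words, enders=".?!"):
    """Split the word list into lines after sentence-ending punctuation."""
    lines, current = [], []
    for entry in words:
        current.append(entry)
        if entry[2][-1] in enders:
            lines.append(current)
            current = []
    if current:
        lines.append(current)
    return lines


def karaoke_line(words, style="Default"):
    """Build one Dialogue line whose \\k tags are relative to the line start."""
    line_start, line_end = words[0][0], words[-1][1]
    parts, cursor = [], line_start
    for start, end, word in words:
        if start > cursor:
            # Silent gap: an empty \k tag keeps the following words in sync.
            parts.append(f"{{\\k{start - cursor}}}")
            cursor = start
        parts.append(f"{{\\k{max(0, end - cursor)}}}{word} ")
        cursor = max(cursor, end)
    text = "".join(parts).rstrip()
    return (f"Dialogue: 0,{cs_to_ass_time(line_start)},"
            f"{cs_to_ass_time(line_end)},{style},,0,0,0,,{text}")


def srt_to_karaoke_ass(srt_text):
    """Convert a word-by-word SRT string into a karaoke-style ASS string."""
    lines = group_sentences(parse_word_srt(srt_text))
    return ASS_HEADER + "\n".join(karaoke_line(line) for line in lines) + "\n"
```

A call such as `open('out.ass', 'w', encoding='utf-8').write(srt_to_karaoke_ass(open('words.srt', encoding='utf-8').read()))` would write the result. The file names are placeholders.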

jianfch commented 1 year ago

Hi @moisestohias. You might want to take a look at https://github.com/jianfch/stable-ts/blob/main/examples/non-whisper.ipynb. Once you manage to wrap the data in a WhisperResult, you can just use result.to_ass('output.ass', karaoke=True), but it will require pytorch to be installed because of the import statements. However, converting the data into an output format does not itself use pytorch, so you can remove all lines with a pytorch dependency (i.e. the import statements starting with from .stabilization) from result.py and text_output.py.
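A rough sketch of that wrapping step follows. It assumes, based on a reading of the linked notebook that should be checked against it, that WhisperResult accepts a list of segments where each segment is a list of word dicts with 'word', 'start' and 'end' keys (times in seconds). The helper names to_segments and export_karaoke are hypothetical, and the leading space on each word is an assumption that mirrors Whisper's token convention. The import sits inside the function so the data-shaping part runs without stable-ts installed.

```python
def to_segments(lines):
    """Shape grouped words into segment data.

    lines: list of sentences, each a list of (start_sec, end_sec, word) tuples.
    Returns a list of segments, each a list of word dicts.
    """
    return [
        [{"word": " " + word, "start": start, "end": end}
         for start, end, word in line]
        for line in lines
    ]


def export_karaoke(segments, path="output.ass"):
    """Wrap the data in a WhisperResult and write a karaoke .ass file."""
    # Imported lazily: per the comment above, this import pulls in pytorch
    # unless the pytorch-dependent imports are removed from the package.
    import stable_whisper

    # Assumption: WhisperResult accepts this list-of-segments layout.
    result = stable_whisper.WhisperResult(segments)
    result.to_ass(path, karaoke=True)
```

For example, `export_karaoke(to_segments([[(0.0, 0.5, 'Hello'), (0.7, 1.0, 'world.')]]))` would be expected to write output.ass if the assumed input layout holds.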

moisestohias commented 1 year ago

Hi @jianfch, thanks for the prompt reply. I will try it.