Closed moisestohias closed 1 year ago
This is supported in the latest version with --karaoke true for the CLI or results.to_ass('output.ass', karaoke=True).
My request is to take an existing word-by-word transcript and convert it to karaoke style, not to generate one from scratch, since I already have the word-by-word transcripts. Secondly, I am using whisper.cpp because I don't have a powerful graphics card on my laptop, so I don't want to install pytorch ... Currently, to generate karaoke subtitles in .ass format, I have to perform two separate passes. The first pass obtains word-by-word timestamps, while the second identifies sentence boundaries. Once I have the sentence boundaries, I use a custom script to insert the timing (\k tags) into the subtitle file. For whatever reason, the timing is off.
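For reference, the \k step described above can be sketched in plain Python with no pytorch dependency. This is only a minimal sketch, not the stable-ts implementation: the function names (to_cs, ass_time, karaoke_text, dialogue_line) and the (text, start, end) tuple layout are assumptions made for illustration. It relies on two properties of the ASS format: \k durations are given in centiseconds, and they are consumed back to back from the start of the Dialogue line. A script that ignores silences between words, or rounds each word's duration independently, can therefore drift, which is one possible cause of timing that is off.

```python
def to_cs(seconds: float) -> int:
    """Convert seconds to whole centiseconds, the unit used by \\k tags."""
    return round(seconds * 100)


def ass_time(cs: int) -> str:
    """Format centiseconds as an ASS timestamp (H:MM:SS.cc)."""
    h, rem = divmod(cs, 360000)
    m, rem = divmod(rem, 6000)
    s, c = divmod(rem, 100)
    return f"{h}:{m:02d}:{s:02d}.{c:02d}"


def karaoke_text(words):
    """Build \\k-tagged text for one line.

    words: list of (text, start_seconds, end_seconds), sorted by start.
    Durations are derived from absolute, already-rounded timestamps so
    rounding errors do not accumulate across the line.
    """
    cursor = to_cs(words[0][1])  # position reached so far within the line
    parts = []
    for i, (text, start, end) in enumerate(words):
        start_cs, end_cs = to_cs(start), to_cs(end)
        gap = start_cs - cursor
        if gap > 0:
            # Silence before this word: emit an empty syllable to absorb it.
            parts.append(f"{{\\k{gap}}}")
        # If words overlap, only count the part after the cursor.
        dur = max(end_cs - max(start_cs, cursor), 0)
        sep = " " if i < len(words) - 1 else ""
        parts.append(f"{{\\k{dur}}}{text}{sep}")
        cursor = max(cursor, end_cs)
    return "".join(parts)


def dialogue_line(words, style="Default"):
    """Wrap one sentence in an ASS Dialogue event (standard field order)."""
    start = to_cs(words[0][1])
    end = max(to_cs(w[2]) for w in words)
    return (f"Dialogue: 0,{ass_time(start)},{ass_time(end)},"
            f"{style},,0,0,0,,{karaoke_text(words)}")


if __name__ == "__main__":
    sentence = [("Hello", 1.00, 1.40), ("world", 1.55, 2.00)]
    print(dialogue_line(sentence))
```

The sum of all \k values in a line equals the line's End minus Start in centiseconds, which is a quick sanity check to run against any custom script.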
Hi @moisestohias. You might want to take a look at https://github.com/jianfch/stable-ts/blob/main/examples/non-whisper.ipynb.
Once you manage to wrap the data into WhisperResult, you can just use result.to_ass('output.ass', karaoke=True), but it will require pytorch to be installed because of the import statements. However, it does not use pytorch for converting the data into an output format, so you can just remove all lines with a pytorch dependency (i.e. remove import statements starting with from .stabilization) from result.py and text_output.py.
Hi @jianfch thanks for the prompt reply, I will try it.
Can you provide a standalone script to convert a word-by-word .srt or .ass subtitle to a karaoke-style .ass subtitle? It would be nice to have a single script that takes the word-by-word subtitle and generates the karaoke-style .ass subtitle. I've tried to implement this but ran into many issues. Since you have already implemented most of the tools, I think this won't take a lot of time.
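As a rough illustration of the input side of such a script, the sketch below reads a word-by-word .srt (one word per cue) and groups the words into sentences by trailing punctuation. It is not part of stable-ts; the names parse_word_srt and group_sentences, the one-word-per-cue layout, and the punctuation-based sentence split are all assumptions for illustration.

```python
import re

# Matches SRT timestamps such as 00:00:01,550 (a dot separator is tolerated).
TIME = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def srt_seconds(stamp: str) -> float:
    """Convert an SRT timestamp to seconds."""
    h, m, s, ms = map(int, TIME.search(stamp).groups())
    return (h * 3_600_000 + m * 60_000 + s * 1_000 + ms) / 1000


def parse_word_srt(srt_text: str):
    """Return a list of (text, start_seconds, end_seconds), one per cue."""
    words = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.strip().splitlines()
        # The timing line is the one containing "-->"; skip malformed cues.
        idx = next((i for i, line in enumerate(lines) if "-->" in line), None)
        if idx is None:
            continue
        start, end = lines[idx].split("-->", 1)
        text = " ".join(lines[idx + 1:]).strip()
        if text:
            words.append((text, srt_seconds(start), srt_seconds(end)))
    return words


def group_sentences(words, enders=".?!"):
    """Split the word list into sentences at words ending in punctuation."""
    sentences, current = [], []
    for word in words:
        current.append(word)
        if word[0][-1] in enders:
            sentences.append(current)
            current = []
    if current:  # trailing words without closing punctuation
        sentences.append(current)
    return sentences


if __name__ == "__main__":
    sample = (
        "1\n00:00:01,000 --> 00:00:01,400\nHello\n\n"
        "2\n00:00:01,550 --> 00:00:02,000\nworld.\n\n"
        "3\n00:00:02,500 --> 00:00:03,000\nBye\n"
    )
    for sentence in group_sentences(parse_word_srt(sample)):
        print(sentence)
```

Each resulting sentence is a list of timed words that could then be turned into one Dialogue line with \k tags, so a single pass over the existing transcript is enough.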