feat: support force alignment (feed text + audio and generate timestamp)

linto-ai / whisper-timestamped

Multilingual Automatic Speech Recognition with word-level timestamps and confidence

GNU Affero General Public License v3.0


feat: support force alignment (feed text + audio and generate timestamp) #204

[Open] carafelix opened this issue 1 month ago

carafelix commented 1 month ago

Let's say I have a text and an audio recording for which that text is a 100% correct transcription (because I literally read the text aloud and recorded it). I want to generate timestamps for that given text. Is it possible to feed the text to this model and produce a timestamped output of that text?
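For context, forced alignment (the task asked about here) is commonly done by scoring the audio frame by frame with an acoustic model, then finding the best monotonic path of the known text's tokens through those scores. Nothing in this thread confirms that whisper-timestamped exposes such a feature, so the following is only a minimal, library-free sketch of the dynamic-programming step: a simplified left-to-right Viterbi alignment without the CTC blank token. The `log_probs` matrix, the token IDs, the frame duration, and the names `force_align` / `spans_to_seconds` are all hypothetical; in practice the matrix would come from a frame-level model (for example a CTC-trained wav2vec2-style model).

```python
import math
from typing import List, Sequence, Tuple


def force_align(log_probs: Sequence[Sequence[float]],
                tokens: Sequence[int]) -> List[Tuple[int, int, int]]:
    """Return (token_position, first_frame, last_frame) for each token.

    Hypothetical input: log_probs[t][v] is the log-probability of vocabulary
    item v at audio frame t. Tokens are visited in order and each one must
    cover at least one frame (no skips, no CTC blank).
    """
    T, N = len(log_probs), len(tokens)
    if N == 0 or T < N:
        raise ValueError("need at least one frame per token")
    NEG = -math.inf
    # score[t][j]: best total log-score of paths that put frame t on token j.
    score = [[NEG] * N for _ in range(T)]
    # back[t][j]: 0 = frame t-1 was also token j, 1 = frame t-1 was token j-1.
    back = [[0] * N for _ in range(T)]
    score[0][0] = log_probs[0][tokens[0]]
    for t in range(1, T):
        for j in range(N):
            stay = score[t - 1][j]
            advance = score[t - 1][j - 1] if j > 0 else NEG
            if advance > stay:
                score[t][j], back[t][j] = advance, 1
            else:
                score[t][j], back[t][j] = stay, 0
            score[t][j] += log_probs[t][tokens[j]]
    if score[T - 1][N - 1] == NEG:
        raise ValueError("no valid alignment")
    # Backtrace from the last frame, which must belong to the last token.
    frame_of = [0] * T
    j = N - 1
    for t in range(T - 1, -1, -1):
        frame_of[t] = j
        if t > 0 and back[t][j] == 1:
            j -= 1
    spans = []
    for j in range(N):
        frames = [t for t in range(T) if frame_of[t] == j]
        spans.append((j, frames[0], frames[-1]))
    return spans


def spans_to_seconds(spans: List[Tuple[int, int, int]],
                     frame_sec: float) -> List[Tuple[float, float]]:
    """Convert frame spans to (start, end) seconds; frame_sec is an assumed hop size."""
    return [(first * frame_sec, (last + 1) * frame_sec) for _, first, last in spans]


if __name__ == "__main__":
    hi, lo = math.log(0.9), math.log(0.05)
    # Toy 6-frame, 3-symbol matrix: token 0 peaks first, then 1, then 2.
    toy = [[hi, lo, lo], [hi, lo, lo], [lo, hi, lo],
           [lo, hi, lo], [lo, lo, hi], [lo, lo, hi]]
    spans = force_align(toy, [0, 1, 2])
    print(spans, spans_to_seconds(spans, 0.5))
```

Word-level timestamps would then follow by grouping consecutive token spans that belong to the same word. A production aligner would also model the blank/silence symbol, which this sketch deliberately leaves out to keep the recursion readable.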

Huanshere commented 1 week ago

I guess this is what you're looking for: https://github.com/EtienneAb3d/SRT-Sync


Input 1: SRT with good timestamps and bad-quality text
Input 2: good text only, or an SRT with good text and bad timestamps
Output: SRT with good text and good timestamps
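The SRT-Sync workflow described above pairs good timing from one source with good text from another. As a rough illustration of that idea (not SRT-Sync's actual algorithm, which this thread doesn't describe), here is a minimal sketch that aligns the two word sequences with Python's standard `difflib.SequenceMatcher` and moves each good word into the cue of the bad word it matched. It assumes the SRT from Input 1 has already been parsed into hypothetical `(start_sec, end_sec, text)` tuples; `resync`, `fmt_time` and `to_srt` are illustrative names.

```python
import difflib
from typing import List, Optional, Tuple

Cue = Tuple[float, float, str]  # hypothetical parsed cue: (start_sec, end_sec, text)


def _norm(word: str) -> str:
    # Compare words case-insensitively, ignoring surrounding punctuation.
    return word.strip(".,;:!?\"'()").lower()


def resync(cues: List[Cue], good_text: str) -> List[Cue]:
    """Keep each cue's timing; replace its text with the matching words of good_text."""
    if not cues:
        return []
    bad_words: List[str] = []
    owner: List[int] = []  # owner[i] = index of the cue that bad word i came from
    for k, (_, _, text) in enumerate(cues):
        for w in text.split():
            bad_words.append(_norm(w))
            owner.append(k)
    good = good_text.split()
    matcher = difflib.SequenceMatcher(
        a=bad_words, b=[_norm(w) for w in good], autojunk=False)
    # Cue index for each good word; None until a matching bad word is found.
    assigned: List[Optional[int]] = [None] * len(good)
    for block in matcher.get_matching_blocks():
        for off in range(block.size):
            assigned[block.b + off] = owner[block.a + off]
    # Unmatched good words inherit the cue of the previous word (or the first cue).
    last = 0
    for i, k in enumerate(assigned):
        if k is None:
            assigned[i] = last
        else:
            last = k
    buckets: List[List[str]] = [[] for _ in cues]
    for word, k in zip(good, assigned):
        buckets[k].append(word)
    return [(s, e, " ".join(b)) for (s, e, _), b in zip(cues, buckets)]


def fmt_time(sec: float) -> str:
    """Format seconds as an SRT timecode HH:MM:SS,mmm."""
    ms = round(sec * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(cues: List[Cue]) -> str:
    """Serialize cues as numbered SRT blocks separated by blank lines."""
    blocks = [f"{i}\n{fmt_time(s)} --> {fmt_time(e)}\n{t}\n"
              for i, (s, e, t) in enumerate(cues, 1)]
    return "\n".join(blocks)


if __name__ == "__main__":
    asr_cues = [(0.0, 1.5, "hello word"), (1.5, 3.0, "how are yu")]
    print(to_srt(resync(asr_cues, "Hello, world! How are you?")))
```

Because `get_matching_blocks` returns matches in increasing order on both sides, the cue assignment never goes backwards in time. Unmatched words (such as a misrecognised "yu" versus "you?") simply stay with the preceding word's cue; long unmatched stretches would likely need smarter handling than this.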