Issue with word-level timestamp with WhisperX V2

m-bain / whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

BSD 2-Clause "Simplified" License

Issue with word-level timestamp with WhisperX V2 #163

Open shruru opened 1 year ago

shruru commented 1 year ago

Hi there, somehow I couldn't export an srt file with word_timestamps enabled. No word-level srt file is generated, e.g.

whisperx --verbose True --model large-v2 --language en --hf_token XXXXX --word_timestamps True --output_format srt --output_dir . --vad_filter False Video.mp4

Please advise. Thank you~

m-bain commented 1 year ago

Hey, so you don't need --word_timestamps True; that flag is for the original whisper model. You could play with it to see how it compares, but currently the code doesn't parse the original whisper timestamps.

And to get word-level you need to switch to either --output_format word.srt or --output_format ass (the latter is with highlighting)

m-bain commented 1 year ago

todo (for self):
- documentation to explain --word_timestamps for OG whisper
- parse OG whisper word_timestamps for outputting to ~~word.srt~~ "srt-word" for comparison

shruru commented 1 year ago

It works, thanks! But it should be "srt-word" :P

Saccarab commented 1 year ago

so are OG whisper word_confidence values just buried in whisperX result somewhere or is the parameter not enabled on whisperX requests in the first place?

pdahiya commented 1 year ago

In the latest version, the .ass format has been removed. How do I get a word-level timestamped transcript now? The VTT and srt files don't have word-level timestamps. @m-bain can you please help?

m-bain commented 1 year ago

@pdahiya --highlight_words True

outputs srt with highlighted words

m-bain commented 1 year ago

You can also use the json to generate your own. I will add .ass back at some point.
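
For anyone taking the json route, here is a minimal sketch of turning that output into a word-level srt. It is not part of whisperX. It assumes the json has a top-level "segments" list, that each segment carries a "words" list, and that each word entry has "word", "start" and "end" keys (with "start"/"end" possibly missing for words that could not be aligned). The helper names fmt_ts and words_to_srt are made up for this example, so check them against your own json file before relying on it.

```python
def fmt_ts(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def words_to_srt(result: dict) -> str:
    """Build one SRT cue per word that has both a start and an end time.

    Assumed layout (verify against your file):
    {"segments": [{"words": [{"word": ..., "start": ..., "end": ...}]}]}
    """
    cues = []
    for segment in result.get("segments", []):
        for word in segment.get("words", []):
            # Assumption: unaligned words lack timing keys, so skip them.
            if "start" not in word or "end" not in word:
                continue
            index = len(cues) + 1
            cues.append(
                f"{index}\n"
                f"{fmt_ts(word['start'])} --> {fmt_ts(word['end'])}\n"
                f"{word['word'].strip()}\n"
            )
    # A blank line separates consecutive cues.
    return "\n".join(cues)
```

To use it, load the json file with json.load, pass the resulting dict to words_to_srt, and write the returned string to a .srt file. Words without timings are dropped here rather than given guessed times, so those words will simply not appear in the subtitles.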

sanxfxteam commented 11 months ago

I get this error when trying to output srt-word. What's the way to do it now?

whisperx: error: argument --output_format/-f: invalid choice: 'srt-word' (choose from 'all', 'srt', 'vtt', 'txt', 'tsv', 'json', 'aud')