Convert srt with --highlight_word format to normal srt (or vtt)

m-bain / whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

BSD 4-Clause "Original" or "Old" License


Hi,

I have SRT files generated with the --highlight_word option, like so:

https://github.com/m-bain/whisperX/issues/539

I would like to convert these files, using the CLI, to a format in which the individual words are not highlighted, giving a normal (but speaker-diarized and properly timestamped) output, which might look like:

3
00:00:04,578 --> 00:00:08,040
[SPEAKER_00]: So, first, is there anything you want to know about me first?

...

First, I was wondering if such a script already existed, or whether I should write one.

Second, does the individually word-tagged diarized .srt format have an actual name?

Third, I'd like to be able to go in and identify the speakers after the fact, and put their names in place of [SPEAKER_00]. Is there a tool for that, or should I also write one?

Thanks,

Andrew

def convert_time_format(start_time, end_time):
    """Format a start/end pair (float seconds) as a WebVTT cue timing line."""

    def fmt(t):
        # Work in whole milliseconds so float error cannot drop a millisecond.
        total_ms = int(round(t * 1000))
        hours, rem = divmod(total_ms, 3_600_000)
        minutes, rem = divmod(rem, 60_000)
        seconds, ms = divmod(rem, 1000)
        # WebVTT minutes must stay below 60, so emit hours once they are needed.
        prefix = f"{hours:02d}:" if hours else ""
        return f"{prefix}{minutes:02d}:{seconds:02d}.{ms:03d}"

    return f"{fmt(start_time)} --> {fmt(end_time)}"


def write_to_result(result, file_path):
    """Write the segments of a transcription result dict to a WebVTT file."""
    with open(file_path, "w", encoding="utf-8") as f:
        f.write("WEBVTT\n\n")
        for segment in result["segments"]:
            f.write(convert_time_format(segment["start"], segment["end"]) + "\n")
            # Prefix the cue with [[SPEAKER]] when the segment carries a speaker label.
            speaker = f'[[{segment["speaker"]}]]' if "speaker" in segment else ""
            text = segment["text"].strip().replace("\t", " ")
            f.write(f"{speaker} {text}\n\n")
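The snippet above writes VTT from an in-memory result, so it does not help with .srt files that already exist on disk. For that case, here is a minimal sketch under a few assumptions: that the highlighted file repeats each segment's full text once per word, with the current word wrapped in <u>...</u> tags, and that speaker labels look like [SPEAKER_00]. The function names and the `names` mapping are hypothetical, not part of whisperX. Note that two genuinely separate segments with identical text that follow each other directly would also be merged.

```python
import re

TAG_RE = re.compile(r"</?u>")                # assumed highlight markup
SPEAKER_RE = re.compile(r"\[(SPEAKER_\d+)\]")  # assumed speaker label shape


def parse_srt(text):
    """Yield (start, end, body) for each cue in an SRT string."""
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.strip().splitlines()
        # Locate the timing line; the numeric index line before it is optional.
        for i, line in enumerate(lines):
            if "-->" in line:
                start, end = (part.strip() for part in line.split("-->"))
                yield start, end, "\n".join(lines[i + 1:])
                break


def merge_highlighted(text):
    """Strip <u> tags and merge consecutive cues whose text is identical."""
    merged = []
    for start, end, body in parse_srt(text):
        body = TAG_RE.sub("", body).strip()
        if merged and merged[-1][2] == body:
            merged[-1][1] = end  # same sentence: extend the previous cue
        else:
            merged.append([start, end, body])
    return merged


def rename_speakers(body, names):
    """Replace [SPEAKER_NN] labels using a label -> name mapping."""
    return SPEAKER_RE.sub(
        lambda m: "[" + names.get(m.group(1), m.group(1)) + "]", body
    )


def to_srt(cues, names=None):
    """Serialise merged cues back to SRT, optionally renaming speakers."""
    out = []
    for n, (start, end, body) in enumerate(cues, 1):
        if names:
            body = rename_speakers(body, names)
        out.append(f"{n}\n{start} --> {end}\n{body}\n")
    return "\n".join(out)
```

To use it, read the highlighted file into a string, pass it through merge_highlighted, and write the result of to_srt(cues, {"SPEAKER_00": "Andrew"}) to a new file. Labels missing from the mapping are left unchanged.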

Convert srt with --highlight_word format to normal srt (or vtt) #701