In a recent hunt for more ASR providers that offer per-word timecodes, I found some I already knew of and a few I hadn't heard of before. Among them is WhisperX.
We are all familiar with OpenAI's Whisper technology; however, its default models only produce timecodes for long phrases, and those timecodes are not very accurate.
WhisperX builds on Whisper and provides more accurate timecodes, with start and end timecodes for every word in the transcript.
You can generate test data with a free demo of WhisperX.
Or, if you would like test data that is already generated: ASR Timed Text Format Test 2 [WhisperX].json. The corresponding audio file can be obtained here.
Being able to import WhisperX's format would allow WhisperX users to bring their transcripts into HyperAudio Lite Editor and edit them there.
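
To give a rough idea of what an importer would need to read, here is a minimal Python sketch that flattens a WhisperX-style JSON result into a list of word timings. The field names (`segments`, `words`, `word`, `start`, `end`) are my understanding of WhisperX's JSON output and should be checked against the test file above. The function name `whisperx_words` and the output shape (`text`, `start_ms`, `end_ms`) are hypothetical and are not HyperAudio's internal format.

```python
import json


def whisperx_words(data):
    """Flatten a WhisperX-style result into a list of word timings.

    Assumes the layout {"segments": [{"words": [{"word", "start", "end"}]}]},
    with times in seconds. This is a sketch, not an official schema.
    """
    words = []
    for segment in data.get("segments", []):
        for w in segment.get("words", []):
            # Assumption: some tokens may have no alignment and therefore
            # no "start"/"end" keys. This sketch simply skips them.
            if "start" not in w or "end" not in w:
                continue
            words.append({
                "text": w["word"].strip(),
                "start_ms": round(w["start"] * 1000),
                "end_ms": round(w["end"] * 1000),
            })
    return words


# A tiny hand-written sample in the assumed shape (not real WhisperX output).
sample = json.loads("""
{
  "segments": [
    {
      "start": 0.5,
      "end": 2.0,
      "text": " Hello world 2024",
      "words": [
        {"word": "Hello", "start": 0.5, "end": 1.0, "score": 0.9},
        {"word": "world", "start": 1.25, "end": 2.0, "score": 0.8},
        {"word": "2024"}
      ]
    }
  ]
}
""")

result = whisperx_words(sample)
```

Skipping words without timings is only one option; an importer could instead keep them and interpolate a time from the neighbouring words so that no transcript text is lost.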