m-bain / whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
BSD 2-Clause "Simplified" License

How to align known text with audio and obtain the timestamp of each line of text #839

Open RichardQin1 opened 1 month ago

RichardQin1 commented 1 month ago

The text is already known, and it corresponds to a segment of the audio.

For example:

特朗普右耳纏紗布現身
並將在大會上發表全國講話
特朗普表示槍擊事件之後

With the text above and the audio file test.mp3 as input, i.e. input(text, test.mp3), the desired output is:

特朗普右耳纏紗布現身    start_time:10000 end_time:12000
並將在大會上發表全國講話    start_time:12000 end_time:15000
特朗普表示槍擊事件之後    start_time:15000 end_time:18000

How can I obtain the start and end timestamps of each sentence?
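
To make the desired mapping concrete, here is a minimal sketch of the last step only. It assumes per-character timestamps are already available from some forced-alignment step (for example the alignment stage of WhisperX); it does not call any WhisperX API. The function name `sentence_timestamps`, the `(char, start_ms, end_ms)` tuple layout, and the millisecond unit are hypothetical choices for illustration, and the sketch assumes the known sentences appear verbatim and contiguously in the aligned character stream.

```python
from typing import Dict, List, Tuple

# Hypothetical input layout: one (character, start_ms, end_ms) tuple per
# aligned character, in audio order. This is an assumption for illustration,
# not the output format of any particular library.
AlignedChar = Tuple[str, int, int]


def sentence_timestamps(sentences: List[str], chars: List[AlignedChar]) -> List[Dict]:
    """Map each known sentence to the start/end time of its characters."""
    # Rebuild the full aligned text so the known text can be located in it.
    stream = "".join(c for c, _, _ in chars)
    known = [s for s in sentences if s]  # ignore empty lines
    offset = stream.find("".join(known))
    if offset < 0:
        raise ValueError("known text not found in the aligned characters")

    results = []
    for sentence in known:
        first = chars[offset]                      # first character of the sentence
        last = chars[offset + len(sentence) - 1]   # last character of the sentence
        results.append(
            {"text": sentence, "start_time": first[1], "end_time": last[2]}
        )
        offset += len(sentence)
    return results


if __name__ == "__main__":
    # Synthetic timings (100 ms per character), invented purely for the demo.
    text = "zzabcdeyy"
    aligned = [(ch, i * 100, (i + 1) * 100) for i, ch in enumerate(text)]
    for row in sentence_timestamps(["abc", "de"], aligned):
        print(row["text"], "start_time:%d end_time:%d" % (row["start_time"], row["end_time"]))
```

In practice the aligned text rarely matches the known text character for character (punctuation, spacing, and script variants differ), so a real pipeline would likely need normalization or fuzzy matching before this lookup.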

RichardQin1 commented 1 month ago

Please help! Thanks.

lucashuguet commented 1 month ago

36