m-bain / whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

BSD 2-Clause "Simplified" License


Open RichardQin1 opened 1 month ago

RichardQin1 commented 1 month ago

I already have the exact transcript text for a segment of the audio.

For example, the known text is:

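特朗普右耳纏紗布現身 (Trump appears with gauze wrapped around his right ear)
並將在大會上發表全國講話 (and will deliver a national address at the convention)
特朗普表示槍擊事件之後 (Trump said that after the shooting)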

Given the inputs (text, test.mp3), the output I want is:

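特朗普右耳纏紗布現身 (Trump appears with gauze wrapped around his right ear)    start_time:10000 end_time:12000
並將在大會上發表全國講話 (and will deliver a national address at the convention)    start_time:12000 end_time:15000
特朗普表示槍擊事件之後 (Trump said that after the shooting)    start_time:15000 end_time:18000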

How can I obtain the start and end timestamps of each sentence?
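One possible approach, sketched under assumptions: WhisperX's forced aligner can be run on text you already have instead of on Whisper's own transcript. Load the audio with `whisperx.load_audio`, pass a single segment such as `{"text": "".join(sentences), "start": 0.0, "end": len(audio) / 16000}` to `whisperx.align(...)` together with a model from `whisperx.load_align_model(language_code="zh", device=device)`, and request `return_char_alignments=True`. Character-level output matters here because Chinese has no spaces between words. My understanding is that each returned segment then carries a `"chars"` list of dicts with `"char"` and, when the character could be aligned, `"start"`/`"end"` in seconds; please verify this against your installed WhisperX version. If the aligner splits the text into several segments, concatenate their `"chars"` lists first. The pure-Python helper below (the names `sentence_timestamps`, `format_line` and `CharTiming` are hypothetical, not part of WhisperX) maps those character timings back onto the known sentences and converts them to milliseconds, assuming milliseconds are the unit of the desired output.

```python
from typing import Optional, TypedDict


class CharTiming(TypedDict, total=False):
    # Assumed shape of one character-level alignment entry (times in seconds).
    # Characters the aligner could not place may lack "start"/"end".
    char: str
    start: float
    end: float


def sentence_timestamps(
    sentences: list[str], chars: list[CharTiming]
) -> list[tuple[str, Optional[int], Optional[int]]]:
    """Map character-level timings back onto the known sentences.

    `chars` must cover "".join(sentences) in order, one entry per character.
    Returns (sentence, start_ms, end_ms); a time is None if no character in
    that sentence received a timestamp.
    """
    aligned_text = "".join(c["char"] for c in chars)
    if aligned_text != "".join(sentences):
        raise ValueError("character alignment does not match the known text")

    results = []
    offset = 0
    for sentence in sentences:
        # Slice out exactly the characters belonging to this sentence.
        span = chars[offset : offset + len(sentence)]
        offset += len(sentence)
        # First character with a start time, last character with an end time.
        starts = [c["start"] for c in span if c.get("start") is not None]
        ends = [c["end"] for c in span if c.get("end") is not None]
        start_ms = round(starts[0] * 1000) if starts else None
        end_ms = round(ends[-1] * 1000) if ends else None
        results.append((sentence, start_ms, end_ms))
    return results


def format_line(sentence: str, start_ms: Optional[int], end_ms: Optional[int]) -> str:
    # Mirrors the "text    start_time:... end_time:..." layout requested in the issue.
    return f"{sentence}    start_time:{start_ms} end_time:{end_ms}"
```

Taking the first aligned character's start and the last aligned character's end is a deliberate simplification: characters the aligner skips are ignored rather than interpolated, so a sentence whose edge characters are unaligned may come out slightly shorter than it really is.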

RichardQin1 commented 1 month ago

Please help! Thanks.

lucashuguet commented 1 month ago
