m-bain / whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
BSD 2-Clause "Simplified" License
10.32k stars 1.09k forks

Sentence based timestamps #142

Open pkompiel opened 1 year ago

pkompiel commented 1 year ago

Is there a way to group the words so that I get sentence-based segments? By default, some of the words at the end of a long sentence spill over into the next timestamp (in the SRT and VTT files). I'm trying to get the whole sentence, even a long one, grouped under one timestamp. I guess the idea would be to start a new timestamp every time there is a period in the text.

arnavmehta7 commented 1 year ago

This can be solved with a plain algorithm. Logic:

  1. Run the transcription to get segments
  2. Run the alignment to get aligned_segments
  3. Initialize custom_segs = []
  4. Loop over all the words in aligned_segments and check whether each word ends with a full stop, question mark, or exclamation mark (an nltk function can help here). While a word does not end with one of these, append it to a string. When a word does end with one, append the string to custom_segs, reset the string, and continue the process.
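
The steps above can be sketched in Python roughly as follows. This is only a minimal sketch, not part of WhisperX: the function name group_into_sentences and the constants are made up for illustration, and it assumes each aligned segment is a dict with a "words" list whose entries have a "word" key and, when alignment succeeded, "start" and "end" times in seconds. It uses a plain punctuation check rather than nltk (the Natural Language Toolkit, a Python library for text processing), so abbreviations such as "Dr." will end a sentence early.

```python
# Hypothetical helper, not a WhisperX API: regroup aligned words into
# sentence-level segments by splitting on sentence-ending punctuation.

SENTENCE_END = (".", "?", "!")
TRAILING_QUOTES = "\"')\u201d\u2019"  # closing quotes/brackets that may follow the punctuation


def group_into_sentences(aligned_segments):
    """Return a list of {"start", "end", "text"} dicts, one per sentence.

    Assumes each segment looks like {"words": [{"word": str, "start": float,
    "end": float}, ...]}; "start"/"end" may be missing on individual words.
    """
    custom_segs = []
    words, start, end = [], None, None

    for segment in aligned_segments:
        for w in segment.get("words", []):
            text = w["word"].strip()
            if not text:
                continue
            words.append(text)
            # The sentence starts at the first word that has a timestamp
            if start is None and "start" in w:
                start = w["start"]
            # ...and ends at the last word that has one
            if "end" in w:
                end = w["end"]
            # Close the sentence when the word ends with . ? or !
            if text.rstrip(TRAILING_QUOTES).endswith(SENTENCE_END):
                custom_segs.append({"start": start, "end": end, "text": " ".join(words)})
                words, start, end = [], None, None

    # Keep any trailing words that never reached sentence-ending punctuation
    if words:
        custom_segs.append({"start": start, "end": end, "text": " ".join(words)})

    return custom_segs
```

Each returned entry carries one start time, one end time, and the full sentence text, so it can be written out as a single SRT or VTT cue. Sentences can span several of the original segments, since the word buffer is only reset at punctuation.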

wake704 commented 1 year ago

I've been wanting this for a long time. I still haven't found a way to do it. This is a must-have feature for anyone who needs to translate.

cnbeining commented 1 year ago

This is pretty much what https://github.com/cnbeining/Whisper_Notebook is built for - read https://colab.research.google.com/github/cnbeining/Whisper_Notebook/blob/master/WhisperX.ipynb for reference.

orkhanruth commented 1 year ago

https://colab.research.google.com/github/cnbeining/Whisper_Notebook/blob/master/WhisperX.ipynb for reference.

!wget -q https://chineseaci.com/tools/megadl ./megadl
!chmod +x ./megadl

Can I ask what this file is and what it does?

cnbeining commented 1 year ago

That’s a tool to download files from mega.io. You can remove this part if you don’t need that functionality.

Cheers,

APISeeker commented 5 months ago

Hello, thanks. Has anyone built this tool themselves and is willing to share it? I found the output too much to read; I want it to look more like an SRT file. Also, @arnavmehta7, what is "nltk"?