lowerquality / gentle

gentle forced aligner
https://lowerquality.com/gentle/
MIT License

Using gentle's output to get starting and ending times of sentences rather than words #209

Closed: saurabhvyas closed this issue 5 years ago

saurabhvyas commented 6 years ago

Thanks for creating gentle. In the output JSON I can see the start and end times of words and phonemes. Is it possible to get start and end times for larger groups of text, such as phrases or sentences? That would be very useful for building an ASR training dataset from YouTube subtitles and audio.

chuckcho commented 5 years ago

I've done something to get sentence-level timestamps. It's quite straightforward:

  1. Get sentence segmentation from the text (NLP)
  2. Run gentle and get all word-level timestamps
  3. Scan the words in gentle's output and match the first and last word of each sentence
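The steps above might be sketched in Python as follows. Instead of matching word text, this variant assigns words to sentences by character position. It assumes each entry in gentle's JSON `words` list carries `startOffset` (a character offset into `transcript`), `case`, and, for successfully aligned words, `start`/`end` in seconds. The regex in `split_sentences` is a naive stand-in for a real NLP segmenter, and the helper names and toy timings are hypothetical.

```python
import re
from typing import List, Tuple

def split_sentences(transcript: str) -> List[Tuple[int, int, str]]:
    """Naive splitter on '.', '!' and '?'; returns (start_char, end_char, text).
    Hypothetical stand-in for a proper NLP sentence segmenter."""
    spans = []
    for m in re.finditer(r"[^.!?]+[.!?]*", transcript):
        raw = m.group()
        text = raw.strip()
        if not text:
            continue  # skip whitespace-only fragments
        start = m.start() + (len(raw) - len(raw.lstrip()))
        spans.append((start, start + len(text), text))
    return spans

def sentence_times(alignment: dict) -> List[dict]:
    """Give each sentence the earliest start and latest end of its aligned words."""
    results = []
    for s_start, s_end, text in split_sentences(alignment["transcript"]):
        # Words whose character offset falls inside this sentence and that gentle aligned.
        aligned = [
            w for w in alignment["words"]
            if s_start <= w["startOffset"] < s_end and w.get("case") == "success"
        ]
        results.append({
            "sentence": text,
            "start": min(w["start"] for w in aligned) if aligned else None,
            "end": max(w["end"] for w in aligned) if aligned else None,
        })
    return results

# Toy input shaped like gentle's output (timings invented for illustration).
example = {
    "transcript": "Hello world. How are you?",
    "words": [
        {"word": "Hello", "case": "success", "start": 0.5, "end": 0.9, "startOffset": 0},
        {"word": "world", "case": "success", "start": 0.95, "end": 1.4, "startOffset": 6},
        {"word": "How", "case": "success", "start": 2.0, "end": 2.2, "startOffset": 13},
        {"word": "are", "case": "not-found-in-audio", "startOffset": 17},
        {"word": "you", "case": "success", "start": 2.5, "end": 2.8, "startOffset": 21},
    ],
}

for row in sentence_times(example):
    print(row)
```

Because the offsets point back into the transcript gentle was given, this avoids fuzzy string matching; a sentence in which no word aligned gets `None` for both times.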

One heuristic is to match only the first and last word of each sentence and ignore everything in between.
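The first/last-word heuristic could look something like this sketch. It walks gentle's word list in order and compares only normalized word text (the `word` field) against each sentence's first and last token. `normalize`, `first_last_times`, and the sample timings are hypothetical, and it assumes gentle tokenizes roughly like `str.split()`, which may not hold for hyphenated words or numbers.

```python
import re
from typing import List, Optional, Tuple

def normalize(token: str) -> str:
    """Lower-case a token and strip punctuation (keeping apostrophes)."""
    return re.sub(r"[^\w']", "", token).lower()

def first_last_times(
    sentences: List[str], words: List[dict]
) -> List[Tuple[str, Optional[float], Optional[float]]]:
    """Time each sentence by matching only its first and last word
    against gentle's word list, ignoring the words in between."""
    results = []
    pos = 0  # index in `words` where the search for the next sentence begins
    for sentence in sentences:
        tokens = [t for t in (normalize(x) for x in sentence.split()) if t]
        start = end = None
        if tokens:
            # Scan forward for the sentence's first word.
            i = pos
            while i < len(words) and normalize(words[i]["word"]) != tokens[0]:
                i += 1
            if i < len(words):
                # Unaligned words have no "start"/"end", so .get() yields None.
                start = words[i].get("start")
                # Jump to where the last word should be, then scan forward for it.
                j = i + len(tokens) - 1
                while j < len(words) and normalize(words[j]["word"]) != tokens[-1]:
                    j += 1
                if j < len(words):
                    end = words[j].get("end")
                    pos = j + 1
                else:
                    pos = i + 1
        results.append((sentence, start, end))
    return results

# Toy gentle-style word list (timings invented for illustration).
words = [
    {"word": "Hello", "case": "success", "start": 0.5, "end": 0.9},
    {"word": "world", "case": "success", "start": 0.95, "end": 1.4},
    {"word": "How", "case": "success", "start": 2.0, "end": 2.2},
    {"word": "are", "case": "not-found-in-audio"},
    {"word": "you", "case": "success", "start": 2.5, "end": 2.8},
]
sentences = ["Hello world.", "How are you?"]

for row in first_last_times(sentences, words):
    print(row)
```

Here the unaligned "are" does not matter because only "How" and "you" are consulted. If the first or last word itself failed to align, the corresponding time comes back as `None`; falling back to the nearest aligned neighbor is one possible refinement.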

saurabhvyas commented 5 years ago

Thanks for your comment. I'll close this now since I'm no longer using this, but maybe it's helpful to others.