readbeyond / aeneas

aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)
http://www.readbeyond.it/aeneas/
GNU Affero General Public License v3.0

How to detect the silence and get separative line like below. #278

Open neuromaancer opened 2 years ago

neuromaancer commented 2 years ago

Hi,

I am trying to do forced alignment between a transcript and a video. However, there are lots of silences between the utterances. How do I configure the tool to detect those silences so that the resulting fragments are not contiguous? For example:

I want to go from:

That thereby beauty's rose might never die, => [00:00:05.880, 00:00:09.240]
But as the riper should by time decease, => [00:00:09.240, 00:00:11.920]
His tender heir might bear his memory: => [00:00:11.920, 00:00:15.280]
But thou contracted to thine own bright eyes, => [00:00:15.280, 00:00:18.800]
Feed'st thy light's flame with self-substantial fuel, => [00:00:18.800, 00:00:22.760]
Making a famine where abundance lies, => [00:00:22.760, 00:00:25.680]

to

That thereby beauty's rose might never die, => [00:00:05.880, 00:00:09.240]
But as the riper should by time decease, => [00:00:10.240, 00:00:11.920]
His tender heir might bear his memory: => [00:00:11.920, 00:00:15.280]
But thou contracted to thine own bright eyes, => [00:00:16.280, 00:00:18.800]
Feed'st thy light's flame with self-substantial fuel, => [00:00:19.800, 00:00:22.760]
Making a famine where abundance lies, => [00:00:22.760, 00:00:25.680]

Thank you in advance.

SaadBazaz commented 2 years ago

You could use another library such as pydub. Its pydub.silence module has a detect_silence function that returns the silent ranges of an audio segment as start/end pairs in milliseconds. Once you know which timestamps of the audio are silent, you can use some Python-coder-magic to fix your alignments ;)

Reference: https://stackoverflow.com/questions/45526996/split-audio-files-using-silence-detection/46001755
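
A minimal sketch of what that "magic" could look like, assuming the alignment has already been loaded as (begin, end, text) tuples in seconds. The helper names load_silences and trim_fragments, the 0.05 s tolerance, and the threshold values (500 ms, -40 dBFS) are illustrative choices that do not come from aeneas or from this thread; only AudioSegment.from_file and pydub.silence.detect_silence are real pydub calls. The idea is to move a fragment's begin forward, or its end backward, whenever a detected silence sits on that boundary.

```python
def load_silences(audio_path, min_silence_len=500, silence_thresh=-40):
    """Return silent spans of the audio as (start, end) tuples in seconds.

    Requires pydub (and ffmpeg for non-WAV input). The default values are
    assumptions and will likely need tuning for each recording.
    """
    from pydub import AudioSegment
    from pydub.silence import detect_silence

    audio = AudioSegment.from_file(audio_path)
    # detect_silence returns a list of [start_ms, end_ms] pairs.
    spans = detect_silence(
        audio,
        min_silence_len=min_silence_len,
        silence_thresh=silence_thresh,
    )
    return [(start / 1000.0, end / 1000.0) for start, end in spans]


def trim_fragments(fragments, silences, tolerance=0.05):
    """Shrink each (begin, end, text) fragment so it excludes boundary silence.

    A silence that starts at (or just before) the fragment's begin and ends
    inside the fragment pushes the begin forward. A silence that starts inside
    the fragment and runs to (or just past) its end pulls the end backward.
    Silences strictly in the middle of a fragment are left alone.
    """
    trimmed = []
    for begin, end, text in fragments:
        new_begin, new_end = begin, end
        for s_start, s_end in silences:
            # Leading silence: covers the begin, ends inside the fragment.
            if s_start <= new_begin + tolerance and new_begin < s_end < new_end:
                new_begin = s_end
            # Trailing silence: starts inside the fragment, covers the end.
            if s_end >= new_end - tolerance and new_begin < s_start < new_end:
                new_end = s_start
        trimmed.append((new_begin, new_end, text))
    return trimmed


# Example with hand-written silences, so it runs without any audio file.
# In practice: silences = load_silences("audio.mp3")  # hypothetical path
fragments = [
    (5.88, 9.24, "That thereby beauty's rose might never die,"),
    (9.24, 11.92, "But as the riper should by time decease,"),
    (11.92, 15.28, "His tender heir might bear his memory:"),
]
silences = [(9.24, 10.24)]

for begin, end, text in trim_fragments(fragments, silences):
    print(f"{text} => [{begin:.3f}, {end:.3f}]")
```

With the hand-written silence above, only the second fragment changes: its begin moves from 9.24 s to 10.24 s, which matches the kind of gap asked for in the question. Whether a pause is attached to the end of one fragment or the start of the next depends on where aeneas placed the boundary, which is why the sketch checks both sides.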