Open neuromaancer opened 3 years ago
You could use another library such as PyDub. Its `pydub.silence` module has a function called `detect_silence` (and a related `detect_nonsilent`). Once you have determined which timestamps of the audio are silent, you can use some Python-coder-magic to fix your alignments ;)
Reference: https://stackoverflow.com/questions/45526996/split-audio-files-using-silence-detection/46001755
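As a rough sketch of the idea above (not a feature of the alignment tool itself): detect the silent ranges with PyDub, then trim each aligned fragment so that it no longer starts or ends inside a silence. The helper names `find_silences` and `trim_fragments`, the `tolerance` parameter, and the threshold values are assumptions made for illustration; `min_silence_len` and `silence_thresh` will need tuning for your audio.

```python
def find_silences(audio_path, min_silence_len=500, silence_thresh=-40):
    """Return silent ranges as (start_sec, end_sec) tuples, using PyDub.

    The default values are illustrative guesses, not recommendations.
    """
    # Imported lazily so trim_fragments() can be used without PyDub installed.
    from pydub import AudioSegment
    from pydub.silence import detect_silence

    audio = AudioSegment.from_file(audio_path)
    # detect_silence returns a list of [start_ms, end_ms] pairs.
    ranges_ms = detect_silence(
        audio, min_silence_len=min_silence_len, silence_thresh=silence_thresh
    )
    return [(start / 1000.0, end / 1000.0) for start, end in ranges_ms]


def trim_fragments(fragments, silences, tolerance=0.05):
    """Shrink each (start, end, text) fragment away from bordering silences.

    A silence that begins at (or slightly before) the fragment start pushes
    the start forward; a silence that runs to (or past) the fragment end
    pulls the end back. Silences strictly inside a fragment are left alone.
    """
    trimmed = []
    for start, end, text in fragments:
        for s_start, s_end in silences:
            # Leading silence: move the start to where the silence ends.
            if s_start <= start + tolerance and start < s_end < end:
                start = s_end
            # Trailing silence: move the end to where the silence begins.
            if s_end >= end - tolerance and start < s_start < end:
                end = s_start
        trimmed.append((start, end, text))
    return trimmed


if __name__ == "__main__":
    # Timestamps taken from the example in this issue, in seconds.
    fragments = [
        (5.88, 9.24, "That thereby beauty's rose might never die,"),
        (9.24, 11.92, "But as the riper should by time decease,"),
        (11.92, 15.28, "His tender heir might bear his memory:"),
    ]
    # In practice: silences = find_silences("audio.mp3")  # hypothetical file
    silences = [(9.24, 10.24)]
    for start, end, text in trim_fragments(fragments, silences):
        print(f"{text} => [{start:.3f}, {end:.3f}]")
```

Keeping the trimming logic separate from the PyDub call makes it easy to try on timestamps alone before running it against real audio.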
Hi,
I am trying to do forced alignment between a transcript and a video; however, there are lots of silences between the utterances. How do I configure the tool to detect those silences, so that the resulting fragments are not strictly consecutive? For example:
I want to go from:
That thereby beauty's rose might never die, => [00:00:05.880, 00:00:09.240]
But as the riper should by time decease, => [00:00:09.240, 00:00:11.920]
His tender heir might bear his memory: => [00:00:11.920, 00:00:15.280]
But thou contracted to thine own bright eyes, => [00:00:15.280, 00:00:18.800]
Feed'st thy light's flame with self-substantial fuel, => [00:00:18.800, 00:00:22.760]
Making a famine where abundance lies, => [00:00:22.760, 00:00:25.680]
to
That thereby beauty's rose might never die, => [00:00:05.880, 00:00:09.240]
But as the riper should by time decease, => [00:00:10.240, 00:00:11.920]
His tender heir might bear his memory: => [00:00:11.920, 00:00:15.280]
But thou contracted to thine own bright eyes, => [00:00:16.280, 00:00:18.800]
Feed'st thy light's flame with self-substantial fuel, => [00:00:19.800, 00:00:22.760]
Making a famine where abundance lies, => [00:00:22.760, 00:00:25.680]
Thank you in advance.