readbeyond / aeneas

aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)
http://www.readbeyond.it/aeneas/
GNU Affero General Public License v3.0

VAD options. Can I change them? #259

Closed ApayRus closed 3 years ago

ApayRus commented 4 years ago

Hi! Thank you for this great library. I love it and have been using it for the last few weeks! :) I ran the VAD and got a file with speech and nonspeech intervals. Then I visualized it in Audacity via "import --> labels" (track 1).

I also ran Audacity's "analyze --> sound finder" with different parameters and got tracks 2 and 3.

vad-lost

As we can see, the aeneas VAD eats parts of the speech, while Audacity's sound finder does not and works properly. Is there a way to change the aeneas VAD parameters, the way we can change them in Audacity's sound finder?

Screenshot from 2020-07-19 04-45-20

In case you wonder why I need this: I want to add gaps between phrases. aeneas marks phrases this way (without gaps):

Screenshot from 2020-07-19 05-05-51

I need phrases marked this way (with gaps):

Screenshot from 2020-07-19 05-05-00
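
The transformation asked for here can be sketched as a post-processing step: given contiguous fragments and a list of speech intervals (for example, parsed from a VAD output file or an Audacity label track), shrink each fragment to the speech it contains. This is only an illustration, not part of aeneas; `trim_to_speech`, its `pad` parameter, and the tuple layouts are hypothetical names chosen for this example.

```python
def trim_to_speech(fragments, speech, pad=0.0):
    """Shrink each fragment to the speech it overlaps, leaving gaps between phrases.

    fragments: list of (begin, end, text) tuples, in seconds (hypothetical layout).
    speech:    sorted list of (begin, end) speech intervals, in seconds.
    pad:       optional margin in seconds kept around the detected speech.
    """
    trimmed = []
    for begin, end, text in fragments:
        # Speech intervals that overlap this fragment, clipped to its bounds.
        overlaps = [
            (max(s_begin, begin), min(s_end, end))
            for s_begin, s_end in speech
            if s_begin < end and s_end > begin
        ]
        if not overlaps:
            # No speech detected inside the fragment: leave it unchanged.
            trimmed.append((begin, end, text))
            continue
        # Never grow a fragment beyond its original boundaries.
        new_begin = max(begin, overlaps[0][0] - pad)
        new_end = min(end, overlaps[-1][1] + pad)
        trimmed.append((new_begin, new_end, text))
    return trimmed
```

The quality of the gaps depends entirely on the speech intervals fed in: if the detector eats parts of the speech, the trimmed fragments will too, so a small `pad` may help.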

You can also take a look at my site, where I implement all of this: frazy.me

readbeyond commented 3 years ago

@Aparus you can tweak the VAD included in aeneas:

and also the way the boundaries are set:

but the raw truth is that the VAD included in aeneas is very rough (just compares the spectral energy). The VAD included in Audacity probably works better because it implements a better algorithm.
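
To make "just compares the energy" concrete, here is a minimal sketch of a threshold-based detector. It is not the code in aeneas' vad.py: it uses plain time-domain frame energy rather than spectral features, and the function names and default values (`frame_sec`, `threshold`, `min_nonspeech_sec`) are assumptions chosen for illustration.

```python
import math


def frame_log_energies(samples, frame_len):
    """Return the log10 mean energy of each non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        energies.append(math.log10(energy + 1e-10))  # offset avoids log10(0)
    return energies


def energy_vad(samples, sample_rate, frame_sec=0.04, threshold=0.7,
               min_nonspeech_sec=0.2):
    """Return speech intervals as (begin, end) tuples in seconds."""
    frame_len = int(round(sample_rate * frame_sec))
    energies = frame_log_energies(samples, frame_len)
    if not energies:
        return []
    # A frame counts as speech if it is sufficiently louder than the quietest frame.
    floor = min(energies)
    is_speech = [e > floor + threshold for e in energies]

    # Bridge nonspeech runs that are too short and have speech on both sides.
    min_frames = int(round(min_nonspeech_sec / frame_sec))
    n = len(is_speech)
    i = 0
    while i < n:
        if is_speech[i]:
            i += 1
            continue
        j = i
        while j < n and not is_speech[j]:
            j += 1
        if j - i < min_frames and i > 0 and j < n:
            for k in range(i, j):
                is_speech[k] = True
        i = j

    # Convert runs of speech frames into time intervals.
    intervals = []
    start = None
    for idx, flag in enumerate(is_speech):
        if flag and start is None:
            start = idx
        elif not flag and start is not None:
            intervals.append((start * frame_sec, idx * frame_sec))
            start = None
    if start is not None:
        intervals.append((start * frame_sec, n * frame_sec))
    return intervals
```

A detector of this kind has only a few knobs (the energy threshold and the minimum pause length), which is why quiet speech onsets and endings are easily classified as nonspeech.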

Currently there is no way of hooking in a different VAD implementation; you would need to run aeneas from source (e.g., from an editable installation) and change vad.py yourself.

In the past, I tried the WebRTC VAD via https://github.com/wiseman/py-webrtcvad but it has some limitations/problems, so I did not integrate it in aeneas "open source".

I might consider supporting a better VAD or even allowing users to hook in custom VADs in aeneas 2.0.0, but that will not happen any time soon.