VAD options. Can I change them?

readbeyond / aeneas

aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)

GNU Affero General Public License v3.0


Hi! Thank you for this great library. I love it and have been using it for the last few weeks! :) I ran the VAD and got a file with speech and nonspeech intervals. Then I visualized it in Audacity via "import --> labels" (track 1).

I also ran Audacity's "analyze --> sound finder" with different parameters and got tracks 2 and 3.
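For anyone reproducing this comparison: Audacity's label import reads plain text with one label per line, as tab-separated start time, end time, and label text (times in seconds). Below is a minimal sketch that formats intervals that way. The function name `to_audacity_labels` and the example intervals are made up for illustration; this is not part of aeneas.

```python
def to_audacity_labels(intervals):
    """Format (start, end, label) tuples as Audacity label-track text.

    Times are in seconds. Each output line is start<TAB>end<TAB>label.
    """
    lines = []
    for start, end, label in intervals:
        lines.append(f"{start:.6f}\t{end:.6f}\t{label}")
    return "\n".join(lines) + "\n"


# Hypothetical intervals, not real VAD output.
example = [(0.0, 1.25, "speech"), (1.25, 1.8, "nonspeech")]
labels_text = to_audacity_labels(example)
```

Writing `labels_text` to a `.txt` file should give something that can be loaded with "import --> labels".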

vad-lost

As we can see, the aeneas VAD eats parts of the speech, while Audacity's sound finder does not and works properly. Is there a way to change the aeneas VAD parameters, the way we can change them in Audacity's sound finder?

Screenshot from 2020-07-19 04-45-20

If you ask why I need this: I want to add gaps between phrases. aeneas marks phrases this way (without gaps):

Screenshot from 2020-07-19 05-05-51

I need the phrases marked this way (with gaps):

Screenshot from 2020-07-19 05-05-00

You can also take a look at my site, frazy.me, where I implement all of this.

@Aparus you can tweak the VAD included in aeneas:

and also the way the boundaries are set:

https://www.readbeyond.it/aeneas/docs/adjustboundaryalgorithm.html
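If the goal is only to get gaps between phrases, one option is to post-process the sync map yourself. The sketch below shrinks each contiguous fragment to the speech intervals it overlaps, so the silence at its edges becomes a gap. It is an illustration only, not the adjust-boundary algorithm from the page above: `trim_to_speech` and the sample data are hypothetical, and the result is only as good as the speech intervals you feed it (they could come from Audacity's sound finder, for example).

```python
def trim_to_speech(fragments, speech):
    """Shrink each (begin, end, text) fragment to the speech it overlaps.

    fragments: contiguous (begin, end, text) tuples, times in seconds.
    speech: sorted, non-overlapping (begin, end) speech intervals.
    A fragment that overlaps no speech interval is returned unchanged.
    """
    trimmed = []
    for f_begin, f_end, text in fragments:
        # Clip every overlapping speech interval to the fragment's span.
        overlaps = [
            (max(f_begin, s_begin), min(f_end, s_end))
            for s_begin, s_end in speech
            if s_begin < f_end and s_end > f_begin
        ]
        if overlaps:
            # Keep from the first speech onset to the last speech offset.
            trimmed.append((overlaps[0][0], overlaps[-1][1], text))
        else:
            trimmed.append((f_begin, f_end, text))
    return trimmed


# Hypothetical data: two contiguous fragments and three speech intervals.
fragments = [(0.0, 2.0, "first phrase"), (2.0, 5.0, "second phrase")]
speech = [(0.3, 1.6), (2.4, 3.1), (3.3, 4.7)]
with_gaps = trim_to_speech(fragments, speech)
```

Here the first fragment shrinks to 0.3-1.6 and the second to 2.4-4.7, which leaves a gap from 1.6 to 2.4 between them. Pauses inside a phrase (3.1-3.3 above) are kept as part of the phrase.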

but the raw truth is that the VAD included in aeneas is very rough (just compares the spectral energy). The VAD included in Audacity probably works better because it implements a better algorithm.
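To give an idea of what "compares the energy" means in practice, here is a minimal, self-contained sketch of an energy-threshold VAD. It is not the code in vad.py: the function `energy_vad`, its parameters, and the numbers are assumptions chosen for illustration. It shows the two knobs such a detector typically has: the log-energy threshold, and the minimum length a nonspeech run needs before it is kept.

```python
import math


def energy_vad(samples, frame_len, threshold_db, min_nonspeech_frames):
    """Return speech runs as (start_frame, end_frame), end exclusive."""
    n_frames = len(samples) // frame_len

    # 1. Mark a frame as speech if its log energy exceeds the threshold.
    is_speech = []
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        energy = sum(x * x for x in frame) / frame_len
        log_energy = 10 * math.log10(energy + 1e-12)  # avoid log(0)
        is_speech.append(log_energy > threshold_db)

    # 2. Relabel nonspeech runs that are too short and lie between speech.
    i = 0
    while i < n_frames:
        if is_speech[i]:
            i += 1
            continue
        j = i
        while j < n_frames and not is_speech[j]:
            j += 1
        if i > 0 and j < n_frames and (j - i) < min_nonspeech_frames:
            for k in range(i, j):
                is_speech[k] = True
        i = j

    # 3. Collect consecutive speech frames into runs.
    runs, start = [], None
    for i, flag in enumerate(is_speech):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, n_frames))
    return runs


# Synthetic signal: 9 frames of 4 samples, loud (L) or quiet (q).
loud, quiet = [0.5] * 4, [0.001] * 4
signal = quiet + loud + loud + quiet + loud + quiet * 3 + loud  # q L L q L q q q L
strict = energy_vad(signal, 4, -30.0, 1)  # keep every pause
merged = energy_vad(signal, 4, -30.0, 2)  # drop pauses shorter than 2 frames
```

With these inputs, `strict` keeps the one-frame pause as nonspeech, while `merged` absorbs it into the surrounding speech. A detector this simple labels quiet speech as nonspeech whenever it falls below the threshold, which is one plausible way speech gets "eaten".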

Currently there is no way of hooking in a different VAD implementation; you would need to run aeneas from source (e.g., from an editable installation) and change vad.py yourself.

In the past, I tried the WebRTC VAD via https://github.com/wiseman/py-webrtcvad but it has some limitations/problems, so I did not integrate it in aeneas "open source".

I might consider supporting a better VAD or even allowing users to hook in custom VADs in aeneas 2.0.0, but that will not happen any time soon.

VAD options. Can I change them? #259