Open premingiet opened 5 years ago
If you know some Python, it should be easy to change vad_extract.py so that it suits your needs. As output of the network you have a numpy array labels, which assigns 0 (= noise) or 1 (= speech) to each frame. Simply iterate over the values and return the indices where the value changes from 0 to 1 (= onset) or from 1 to 0 (= offset).
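A rough sketch of that loop might look like the following. It is not taken from vad_extract.py: the names find_segments and print_segments are made up for illustration, labels is assumed to be a 1-D sequence of 0/1 values, and the frame shift of 0.01 s is only a placeholder that has to be replaced with the actual frame shift of your feature extraction.

```python
def find_segments(labels):
    """Return (onset_frame, offset_frame) pairs for each run of 1s.

    The offset is the index of the first noise frame after the run,
    or len(labels) if the speech runs to the end of the array.
    """
    segments = []
    onset = None
    previous = 0  # treat everything before the first frame as noise
    for i, value in enumerate(labels):
        value = int(value)
        if previous == 0 and value == 1:
            onset = i                      # 0 -> 1: speech starts
        elif previous == 1 and value == 0:
            segments.append((onset, i))    # 1 -> 0: speech ends
            onset = None
        previous = value
    if onset is not None:                  # speech still active at the end
        segments.append((onset, len(labels)))
    return segments


def print_segments(labels, frame_shift_s=0.01):
    """Print speech segments in seconds.

    frame_shift_s is an assumed placeholder, not a value from the tool.
    """
    for onset, offset in find_segments(labels):
        print(f"speech {onset * frame_shift_s:.2f}s - {offset * frame_shift_s:.2f}s")


if __name__ == "__main__":
    # Works the same for a plain list or a 1-D numpy array of 0/1 values.
    print_segments([0, 0, 1, 1, 1, 0, 1, 1])
    # speech 0.02s - 0.05s
    # speech 0.06s - 0.08s
```

Everything between two speech segments is noise, so the noise timings follow from the same list of pairs.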
I mostly work in C, C++ and C#. I guess I need to learn Python, and then I will give that a go. Thanks.
Instead of getting two separate files of voice and noise, can I just get the timing of when there is noise and when there is voice, along with the values? I just want to get the times on the command line. How can I do that?
For example, with do_vad_live.cmd we get XML output over UDP, but I don't want to use a socket. I just need the timing of when there is voice and when there is noise, along with the values. Please help me out.