snakers4 / silero-vad

Silero VAD: pre-trained enterprise-grade Voice Activity Detector
MIT License

Fix a bug in the VADIterator() which would return negative start #446

Open sobomax opened 2 months ago

sobomax commented 2 months ago

VADIterator() can return a negative start position if voice is detected in the very first frame. I don't have a test case to reproduce it, but the logic error should be apparent from reading the code. It tripped an assertion in our own code:

(InfernRTPActor pid=141495) Exception in thread Thread-5:
(InfernRTPActor pid=141495) Traceback (most recent call last):
(InfernRTPActor pid=141495)   File "/home/sobomax/miniconda3/envs/tinygrad/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
(InfernRTPActor pid=141495)     self.run()
(InfernRTPActor pid=141495)   File "/home/sobomax/projects/Infernos/Cluster/InfernBatchedWorker.py", line 39, in run
(InfernRTPActor pid=141495)     self.process_batch(wis)
(InfernRTPActor pid=141495)   File "/home/sobomax/miniconda3/envs/tinygrad/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
(InfernRTPActor pid=141495)     return func(*args, **kwargs)
(InfernRTPActor pid=141495)            ^^^^^^^^^^^^^^^^^^^^^
(InfernRTPActor pid=141495)   File "/home/sobomax/projects/Infernos/Core/VAD/SileroVAD.py", line 81, in process_batch
(InfernRTPActor pid=141495)     assert poff > 0 and poff < vc.active_buffer.size(0), f'{poff=} {vc.active_buffer.size(0)=} {sd.current_sample=} {vc.active_start=}'
(InfernRTPActor pid=141495)                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(InfernRTPActor pid=141495) AssertionError: poff=1008 vc.active_buffer.size(0)=768 sd.current_sample=768 vc.active_start=-240

The audio is sampled at 8 kHz, so -240 samples corresponds to 30 ms.
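To make the arithmetic concrete, here is a minimal sketch of how a negative start can arise and how clamping avoids it. It assumes the start is computed by backing a fixed padding off the current sample position. The helper names (pad_samples, unclamped_start, clamped_start) are hypothetical and are not the library's API, and the relation poff = current_sample - active_start is only inferred from the numbers in the assertion message.

```python
def pad_samples(sampling_rate: int, pad_ms: int) -> int:
    """Convert a padding given in milliseconds to a number of samples."""
    return sampling_rate * pad_ms // 1000


def unclamped_start(current_sample: int, pad: int) -> int:
    # Hypothetical form of the buggy computation: back off by the padding
    # with no lower bound, so an early detection yields a negative index.
    return current_sample - pad


def clamped_start(current_sample: int, pad: int) -> int:
    # One possible fix: never report a start before the beginning of the stream.
    return max(0, current_sample - pad)


pad = pad_samples(sampling_rate=8000, pad_ms=30)  # 30 ms at 8 kHz
bad_start = unclamped_start(0, pad)               # voice detected at sample 0
good_start = clamped_start(0, pad)

# Assumed relation, consistent with the assertion message above:
# poff = sd.current_sample - vc.active_start
poff = 768 - bad_start

print(pad, bad_start, good_start, poff)
```

With the clamped variant the reported start stays inside the buffer, so an offset derived from it cannot exceed the buffer size in this scenario.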