snakers4 / silero-vad

Silero VAD: pre-trained enterprise-grade Voice Activity Detector
MIT License

GPU inference #13

Closed pKrysenko closed 3 years ago

pKrysenko commented 3 years ago

Hello. Any chance to put inference on GPU? After several tries i got error that this model is quantized, but maybe you can share non-quantized version?

pKrysenko commented 3 years ago

> Which model exactly?

Ah, sorry, forgot to mention. "silero-vad"

snakers4 commented 3 years ago

> Ah, sorry, forgot to mention. "silero-vad"

Well, these models are very small and they were designed to run on CPU. Running them on GPU will not really provide any tangible speed / throughput benefits. We could of course publish the non-quantized versions of these models, but that would make the repo larger and we would have to maintain two versions of each model in parallel (so far we have tried to keep this repo as minimal as possible).

So the main question is - why?
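To make the trade-off concrete, here is a sketch of the general technique involved: dynamic int8 quantization of a small model's weight-heavy layers with PyTorch. The `TinyVAD` module below is a hypothetical stand-in of roughly similar scale, not the actual silero-vad architecture.

```python
import torch
import torch.nn as nn

class TinyVAD(nn.Module):
    """Hypothetical tiny recurrent classifier; NOT the real silero-vad model."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.LSTM(input_size=64, hidden_size=64, batch_first=True)
        self.head = nn.Linear(64, 1)

    def forward(self, x):
        out, _ = self.rnn(x)
        return torch.sigmoid(self.head(out[:, -1]))

float_model = TinyVAD().eval()

# Dynamic quantization converts LSTM / Linear weights to int8,
# shrinking the checkpoint and speeding up CPU inference.
quant_model = torch.quantization.quantize_dynamic(
    float_model, {nn.LSTM, nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 100, 64)  # dummy feature sequence
with torch.no_grad():
    p_float = float_model(x)
    p_quant = quant_model(x)
print(p_float.item(), p_quant.item())
```

Keeping only the quantized artifact in the repo means a single small file per model, which is the minimalism the maintainer refers to.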

pKrysenko commented 3 years ago

If these models show no significant performance difference between running on GPU and quantized on CPU, then it makes no sense to publish a non-quantized version. Just out of interest, what kind of GPU did you use for the performance measurements?

snakers4 commented 3 years ago

> what kind of GPU did you use for performance measurements?

https://github.com/snakers4/silero-vad#performance-metrics

> All speed tests were run on AMD Ryzen Threadripper 3960X using only 1 thread

We tested on 1 CPU thread. We did not test on GPU.
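The 1-thread setup referenced above can be reproduced for any small model along these lines; the `nn.Sequential` below is a hypothetical placeholder, not the published VAD.

```python
import time
import torch
import torch.nn as nn

# Mirror the single-thread CPU setup used in the README benchmarks.
torch.set_num_threads(1)

# Hypothetical tiny model standing in for the VAD.
model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 1)).eval()
chunk = torch.randn(1, 64)  # one dummy audio-feature chunk

with torch.no_grad():
    model(chunk)  # warm-up
    t0 = time.perf_counter()
    for _ in range(100):
        model(chunk)
    elapsed_ms = (time.perf_counter() - t0) / 100 * 1e3

print(f"avg per-chunk latency on 1 CPU thread: {elapsed_ms:.3f} ms")
```

Pinning the thread count matters because PyTorch otherwise uses all cores, which would make single-chunk latency numbers incomparable across machines.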

pKrysenko commented 3 years ago

Yes, I have read about the CPU experiments. If possible, could you share the non-quantized version with me? I can run experiments on a 1050 Ti, a K80, and maybe a V100. I also have a Jetson Nano; those experiments might be interesting for you.

snakers4 commented 3 years ago

> If these models have no significant difference in performance on GPU and quantized on CPU

I am not sure that we measured this quality difference directly, but I believe we measured quality on the CPU version as well and used the GPU only for training. Also, with similar quantization techniques we compared STT performance, and it was within 1 CER percentage point. So, since this task is much easier than STT, I believe the quality gap is negligible here.

> I can make experiments on 1050ti, k80, and maybe V100. Also, i have jetson nano, maybe it will be interesting experiments for you

Well, since speed is not a bottleneck (and most likely quality is not either), I am not sure what such tests would achieve. I am 95% confident that the model will just be IO-bound and the GPUs will run at around 10% utilization.
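The IO-bound claim can be sanity-checked with a back-of-envelope estimate. All numbers below are illustrative assumptions (typical PCIe bandwidth, GPU throughput, and kernel-launch overhead), not measurements of silero-vad:

```python
# Illustrative assumptions, not measured values:
CHUNK_SAMPLES = 480           # assumed 30 ms chunk at 16 kHz
BYTES_PER_SAMPLE = 4          # float32 audio
PCIE_BW = 12e9                # ~12 GB/s effective PCIe 3.0 x16 (assumption)
GPU_FLOPS = 10e12             # ~10 TFLOPS fp32 (assumption)
MODEL_FLOPS_PER_CHUNK = 1e6   # assumed cost for a very small model

transfer_s = CHUNK_SAMPLES * BYTES_PER_SAMPLE / PCIE_BW  # host->device copy
compute_s = MODEL_FLOPS_PER_CHUNK / GPU_FLOPS            # actual math
launch_s = 10e-6                                         # typical per-kernel launch overhead (assumption)

# Fixed overheads (launch + copy) dwarf the compute for a tiny model,
# so the GPU spends most wall-clock time idle.
print(f"compute: {compute_s:.2e} s, transfer: {transfer_s:.2e} s, launch: {launch_s:.2e} s")
```

Under these assumptions the per-chunk launch overhead alone exceeds the compute time by orders of magnitude, which is exactly why a GPU would sit mostly idle on such a workload.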

snakers4 commented 3 years ago

Will close this for now. Please open another issue / discussion if you see valid GPU use-cases or anything else.