Hello. Any chance to run inference on GPU? After several tries I got an error saying this model is quantized, but maybe you can share a non-quantized version?
Which model exactly?
Ah, sorry, forgot to mention. "silero-vad"
> Ah, sorry, forgot to mention. "silero-vad"
Well, these models are very small and they were designed to run on CPU. Running them on GPU will not really provide any tangible speed / throughput benefits. We can of course publish the non-quantized versions of these models, but this would make the repo larger and we would have to maintain 2 versions of each model in parallel (so far we have tried to keep this repo as minimal as possible).
So the main question is: why?
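For context, the intended CPU path looks roughly like this. This is a minimal sketch based on the repo's README; the exact contents of the `utils` tuple have varied across releases, and `test.wav` is a placeholder file name:

```python
import torch

torch.set_num_threads(1)  # the models are designed for single-threaded CPU use

# load the published (quantized) VAD model and its helper functions
model, utils = torch.hub.load(repo_or_dir='snakers4/silero-vad',
                              model='silero_vad')
(get_speech_timestamps, save_audio, read_audio,
 VADIterator, collect_chunks) = utils

wav = read_audio('test.wav', sampling_rate=16000)  # placeholder file name
speech_timestamps = get_speech_timestamps(wav, model, sampling_rate=16000)
print(speech_timestamps)
```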
If there is no significant performance difference between these models on GPU and the quantized ones on CPU, there is no point in publishing a non-quantized version of the model. Just out of interest, what kind of GPU did you use for the performance measurements?
> what kind of GPU did you use for the performance measurements?
https://github.com/snakers4/silero-vad#performance-metrics
> All speed tests were run on AMD Ryzen Threadripper 3960X using only 1 thread:
We tested on 1 CPU thread. We did not test on GPU.
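To reproduce the single-thread setup, a timing sketch along these lines could work. The 512-sample chunk at 16 kHz and the iteration count are arbitrary choices here; newer model versions expect a fixed window size:

```python
import time
import torch

torch.set_num_threads(1)  # match the benchmark setup: a single CPU thread

model, _ = torch.hub.load(repo_or_dir='snakers4/silero-vad', model='silero_vad')

# time the model on a fixed chunk of silence; the chunk size and iteration
# count are arbitrary choices for this sketch
chunk = torch.zeros(1, 512)
n_iters = 1000
start = time.perf_counter()
with torch.no_grad():
    for _ in range(n_iters):
        model(chunk, 16000)
elapsed = time.perf_counter() - start
print(f'{elapsed / n_iters * 1000:.3f} ms per 512-sample chunk')
```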
Yes, I have read about the CPU experiments. If it is possible, can you share the non-quantized version with me? I can run experiments on a 1050 Ti, a K80, and maybe a V100. Also, I have a Jetson Nano; maybe those would be interesting experiments for you.
> If there is no significant performance difference between these models on GPU and the quantized ones on CPU
I am not sure that we measured this quality difference directly, but I believe that we measured quality on the CPU version as well and used GPU only for training. Also, with similar quantization techniques we compared STT performance and it was within 1 CER percentage point. So I believe, since this task is much easier than STT, the quality gap is negligible here.
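For context, the technique being discussed is post-training quantization. A generic PyTorch sketch of the dynamic-quantization flavor of the idea, with a stand-in network rather than the actual VAD architecture or the pipeline used for these models:

```python
import torch
import torch.nn as nn

# a stand-in network; the real VAD architecture is not reproduced here
float_model = nn.Sequential(
    nn.Linear(64, 128),
    nn.ReLU(),
    nn.Linear(128, 1),
)

# dynamic quantization: weights stored as INT8, activations quantized on the fly
quantized_model = torch.quantization.quantize_dynamic(
    float_model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 64)
# the two outputs should be close; the gap is the "quality difference" at issue
print(float_model(x))
print(quantized_model(x))
```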
> I can run experiments on a 1050 Ti, a K80, and maybe a V100. Also, I have a Jetson Nano; maybe those would be interesting experiments for you.
Well, since speed is not a bottleneck, and quality most likely is not one either, I am not sure what can be achieved by running such tests. I am 95% confident that the model will just be IO-bound and GPUs will run at 10% utilization.
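This also explains the error mentioned in the original question: PyTorch's quantized kernels are CPU-only, so pushing the scripted model to CUDA fails. A hedged sketch of what that attempt looks like (the exact error message depends on the PyTorch version):

```python
import torch

model, _ = torch.hub.load(repo_or_dir='snakers4/silero-vad', model='silero_vad')

try:
    model.to('cuda')  # quantized ops ship with CPU kernels only
    model(torch.zeros(1, 512, device='cuda'), 16000)
except RuntimeError as e:
    print(f'GPU inference failed: {e}')
```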
Will close this for now. Please open another issue / discussion if you see valid GPU use cases or anything else.
pKrysenko closed this 3 years ago