anthnyprschka closed this issue 2 years ago
Hi, I figure you're using the medium.en model? The issue on (I'd assume) a 3060/2070 or faster is that the model works off a 30-second window by default (see the '-n' and padding options). Even though these GPUs can process audio faster than real time, the delay between audio coming in and a full window being ready to process is what makes it feel slow.
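As a rough illustration (this is not whispering's actual code, and the per-frame duration is just inferred from "160 frames ≈ 30 seconds"), the minimum delay before a chunk can be transcribed scales with the block length:

```python
# Hypothetical sketch: with block-based streaming, audio spoken at the start
# of a block cannot be transcribed until the whole block has been captured,
# so worst-case capture latency equals the block duration.
def worst_case_capture_latency(num_frames: int, frame_seconds: float = 0.1875) -> float:
    """Seconds of audio buffered before processing of a block can begin.

    frame_seconds = 0.1875 is an assumption chosen so that the reported
    default of -n 160 corresponds to the 30-second window mentioned above.
    """
    return num_frames * frame_seconds

default_delay = worst_case_capture_latency(160)  # ~30 s window at the default
reduced_delay = worst_case_capture_latency(80)   # ~15 s window with -n 80
```

So halving '-n' roughly halves how long the tool waits before it can start transcribing a block, which is why smaller values feel faster even though the GPU throughput is unchanged.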
Try whispering --language en --model medium.en -n 90 --allow-padding with your preferred mic option and see if it's faster. I believe '-n' defaults to 160 when not specified, and reducing it to, say, 30 brings speedups but also a loss of accuracy and more errors. The default flags (no -n or padding) transcribe with the full delay, but about as accurately as the file-based (mp3/wav) version.
whispering --language en --model medium.en -n 80 --allow-padding --mic 0 is faster than stock and I don't see much accuracy loss live, but below that you might run into issues, so try it out and let us know how it goes.
Hi there, transcription is a bit slow for me. I have an RTX 2070 with 8 GB. Is this expected, or do I need to ramp up my hardware?