microsoft / webnn-developer-preview

[Whisper Base] Reduce GPU utilization for "realtime speech" #9

Closed ibelem closed 4 months ago

ibelem commented 5 months ago

AudioMotion, a real-time graphic audio spectrum analyzer, is used in the Whisper Base demo to visualize audio in the "audio file upload" and "record" scenarios. It is intentionally not enabled for the "realtime speech" scenario, to save resources.

However, the current Whisper Base demo code creates the AudioMotionAnalyzer instance right after creating the ONNX Runtime Web sessions, so the instance is also created for the "realtime speech" scenario.

This PR moves the AudioMotionAnalyzer instance creation so that it happens only for the "audio file upload" and "record" scenarios. This reduces the GPU utilization impact on the "realtime speech" scenario, especially with the WebNN NPU backend.
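
For illustration only, here is a minimal sketch of the idea, not the actual diff. The scenario identifiers, `makeAnalyzerProvider`, and the injected `createAnalyzer` callback are hypothetical names; in the demo the callback would presumably wrap `new AudioMotionAnalyzer(container, options)`.

```javascript
// Hypothetical scenario identifiers; the demo's real names may differ.
const VISUALIZED_SCENARIOS = new Set(["audio-file-upload", "record"]);

// Returns a getter that creates the analyzer lazily, and only for
// scenarios that actually show the spectrum visualization.
// `createAnalyzer` is injected so the sketch does not depend on the browser;
// it is assumed to build and return the AudioMotionAnalyzer instance.
function makeAnalyzerProvider(createAnalyzer) {
  let analyzer = null;

  return function getAnalyzer(scenario) {
    if (!VISUALIZED_SCENARIOS.has(scenario)) {
      // "realtime speech": never create the visualizer, so it costs no GPU time.
      return null;
    }
    if (analyzer === null) {
      analyzer = createAnalyzer(); // created once, on first use
    }
    return analyzer;
  };
}
```

With this shape, the code that creates the ONNX Runtime Web sessions no longer needs to touch the analyzer at all; only the "audio file upload" and "record" handlers would call `getAnalyzer`.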

@Adele101 PTAL