[Whisper Base] Reduce GPU utilization for "realtime speech"

AudioMotion, a real-time graphic audio spectrum analyzer, is used in the Whisper Base demo to visualize audio in the "audio file upload" and "record" scenarios. It is not enabled for the "realtime speech" scenario, to save resources.

However, the current Whisper Base demo code creates the AudioMotionAnalyzer instance right after creating the ONNX Runtime Web sessions, so the instance is also created for the "realtime speech" scenario even though it is never used there.

This PR moves the AudioMotionAnalyzer instance creation so that it happens only for the "audio file upload" and "record" scenarios. This reduces the GPU utilization impact on the "realtime speech" scenario, especially when running on the WebNN NPU backend.
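
For reviewers, the idea can be sketched roughly as below. This is an illustration only, not the actual diff: the scenario names, `makeLazyVisualizer`, and the `Visualizer` stand-in are hypothetical, and a counting stub takes the place of the real `new AudioMotionAnalyzer(...)` call so the snippet runs without a browser.

```typescript
// Hypothetical scenario names; the demo's real identifiers may differ.
type Scenario = "file-upload" | "record" | "realtime-speech";

// Minimal stand-in for the visualizer so the sketch runs without a browser.
interface Visualizer {
  destroy(): void;
}

// Returns a getter that builds the visualizer on first use for scenarios
// that need it, and never builds it for "realtime-speech".
function makeLazyVisualizer(
  create: () => Visualizer,
): (scenario: Scenario) => Visualizer | null {
  let instance: Visualizer | null = null;
  return (scenario: Scenario): Visualizer | null => {
    if (scenario === "realtime-speech") {
      return null; // skip creation: this scenario shows no visualization
    }
    if (instance === null) {
      instance = create(); // created once, only when it is actually needed
    }
    return instance;
  };
}

// Usage with a counting stub in place of the real analyzer constructor.
let created = 0;
const getVisualizer = makeLazyVisualizer(() => {
  created += 1;
  return { destroy() {} };
});

getVisualizer("realtime-speech");
const afterRealtime = created; // no instance should exist yet
getVisualizer("record");
getVisualizer("file-upload");
const afterOthers = created; // a single shared instance
```

The point of the getter is that session creation no longer has the side effect of allocating a visualizer; the allocation is tied to the scenarios that render it.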

@Adele101 PTAL
