Closed · MohammedShokr closed this issue 3 days ago
Hi, it's a known limitation. GPUs are fastest with one process only, so sequential processing is faster than concurrent. The current Whisper-Streaming is intended for one client at a time. Batching (#42) could help, but there would still be a slowdown. Refer to #42.
Hi @Gldkslfmsd, thanks for your reply. So the current Whisper-Streaming cannot be used in production applications with many users? Is it just a POC for streaming?
Yes, it's a demo, a POC. Not for many users concurrently.
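For context, a minimal sketch of the "one client at a time" pattern described above: every client pushes audio into a single shared queue and one worker drives the GPU model, so requests are processed strictly sequentially. The queue, gpu_worker, and transcribe_for_client names are illustrative assumptions, not part of whisper_streaming.

import asyncio

async def gpu_worker(queue: asyncio.Queue, transcribe) -> None:
    """Single consumer: runs queued audio through the model one chunk at a time."""
    while True:
        audio, result_future = await queue.get()
        try:
            # Run the blocking GPU call in a worker thread; still sequential,
            # because only this one task ever touches the model.
            result_future.set_result(await asyncio.to_thread(transcribe, audio))
        except Exception as exc:
            result_future.set_exception(exc)
        finally:
            queue.task_done()

async def transcribe_for_client(queue: asyncio.Queue, audio) :
    """Called by each per-client handler; awaits its turn on the shared worker."""
    fut = asyncio.get_running_loop().create_future()
    await queue.put((audio, fut))
    return await fut

Each WebSocket handler would then await transcribe_for_client(queue, audio) instead of calling the model directly, after the worker is started once with asyncio.create_task(gpu_worker(queue, model.transcribe)); model.transcribe here stands for whatever blocking call runs the recognizer. Adding clients only lengthens the queue rather than contending for the GPU.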
Issue
I implemented a WebSocket-based version of the whisper_online_server to handle audio streams from clients over WebSocket connections. The implementation works as expected when a single client is streaming; however, when two clients stream simultaneously, significant issues arise.
Troubleshooting Attempts
I’ve tried both of the following approaches, but neither resolved the issue:
Code
import asyncio
import threading

import websockets

from whisper_online import FasterWhisperASR, OnlineASRProcessor

# Initialize the model once and share it among all clients
model = FasterWhisperASR(lan="ar", modelsize="large-v2", compute_type="float16", device="cuda")
model.use_vad()

# Lock to ensure thread-safe access to the model
model_lock = threading.Lock()

def process_audio_sync(online_asr_processor, audio):
    """Synchronous function to process audio using the ASR model."""
    online_asr_processor.insert_audio_chunk(audio)
    # Lock the model during the critical section
    with model_lock:
        return online_asr_processor.process_iter()

async def process_audio(websocket: websockets.WebSocketServerProtocol, path):
    # Create a per-client processor using the shared model
    online_asr_processor = OnlineASRProcessor(model)
    ...  # per-client receive loop elided in the original report

async def main():
    # Disable the server's keepalive pings to prevent timeouts
    ...  # websockets.serve(...) call elided in the original report

if __name__ == "__main__":
    asyncio.run(main())
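For completeness, a hypothetical completion of the per-client handler showing how process_audio_sync could be driven without blocking the event loop. The receive loop, the raw float32 PCM message format, and the assumption that process_iter() returns a (beg, end, text) tuple are assumptions, not from the original report.

import numpy as np

async def process_audio(websocket, path):
    online_asr_processor = OnlineASRProcessor(model)   # per-client streaming state
    loop = asyncio.get_running_loop()
    async for message in websocket:
        # Assumed message format: raw 16 kHz float32 PCM bytes
        audio = np.frombuffer(message, dtype=np.float32)
        # Run the blocking ASR call in a thread so other clients' handlers stay responsive
        beg, end, text = await loop.run_in_executor(None, process_audio_sync, online_asr_processor, audio)
        if text:
            await websocket.send(text)

Even with this structure, the shared lock serializes every client's chunks on the single GPU model, which matches the maintainer's point above: per-client latency grows as more clients connect.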