shirayu / whispering

Streaming transcriber with whisper
MIT License
682 stars 53 forks

Setting for real-time streaming ASR? #62

Closed iwangjian closed 1 year ago

iwangjian commented 1 year ago

Hi, I appreciate your excellent project! I ran the server and the client successfully, but ASR responds slowly even though I set --frame to a smaller value (100), --num_block to 80, and --vad to 0. Is it possible to use your project for real-time streaming ASR? If so, how should I set the parameters? Thank you.

shirayu commented 1 year ago

Hi. For real-time processing, ASR on each 1-second interval of audio must finish in less than 1 second. Real-time processing may be difficult because the current whisper is slow in general.
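The real-time condition above can be checked with a small timing helper. This is only a sketch: `real_time_factor`, `measure_rtf`, and `keeps_up` are hypothetical names used for illustration and are not part of whispering or whisper, and `transcribe` stands for whatever function runs ASR on one chunk of audio.

```python
import time
from typing import Any, Callable


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by audio duration (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def measure_rtf(transcribe: Callable[[Any], Any], audio: Any, audio_seconds: float) -> float:
    """Time one call to a (hypothetical) transcribe function and return its real-time factor."""
    start = time.perf_counter()
    transcribe(audio)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, audio_seconds)


def keeps_up(rtf: float) -> bool:
    """Real-time processing needs a real-time factor below 1."""
    return rtf < 1.0
```

For example, a chunk of 1 second of audio that takes 0.5 seconds to transcribe has a real-time factor of 0.5 and keeps up, while one that takes 1.2 seconds does not.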

Processing time is mainly determined by GPU performance and model size. Therefore, specifying a small model such as --model tiny is one way to reduce it.

Another way is to use VAD (voice activity detection), which is lighter than whisper's processing. If the VAD determines that a section is silent, the whisper processing for that section is skipped.
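The idea of skipping silent sections can be sketched as a cheap gate in front of the expensive model. This is an illustration only: it uses a simple RMS energy threshold as a stand-in for a real VAD, and `rms`, `transcribe_if_speech`, and the `threshold` value are assumptions, not whispering's actual implementation.

```python
import math
from typing import Callable, Optional, Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square energy of a chunk of audio samples in [-1.0, 1.0]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def transcribe_if_speech(
    samples: Sequence[float],
    transcribe: Callable[[Sequence[float]], str],
    threshold: float = 0.01,  # assumed value; tune for the microphone and noise level
) -> Optional[str]:
    """Run the (hypothetical) transcribe function only when the chunk is not silent."""
    if rms(samples) < threshold:
        # Treated as silence: skip the expensive ASR call entirely.
        return None
    return transcribe(samples)
```

With this gate, silent chunks cost only the energy computation, so the average processing time per second of audio drops in proportion to how much of the stream is silence.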

iwangjian commented 1 year ago

Got it, thank you very much!