modelscope / kws-training-suite


Is there a streaming inference demo? #5

Closed. duj12 closed this issue 8 months ago.

duj12 commented 1 year ago

Hi, thanks for open-sourcing this production-level KWS project. I followed the pipeline and it works fine. However, so far I can't find a streaming inference demo, either in this project or in modelscope's examples. My questions are: what is the input format of ./bin/SoundConnect, and does it support wav chunk input? If so, is there a limit on chunk length? A real-time KWS demo would be great!

duj12 commented 1 year ago

I just read the modelscope inference code at https://github.com/modelscope/modelscope/blob/master/modelscope/pipelines/audio/kws_kwsbp_pipeline.py#L55-L57. Does this mean that wav chunks of a specific length are supported?
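For illustration, here is a small sketch of how fixed-length chunks could be sliced from a 16 kHz mono wav before being handed to that pipeline. The chunk length and the assumption that the pipeline accepts raw PCM bytes are not confirmed anywhere in this thread, so treat them as guesses:

```python
# Hypothetical chunking sketch, assuming the pipeline can consume raw
# 16 kHz / 16-bit mono PCM bytes; the 300 ms chunk size is arbitrary,
# not a documented limit.
import wave

CHUNK_MS = 300
BYTES_PER_SAMPLE = 2  # 16-bit PCM

with wave.open('test_16k_mono.wav', 'rb') as wf:
    assert wf.getframerate() == 16000 and wf.getnchannels() == 1
    frames_per_chunk = int(16000 * CHUNK_MS / 1000)
    while True:
        pcm = wf.readframes(frames_per_chunk)
        if not pcm:
            break
        # each `pcm` buffer would be fed to the KWS pipeline here
        print(len(pcm) // BYTES_PER_SAMPLE, 'samples in this chunk')
```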

bincard commented 1 year ago

Hi there, thank you for reaching out to us. On the first question: ./bin/SoundConnect only supports audio files as input.

In fact, there are multiple KWS models on modelscope.cn, and this kws-training-suite is only for the far-field KWS model speech_dfsmn_kws_char_farfield_16k_nihaomiya. If far-field is not necessary for you, we recommend taking a look at this model: https://modelscope.cn/models/damo/speech_charctc_kws_phone-xiaoyun/summary . It supports streaming audio input, and the code you mentioned in your second question is actually the inference code for that model. If you have any further questions or concerns, please let us know.
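For reference, a minimal sketch of invoking the recommended near-field model through the modelscope pipeline; the model ID comes from the link above, while the task constant and the audio_in keyword follow the public modelscope examples and may differ between library versions:

```python
# Minimal sketch: run the recommended model via the modelscope pipeline.
# Offline, whole-file input is shown as a sanity check; chunked/streaming
# input would go through the same pipeline per the comment above.
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

kws = pipeline(task=Tasks.keyword_spotting,
               model='damo/speech_charctc_kws_phone-xiaoyun')

result = kws(audio_in='test_16k_mono.wav')
print(result)
```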