harvard-edge / multilingual_kws

Few-shot Keyword Spotting in Any Language and Multilingual Spoken Word Corpus

Using multilingual_kws with microphone streaming #34

Closed: wesbz closed this issue 2 years ago

wesbz commented 2 years ago

Hi! Very interesting work! I would like to know whether it is possible to test this using a microphone stream as input.

wesbz commented 2 years ago

I managed to do it in the end.

Helaly96 commented 2 years ago

How did you do it?

wesbz commented 2 years ago

@Helaly96 You basically have to process your audio stream (say, coming from PyAudio) so that it has the right size for the model. You want a fixed-size spectrogram (I can't remember the dimensions) computed over a sliding window. So you can either keep an audio buffer queue and recompute the whole spectrogram each time you want a prediction, or you can compute your spectrograms from the audio buffer in a smarter way and only compute the part of the spectrogram that is new. Please see this gist: https://gist.github.com/wesbz/6a2a33f751f6dd3117c10369f786a46d Let me know if you have any other questions.
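For anyone landing here later, below is a minimal sketch of the second (incremental) approach. It is not the code from the gist, just an illustration under stated assumptions: 16 kHz mono float audio, 30 ms frames with a 20 ms hop (480 and 320 samples), and a 1 s window of 49 frames. The per-frame feature is a plain log-magnitude FFT used as a stand-in; the model's real front end and input shape are not given in this thread, so check the repo's input pipeline before feeding it. `StreamingSpectrogram`, `log_spectrum` and `full_spectrogram` are hypothetical names.

```python
from collections import deque

import numpy as np


def log_spectrum(frame, window):
    # Stand-in feature (assumption): log-magnitude spectrum of one windowed frame.
    return np.log(np.abs(np.fft.rfft(frame * window)) + 1e-6)


def full_spectrogram(audio, frame_len=480, hop=320):
    """Naive approach: recompute every frame of a fixed-size audio window."""
    window = np.hanning(frame_len).astype(np.float32)
    starts = range(0, len(audio) - frame_len + 1, hop)
    return np.stack([log_spectrum(audio[s:s + frame_len], window) for s in starts])


class StreamingSpectrogram:
    """Keeps the most recent `n_frames` spectrogram frames of an audio stream.

    Only frames made possible by newly arrived samples are computed; older
    frames are reused instead of recomputing the whole window.
    """

    def __init__(self, frame_len=480, hop=320, n_frames=49):
        # Assumed defaults: 30 ms frames, 20 ms hop at 16 kHz -> 49 frames per 1 s.
        self.frame_len = frame_len
        self.hop = hop
        self.window = np.hanning(frame_len).astype(np.float32)
        self.frames = deque(maxlen=n_frames)  # oldest frames drop off automatically
        self.pending = np.zeros(0, dtype=np.float32)  # samples not yet consumed

    def push(self, chunk):
        """Append a chunk of samples; return how many new frames were computed."""
        buf = np.concatenate([self.pending, np.asarray(chunk, dtype=np.float32)])
        start, new = 0, 0
        while start + self.frame_len <= len(buf):
            self.frames.append(log_spectrum(buf[start:start + self.frame_len], self.window))
            start += self.hop
            new += 1
        # Keep the tail: it overlaps the frame(s) that still need more samples.
        self.pending = buf[start:]
        return new

    def ready(self):
        # True once a full window of frames is available for a prediction.
        return len(self.frames) == self.frames.maxlen

    def spectrogram(self):
        """Return an (n_frames, frame_len // 2 + 1) array, oldest frame first."""
        return np.stack(self.frames)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    audio = rng.standard_normal(16000).astype(np.float32)  # 1 s of fake audio
    stream = StreamingSpectrogram()
    for i in range(0, len(audio), 1024):  # simulate 1024-sample mic chunks
        stream.push(audio[i:i + 1024])
    print(stream.ready(), stream.spectrogram().shape)
    # The incremental result matches recomputing everything from scratch.
    print(np.allclose(stream.spectrogram(), full_spectrogram(audio)))
```

With a real microphone, the chunks could come from a PyAudio stream opened with `format=pyaudio.paInt16`, e.g. `np.frombuffer(stream.read(1024), dtype=np.int16).astype(np.float32) / 32768.0`. Since `push` returns the number of new frames, you can decide to run a prediction only every few frames once `ready()` is true, which keeps the per-chunk cost proportional to the new audio rather than to the whole window.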

Helaly96 commented 2 years ago

I am having some trouble loading the model, but I understand your gist! Thank you.

Helaly96 commented 2 years ago

Alright, I managed to fix the loading part: I needed to convert the model to TFLite.