linto-ai / linto-desktoptools-hmg

GUI Tool to create, manage and test Keyword Spotting models using TF 2.0
GNU Affero General Public License v3.0

HMG Server #7

Closed StuartIanNaylor closed 3 years ago

StuartIanNaylor commented 3 years ago

This is a feature request rather than an issue.

HMG is great because it can create models for any system running TensorFlow, so the models are interoperable across hardware. I feel that the satellite KWS is purely an HMI, like a keyboard or a mouse, and should be interoperable too, but currently every system seems to rubber-stamp its own brand on it. Satellites are set up as single units, but they are really a collection of networked devices: KWS (mic), indicator (pixel ring) and display (MagicMirror). They should be separate devices, likely communicating via MQTT. Audio data isn't system messaging and shouldn't be sent over MQTT; keeping streams and messaging apart lets each have its own QoS and security, and stops audio from hogging a lightweight message system.

From ESP32 to Raspberry Pi, there is a need for a server that can group satellites into zones of operation and collate the devices associated with a session.
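To make the zone idea concrete, here is a minimal sketch of how such a server might group devices into zones and open a session. Everything in it is an assumption for illustration: the `ZoneRegistry` and `Session` names, the device kinds (`"kws"`, `"indicator"`, `"display"`), and the rule that the KWS with the highest confidence in a zone wins the session. None of this exists in HMG.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Session:
    zone: str
    winner: str                                   # KWS device chosen to stream audio
    devices: list = field(default_factory=list)   # other devices collated into the session


class ZoneRegistry:
    """Hypothetical registry mapping zones to the devices they contain."""

    def __init__(self):
        self._zones = defaultdict(dict)           # zone -> {device_id: kind}

    def register(self, zone, device_id, kind):
        self._zones[zone][device_id] = kind

    def open_session(self, zone, detections):
        """Pick a winner from `detections` ({device_id: confidence in 0-1}).

        Only devices registered as "kws" in this zone are considered.
        Returns None when no registered KWS device reported a detection.
        """
        known = self._zones[zone]
        candidates = {d: c for d, c in detections.items() if known.get(d) == "kws"}
        if not candidates:
            return None
        winner = max(candidates, key=candidates.get)
        # Collate the zone's non-KWS devices (indicator, display) with the session.
        others = sorted(d for d, kind in known.items() if kind != "kws")
        return Session(zone=zone, winner=winner, devices=others)
```

With two mics and a pixel ring registered in a "kitchen" zone, `open_session("kitchen", {"mic-1": 0.62, "mic-2": 0.91})` would select `mic-2` and attach the ring to the session.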

So my request is this: all a KWS needs is to be able to broadcast an MQTT message with a 0-1 confidence level so that it can then stream audio to an HTTP chunk server, and to receive an MQTT message back telling it to stop broadcasting. That is all that is required of any base system to ensure interoperability.
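A rough sketch of the satellite side of that exchange, kept broker-agnostic so it does not assume any particular MQTT client API: the `publish` callable stands in for whatever client is used. The topic names, the JSON payload shape, the `KwsSatellite` class and the 0.5 threshold are all hypothetical, not part of HMG or any existing protocol.

```python
import json

# Hypothetical topic layout; any real deployment would define its own.
DETECT_TOPIC = "hmi/kws/{device_id}/detected"
STOP_TOPIC = "hmi/kws/{device_id}/stop"


class KwsSatellite:
    """Minimal state machine: announce a detection, stream until told to stop."""

    def __init__(self, device_id, publish, threshold=0.5):
        self.device_id = device_id
        self.publish = publish        # callable(topic, payload), supplied by the MQTT client
        self.threshold = threshold    # assumed cut-off, not a value from the thread
        self.streaming = False

    def on_keyword_score(self, confidence):
        """Announce a detection with its 0-1 confidence; return True if announced."""
        if not 0.0 <= confidence <= 1.0:
            raise ValueError("confidence must be in the range 0-1")
        if self.streaming or confidence < self.threshold:
            return False
        payload = json.dumps({"device": self.device_id, "confidence": confidence})
        self.publish(DETECT_TOPIC.format(device_id=self.device_id), payload)
        self.streaming = True
        return True

    def on_message(self, topic, payload):
        """Handle an incoming MQTT message; only the stop topic matters here."""
        if topic == STOP_TOPIC.format(device_id=self.device_id):
            self.streaming = False

    def audio_chunks(self, frames):
        """Yield audio frames until a stop message arrives.

        The generator is meant to feed a chunked HTTP upload, so that
        audio never travels over MQTT.
        """
        for frame in frames:
            if not self.streaming:
                break
            yield frame
```

The point of the design is that MQTT carries only two small control messages, while the audio itself goes out over a separate HTTP stream.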

The idea is that the server is extensible in software, so that voice hardware becomes interoperable through the modules and configuration of a satellite server rather than through any particular hardware device. MQTT on the hardware server's broker acts as a gateway port, so the system implementation has no effect on the hardware: the hardware server handles communication with the system.

That is it, and it would be a massive step, especially in conjunction with a tool like HMG. It is extremely frustrating for device providers that they currently all have to be system-specific.

We are currently missing the interoperable, extensible glue for voice AI hardware, which makes ownership and use extremely restrictive.

StuartIanNaylor commented 3 years ago

PS: have a look at this: https://github.com/42io/esp32_kws/issues/1

It's a DS-CNN. Another thing I picked up is that TensorFlow for microcontrollers currently doesn't support any RNN. The DS-CNN there looks perfect and comes with working code, which solves how to do it with Keras; I have been scratching my head trying to get a working CRNN. If TensorFlow for microcontrollers is of any interest, the problem is any RNN.

Maybe have a think about using a GRU? Maybe DS-CNN provides much wider use?

StuartIanNaylor commented 3 years ago

I made a 'Hey Marvin' dataset, if it is of any use.

https://drive.google.com/open?id=1LFa2M_AZxoXH-PA3kTiFjamEWHBHIdaA