rhasspy / wyoming

Peer-to-peer protocol for voice assistants
MIT License

Training data collection and selection for wakewords #12

Open jhbruhn opened 5 months ago

jhbruhn commented 5 months ago

Training wakewords, whether for engines such as Raven or snowboy or for custom verifiers in openWakeWord, requires a mechanism to both collect and then label training data.

Ideally, the training data is collected in the environment the microphone is placed in. In addition, wakeword engines can either be a central service with permanently streaming satellites, or run on-device. That makes it even more complicated to decide where the training data lives and for the UI to know where to find it.

With this, I want to kick off a discussion on how that could be implemented. Ideally, the end goal is a user interface in Rhasspy/Home Assistant where previous wakeword activations can be recorded and then labeled (positive/negative, and perhaps even with a speaker ID). These labels are then sent to the wakeword engine service, which can use them to train, for example, a custom verifier. What is still unclear is where the recordings are stored. I am currently leaning towards the wakeword engine, which can stream these recordings to a UI on demand for preview.
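
To make the labeling step concrete, here is a rough sketch of what a label message from the UI to the wakeword engine might look like, written as a single JSON line with a type and a data object. Everything in it is an assumption for discussion: the `wakeword-label` event type, the `WakewordLabel` class, and its field names (`recording_id`, `positive`, `speaker_id`) do not exist in Wyoming or in any wakeword engine today.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional

# Hypothetical event type; not part of the Wyoming protocol.
LABEL_EVENT_TYPE = "wakeword-label"


@dataclass
class WakewordLabel:
    """Hypothetical label for one recorded wakeword activation."""

    recording_id: str  # identifier assigned by whichever service stores the audio
    positive: bool  # True = genuine activation, False = false trigger
    speaker_id: Optional[str] = None  # optional speaker label


def to_event_line(label: WakewordLabel) -> str:
    """Serialize a label as one newline-terminated JSON line."""
    event = {"type": LABEL_EVENT_TYPE, "data": asdict(label)}
    return json.dumps(event) + "\n"


def from_event_line(line: str) -> WakewordLabel:
    """Parse a JSON line back into a label, rejecting other event types."""
    event = json.loads(line)
    if event.get("type") != LABEL_EVENT_TYPE:
        raise ValueError(f"unexpected event type: {event.get('type')!r}")
    return WakewordLabel(**event["data"])


if __name__ == "__main__":
    # The UI marks a stored recording as a false trigger by a known speaker.
    line = to_event_line(WakewordLabel("rec-001", positive=False, speaker_id="alice"))
    print(line, end="")
    print(from_event_line(line))
```

Referring to the recording by an ID rather than embedding the audio keeps the label message small and leaves open the question of where the recordings themselves are stored.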

Could there be other use cases besides wakewords, where such a mechanism is required? For example STT or TTS training for custom voices?