DieKatzchen closed this 5 months ago
Yes, it is possible with two different approaches:
1) Have the micro_wake_word component in ESPHome run two models simultaneously. I have implemented this, but the code needs some cleanup. I'm hoping to submit a PR to ESPHome in the next week. You should be able to run two of the current models at the same time, but the ESP32-S3 is then basically at maximum capacity in terms of CPU usage.
2) Train a single model with multiple wake word probability outputs. I also have this roughly working, but I haven't achieved results comparable to single-word models (which is to be expected). I want to explore this more in the future with a different model architecture, but it requires significant changes to the current training process. Currently, all the metrics and related calculations assume a single output indicating whether the wake word was detected.
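To make approach 1 concrete, here is a minimal Python sketch of running two independent single-word detectors over the same window of audio features. It is not the ESPHome implementation: the names `detect`, `Detector`, and `cutoff`, and the toy stand-in models, are all hypothetical and only illustrate the idea that every model is evaluated on each window, which is why CPU cost grows with the number of models.

```python
from typing import Callable, Dict, Optional, Sequence

# Hypothetical type: a detector maps one window of audio features
# to a wake word probability in [0, 1].
Detector = Callable[[Sequence[float]], float]


def detect(
    features: Sequence[float],
    detectors: Dict[str, Detector],
    cutoff: float = 0.5,
) -> Optional[str]:
    """Run every detector on the same features and return the name of the
    most confident one at or above the cutoff, or None if none triggered."""
    best_name: Optional[str] = None
    best_prob = 0.0
    for name, model in detectors.items():
        prob = model(features)  # each model costs one full inference
        if prob >= cutoff and (best_name is None or prob > best_prob):
            best_name, best_prob = name, prob
    return best_name


# Toy stand-ins for two trained single-word models (assumptions, not real models).
detectors: Dict[str, Detector] = {
    "wake_word_1": lambda f: min(1.0, max(0.0, f[0])),
    "wake_word_2": lambda f: min(1.0, max(0.0, f[1])),
}

print(detect([0.9, 0.1], detectors))
print(detect([0.1, 0.2], detectors))
```

Because each model has its own name in the dictionary, the caller learns which wake word triggered without any change to how the individual models are trained.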
Approach 1 would be sufficient for my current needs, although, as you say, it is not ideal. I assume that with approach 2 it would be possible to know which wake word was triggered?
Yes, with both approaches we would know which wake word was said. For example, a model with two wake words would output 3 probabilities: wake word 1, wake word 2, and other noises. I imagine it will take a while for me to get this approach working well, if I can at all.
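As a rough illustration of the three-output case, the following Python sketch turns three raw model scores into probabilities with a softmax and reports which wake word, if any, was detected. The class names, the `classify` function, and the `cutoff` value are hypothetical, and the use of a softmax output layer is an assumption; the thread does not say how the multi-output model produces its probabilities.

```python
import math
from typing import List, Optional, Sequence

# Hypothetical class order for a two-wake-word model.
CLASSES = ("wake_word_1", "wake_word_2", "other")


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits: Sequence[float], cutoff: float = 0.5) -> Optional[str]:
    """Return the detected wake word, or None for 'other' or low confidence."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    label = CLASSES[best]
    if label == "other" or probs[best] < cutoff:
        return None
    return label


print(classify([0.1, 3.0, 0.2]))
print(classify([0.0, 0.0, 4.0]))
```

A single inference yields all three probabilities, so the per-window cost stays roughly constant as wake words are added, in contrast to running one model per word.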
I'm developing a new model architecture that is smaller and faster, so once I get the wake words retrained, you should be able to run three different models at once. I'm still fine-tuning this architecture, however, so generating new models will take some time as well.
Would it be possible to have multiple wake words on the same device?