rhasspy / wyoming

Peer-to-peer protocol for voice assistants
MIT License

WakeWord post-trigger verification #20

Open jhbruhn opened 2 months ago

jhbruhn commented 2 months ago

While I have not yet tried the new microWakeWord model, which might fix this issue anyway, I had the following thought:

To reduce the number of false positives, and potentially allow speaker identification and custom-verifier-style runs while keeping the pipeline low latency, one idea is to add a service that runs wakeword verification after the wakeword has initially been detected and cancels the pipeline if it deems the detection false.

To explain further:

The satellite runs its own local wakeword model and triggers the pipeline when it detects a wakeword. Along with this pipeline trigger, it also sends the last ~2 seconds of audio, which is fed into the model of this secondary wakeword service. That service then runs, for example, a custom verifier or raven speaker identification. If it also detects the wakeword, it does nothing. If it does not detect the wakeword, it sends a message to cancel the pipeline that the wakeword triggered.
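To make the flow concrete, here is a rough sketch in plain Python. It does not use the actual wyoming library: `PreRollBuffer`, `Trigger`, `CancelPipeline` and `verify_trigger` are hypothetical names made up for illustration, and the audio format (16 kHz, 16-bit mono PCM) is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Assumed audio format: 16 kHz, 16-bit mono PCM (not specified in this issue).
SAMPLE_RATE = 16000
BYTES_PER_SAMPLE = 2


class PreRollBuffer:
    """Satellite side: keeps only the most recent `seconds` of raw audio."""

    def __init__(self, seconds: float = 2.0) -> None:
        self._max_bytes = round(seconds * SAMPLE_RATE) * BYTES_PER_SAMPLE
        self._data = bytearray()

    def append(self, chunk: bytes) -> None:
        self._data.extend(chunk)
        excess = len(self._data) - self._max_bytes
        if excess > 0:
            # Drop the oldest bytes so the buffer never exceeds the window.
            del self._data[:excess]

    def snapshot(self) -> bytes:
        return bytes(self._data)


@dataclass
class Trigger:
    """Hypothetical message sent with the pipeline trigger."""
    pipeline_id: str
    wake_word: str
    pre_roll: bytes  # the last ~2 seconds of audio before the detection


@dataclass
class CancelPipeline:
    """Hypothetical message telling the server to abort a running pipeline."""
    pipeline_id: str
    reason: str


# A verifier takes (audio, wake_word) and returns True if it also hears the
# wakeword. This could wrap a custom verifier, speaker identification, etc.
Verifier = Callable[[bytes, str], bool]


def verify_trigger(trigger: Trigger, verifier: Verifier) -> Optional[CancelPipeline]:
    """Verification service: do nothing on agreement, cancel on disagreement."""
    if verifier(trigger.pre_roll, trigger.wake_word):
        return None  # confirmed, the pipeline just keeps running
    return CancelPipeline(trigger.pipeline_id, "wake word not confirmed")
```

The pipeline itself starts right away on the satellite's detection and `verify_trigger` would run in parallel, so the happy path adds no latency; only a rejected trigger results in an extra message.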

The idea is similar to how Echo devices do it: they can trigger and then send the wakeword audio to a cloud server, which does secondary verification. The short light blink resulting from that is better than a TTS reply saying that something was not understood.

The only downside is that activation sounds cannot be cancelled this way, since they have to play immediately to keep latency low.