[Feature] Detect person

jwillmer commented 4 years ago

It would be very useful if we could train the system to tell who is speaking. Depending on the person, we could then run or ignore a command. For instance:

a guest in the house can't reorder (buy) supplies by talking to the voice assistant
the kids can't start movies via the voice assistant if the movie is not suitable for their age
..
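A minimal sketch of how such gating could look once a speaker label is available. Everything here is hypothetical and only for illustration: the `Speaker` fields, the intent names `ReorderSupplies` and `PlayMovie`, and the `should_run` helper are not part of Rhasspy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Speaker:
    # Hypothetical profile for a recognized voice.
    name: str
    age: Optional[int] = None   # None means the age is unknown
    can_purchase: bool = False


def should_run(speaker: Optional[Speaker], intent: str, min_age: int = 0) -> bool:
    """Return True if the recognized command should run for this speaker."""
    if speaker is None:
        # Unidentified voice: fall back to a guest with no purchase rights.
        speaker = Speaker(name="guest")
    if intent == "ReorderSupplies":
        return speaker.can_purchase
    if intent == "PlayMovie":
        # Assumption: a speaker with unknown age is treated as an adult.
        return speaker.age is None or speaker.age >= min_age
    # Any other intent is allowed for everyone in this sketch.
    return True


parent = Speaker(name="parent", age=40, can_purchase=True)
kid = Speaker(name="kid", age=8)

print(should_run(parent, "ReorderSupplies"))     # True
print(should_run(None, "ReorderSupplies"))       # False (guest)
print(should_run(kid, "PlayMovie", min_age=12))  # False
```

The check sits between intent recognition and intent handling, so the speech and intent pipeline would not need to change; only the speaker label is new.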

synesthesiam commented 4 years ago

Kaldi apparently supports this through something called "x-vectors". I'd be interested in adding this, but I haven't had time to look into how to do a basic "WAV files + labels" training for classification.

BTW, the kids activating Rhasspy are why I can't really use it at home much :/

mathquis commented 4 years ago

I’ve tested Kaldi "i-vectors" for speaker identification, but they need a LOT of training data to reach a satisfactory error rate (a few hundred short WAVs per user is apparently the minimum).

The best I got with around 5 samples per user was a 24% error rate, following this: http://jrmeyer.github.io/asr/2017/09/29/challenge.html

The "x-vectors" bring some improvement, but they still need hundreds of samples per user to perform well (around a 7-8% error rate).

It would be pretty awesome to achieve speaker identification though 😊
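For reference, a rough sketch of the identification step that comes after embedding extraction. It assumes each utterance has already been turned into a fixed-length speaker embedding (an i-vector or x-vector) by some model; that extraction step is not shown. The scoring below is plain cosine similarity against a per-user average, which is a simpler stand-in for the PLDA scoring that Kaldi recipes typically use. The helper names, the toy 3-dimensional vectors, and the 0.7 threshold are all made up for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def enroll(samples):
    """Average a user's enrollment embeddings into one profile vector."""
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]


def identify(embedding, profiles, threshold=0.7):
    """Return the best-matching user name, or None if no score reaches the threshold."""
    best_name, best_score = None, threshold
    for name, profile in profiles.items():
        score = cosine(embedding, profile)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


# Toy embeddings; real ones have hundreds of dimensions.
profiles = {
    "alice": enroll([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]),
    "bob": enroll([[0.0, 1.0, 0.0], [0.1, 0.9, 0.0]]),
}

print(identify([1.0, 0.05, 0.0], profiles))  # alice
print(identify([0.0, 0.0, 1.0], profiles))   # None (unknown speaker)
```

Returning None for low scores matters for the guest case: an unknown voice should be rejected rather than mapped to the closest enrolled user. With only a handful of enrollment samples the averaged profile is noisy, which fits the high error rates reported above.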

synesthesiam / rhasspy

[Feature] Detect person #193