synesthesiam / rhasspy

Rhasspy voice assistant for offline home automation
https://rhasspy.readthedocs.io
MIT License

[Feature] Detect person #193

Open jwillmer opened 4 years ago

jwillmer commented 4 years ago

It would be very useful if we could train the system to tell apart who is speaking. Depending on the person, it could then run or ignore a command. For instance:

synesthesiam commented 4 years ago

Kaldi apparently supports this through something called "x-vectors". I'd be interested in adding this, but I haven't had time to look into how to do a basic "WAV files + labels" training for classification.
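For what it's worth, here is a rough sketch of what the "labels" half could look like once each WAV has been turned into a fixed-length speaker embedding. This is not Kaldi's API: the embedding extraction step is assumed to exist elsewhere, the 3-dimensional vectors are made-up stand-ins, and `enroll`, `identify`, and the `0.7` threshold are hypothetical names and values chosen for illustration. It averages the embeddings per speaker and picks the closest speaker by cosine similarity.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def enroll(samples):
    """Build one mean embedding per speaker from (label, embedding) pairs."""
    grouped = defaultdict(list)
    for label, embedding in samples:
        grouped[label].append(embedding)
    return {
        label: [sum(column) / len(embeddings) for column in zip(*embeddings)]
        for label, embeddings in grouped.items()
    }


def identify(centroids, embedding, threshold=0.7):
    """Return the best-matching speaker, or None if nobody is close enough."""
    label, score = max(
        ((name, cosine(centroid, embedding)) for name, centroid in centroids.items()),
        key=lambda pair: pair[1],
    )
    return label if score >= threshold else None


# Toy vectors standing in for real embeddings (assumption: one vector per WAV).
training = [
    ("alice", [0.9, 0.1, 0.0]),
    ("alice", [0.8, 0.2, 0.1]),
    ("bob", [0.1, 0.9, 0.2]),
    ("bob", [0.0, 0.8, 0.3]),
]
centroids = enroll(training)

print(identify(centroids, [0.85, 0.15, 0.05]))  # alice
print(identify(centroids, [0.0, 0.0, 1.0]))     # None (unknown speaker)
```

Returning `None` for a poor match is what would let a command be ignored when the speaker is not recognized; the threshold would have to be tuned on real data.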

BTW, the kids activating Rhasspy are why I can't really use it at home much :/

mathquis commented 4 years ago

I've tested Kaldi "i-vectors" for speaker identification, but they need a LOT of training data to approach a satisfactory error rate (a few hundred short WAVs per user is apparently the minimum).

The best I got with around 5 samples per user was a 24% error rate, following this: http://jrmeyer.github.io/asr/2017/09/29/challenge.html

"x-vectors" bring some improvement, but they still need hundreds of samples per user to perform well (around a 7-8% error rate).
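For anyone who wants to compare their own numbers, a small sketch of how such an error rate can be computed, assuming it simply means the fraction of test utterances attributed to the wrong speaker (the linked post may measure it differently). The `error_rate` helper and the labels below are made up for illustration.

```python
def error_rate(true_labels, predicted_labels):
    """Fraction of utterances whose predicted speaker differs from the true one."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("need at least one utterance")
    wrong = sum(1 for truth, guess in zip(true_labels, predicted_labels) if truth != guess)
    return wrong / len(true_labels)


# Made-up example: 4 test utterances, 1 misattributed.
truth = ["alice", "bob", "alice", "bob"]
guess = ["alice", "bob", "bob", "bob"]
print(f"{error_rate(truth, guess):.0%}")  # 25%
```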

It would be pretty awesome to achieve speaker identification though 😊