Open jwillmer opened 4 years ago
Kaldi apparently supports this through something called "x-vectors". I'd be interested in adding this, but I haven't had time to look into how to do a basic "WAV files + labels" training for classification.
BTW, the kids activating Rhasspy are why I can't really use it at home much :/
I’ve tested Kaldi "i-vectors" for speaker identification, but it needs a LOT of training data to approach a satisfactory error rate (a few hundred short WAVs per user is apparently the minimum).
The best I got with around 5 samples per user was a 24% error rate, following this: http://jrmeyer.github.io/asr/2017/09/29/challenge.html
The "x-vectors" add some improvements, but they still need hundreds of samples per user to perform correctly (around a 7-8% error rate).
It would be pretty awesome to achieve speaker identification though 😊
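For anyone wondering what the identification step looks like once you have i-vectors or x-vectors: a minimal sketch, assuming each utterance has already been turned into a fixed-length embedding by some extractor (not shown here). The function names, the toy 3-dimensional vectors and the 0.7 threshold are all made up for illustration. This is plain cosine scoring against a per-speaker average, not the PLDA scoring that Kaldi recipes typically use.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def enroll(samples):
    """Average several embeddings of one speaker into a single reference vector."""
    n = len(samples)
    return [sum(col) / n for col in zip(*samples)]

def identify(embedding, enrolled, threshold=0.7):
    """Return the best-matching enrolled speaker, or None if no score reaches the threshold."""
    best_name, best_score = None, threshold
    for name, ref in enrolled.items():
        score = cosine(embedding, ref)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name

# Toy embeddings (hypothetical values; real x-vectors have hundreds of dimensions).
enrolled = {
    "parent": enroll([[0.9, 0.1, 0.0], [1.0, 0.0, 0.1]]),
    "kid": enroll([[0.0, 1.0, 0.2], [0.1, 0.9, 0.1]]),
}

print(identify([0.95, 0.05, 0.05], enrolled))  # close to the "parent" reference
print(identify([0.5, 0.5, 0.7], enrolled))     # matches nobody well enough
```

The threshold is what lets the system say "unknown speaker" instead of always picking the nearest user. With only a handful of enrollment samples the averaged reference is noisy, which fits with the high error rates above.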
It would be most useful if we could train the system to differentiate who said something. Depending on the person, we could then start or ignore a command. For instance: