worldveil / dejavu

Audio fingerprinting and recognition in Python
MIT License
6.41k stars, 1.43k forks

Speaker Recognition #11

Closed: kmarekspartz closed this issue 10 years ago

kmarekspartz commented 10 years ago

Could this system be used for speaker recognition?

worldveil commented 10 years ago

It depends what you mean. If you have labeled recordings of a speaker and want to recognize those exact recordings being played, then yes.

If, however, you want to identify (1:N) or verify (1:1) a person by the particular idiosyncrasies of their speech, then no. dejavu works off of a fingerprinting (read: hashing) system. As with any good hashing scheme, a small perturbation of the input (in dejavu's case, in timing or frequency) produces very different fingerprints.

While dejavu is very robust to noise, it won't work for recognizing a voice, whose timing and frequency are not reliably the same each time. dejavu is meant for recognizing exact duplicates of previously recorded audio.
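To make the hashing point concrete, here is a minimal sketch of the idea. It assumes a fingerprint is built by hashing pairs of spectrogram peaks as (frequency 1, frequency 2, time difference). The helper names `peak_pair_hash` and `fingerprint`, the `fan_out` parameter, and the toy peak values are all made up for illustration. This is not dejavu's actual API.

```python
import hashlib


def peak_pair_hash(freq1: int, freq2: int, t_delta: int) -> str:
    """Hash one pair of peaks (hypothetical helper, not dejavu's API)."""
    key = f"{freq1}|{freq2}|{t_delta}".encode("utf-8")
    # Truncating the digest keeps hashes short; 20 hex chars is an assumption.
    return hashlib.sha1(key).hexdigest()[:20]


def fingerprint(peaks, fan_out=3):
    """Return the set of hashes for (freq_bin, time_bin) peaks sorted by time.

    Each peak is paired with up to `fan_out` peaks that follow it.
    """
    hashes = set()
    for i, (f1, t1) in enumerate(peaks):
        for f2, t2 in peaks[i + 1 : i + 1 + fan_out]:
            hashes.add(peak_pair_hash(f1, f2, t2 - t1))
    return hashes


# Toy peaks as (frequency bin, time bin); the values are invented.
original = [(40, 0), (55, 3), (62, 7), (48, 12), (70, 15)]

# The same recording played back again: identical peaks.
replayed = list(original)

# The "same" phrase spoken again: every peak lands one frequency bin higher.
respoken = [(f + 1, t) for f, t in original]

fp_original = fingerprint(original)
print("replayed matches:", len(fp_original & fingerprint(replayed)))
print("respoken matches:", len(fp_original & fingerprint(respoken)))
```

In this toy setup the replayed copy matches every hash, while the one-bin shift shares none, because each hash key changes as soon as any of its three fields changes. Real speech varies in both timing and frequency between utterances, so the mismatch would be at least as severe.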

kmarekspartz commented 10 years ago

I was thinking about having a long recording of an individual repeating their name many times, and then as input having them say their name once. The fingerprinting approach may be useful there. I or a friend will try it out when we get a chance and get back to you.

Thanks!