worldveil / dejavu

Audio fingerprinting and recognition in Python
MIT License
6.36k stars 1.43k forks

Does this project work well for recognizing human speech? #301

Closed Simon-chai closed 8 months ago

Simon-chai commented 8 months ago

I have many audio files (more than 100,000 per day). Each one is a randomly extracted segment, no longer than 10 seconds, of a single file from a set of audio files (call it set A; it contains fewer than 100 files). My requirement is to quickly determine whether a 10-second audio file belongs to set A and, if so, to find the exact file in set A it came from. In the "how it works" article I saw that "peaks" in the audio signal are an important part of building a fingerprint, and my set A is all pre-recorded audio, so there is little noise in it. Is dejavu suitable for my requirement?

Simon-chai commented 8 months ago

[image] I fingerprinted an audio file (call it file A), then extracted a random one-second segment from A and saved it as audio file B. After that, I used the recognize function to match file B and got the result below: [image] I don't know why. File B comes entirely from file A, but it has an input_confidence of only 0.21. Is that normal?
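
For readers following the question, here is a minimal sketch of how a clip cut from a fingerprinted file could be matched, and how a ratio like input_confidence might be read. It assumes fingerprints are (hash, offset) pairs and that the confidence is the fraction of the clip's hashes that line up with the stored file. The function name `match_clip` and the toy data are hypothetical; this is not dejavu's actual code.

```python
from collections import Counter


def match_clip(stored, clip):
    """Match a clip against stored fingerprints by voting on time offsets.

    stored: {file_name: [(hash, offset), ...]}  (hypothetical layout)
    clip:   [(hash, offset), ...]
    Returns (best_file, hashes_matched, input_confidence).
    """
    # Index every stored hash so clip hashes can be looked up directly.
    index = {}
    for name, prints in stored.items():
        for h, offset in prints:
            index.setdefault(h, []).append((name, offset))

    # A true match produces many hashes sharing one offset difference.
    votes = Counter()
    for h, clip_offset in clip:
        for name, file_offset in index.get(h, []):
            votes[(name, file_offset - clip_offset)] += 1

    if not votes:
        return None, 0, 0.0

    (name, _delta), matched = votes.most_common(1)[0]
    # Assumed reading: share of the clip's hashes that aligned.
    return name, matched, round(matched / len(clip), 2)


# Toy data: the clip shares three aligned hashes with file "a";
# two clip hashes ("x1", "x2") have no counterpart in any stored file.
stored = {
    "a": [("h1", 0), ("h2", 1), ("h3", 2), ("h4", 3), ("h5", 4)],
    "b": [("h9", 0), ("h2", 5)],
}
clip = [("h3", 0), ("h4", 1), ("h5", 2), ("x1", 1), ("x2", 2)]
result = match_clip(stored, clip)
print(result)
```

Under this reading, a clip can still be identified correctly while the ratio stays well below 1.0, because any clip hash without a counterpart in the stored file lowers it.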

busterbeam commented 8 months ago

No, human speech is far too complex for this fingerprinting method. It is designed to detect audio signals such as music and melodies.

Then again, it depends on what you mean. Music follows patterns in both rhythm and frequency. Human speech is considerably more complex.

With speech you start with phonetic sounds (vowels/consonants), which, as I'm sure you'll realize after some research, don't really work well with this "peaks" model. And that's just the start. I'm not sure what you're making, but accents also affect people's use of phonetic sounds in words. Just because your accent pronounces a word with a certain vowel doesn't mean everyone else uses the same vowel.

Simon-chai commented 8 months ago

Although most of the files I build fingerprints from (set A) are human speech, the files I want to recognize (call them B) are each extracted from one of the files in set A (sorry, I am not a native English speaker and I don't want to cause any confusion). In that scenario, does the vowel issue still matter much? File B is actually a subsequence of one file in A. In my limited test it actually worked well, but as I said, the test used little data, so I don't know whether it will work with more data. I also have another question: the picture I posted shows the waveform of a music file (upper) and of a subsequence of the same music file (lower). It is music, yet the input_confidence is only 0.21. Does that mean the fingerprints built from the music file and from its segment are not the same?

busterbeam commented 8 months ago

It's not a "vowel" thing, it's a "phonetics" thing. Look at your audio samples as a spectrogram (this can be done in Audacity) and compare it with music data. With music you can see a rhythmic pattern of frequency peaks.

If you read the README carefully, it does explain its usage.
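
To make the "frequency peaks" idea above concrete, here is a rough sketch of picking local maxima out of a spectrogram-like grid. It is a toy in plain Python under simple assumptions (a strict 8-neighbour maximum and a hypothetical `min_amplitude` threshold), not the peak finder dejavu itself uses.

```python
def find_peaks(spectrogram, min_amplitude=0.0):
    """Return (time_index, freq_index) cells that are strict local maxima.

    spectrogram: list of time frames, each a list of magnitudes per
    frequency bin. A cell is a peak if it exceeds min_amplitude and
    every one of its (up to 8) neighbours.
    """
    peaks = []
    n_frames = len(spectrogram)
    for t in range(n_frames):
        for f in range(len(spectrogram[t])):
            value = spectrogram[t][f]
            if value <= min_amplitude:
                continue
            # Collect the surrounding cells, clipped at the grid edges.
            neighbours = [
                spectrogram[tt][ff]
                for tt in range(max(0, t - 1), min(n_frames, t + 2))
                for ff in range(max(0, f - 1), min(len(spectrogram[tt]), f + 2))
                if (tt, ff) != (t, f)
            ]
            if all(value > n for n in neighbours):
                peaks.append((t, f))
    return peaks


# Toy 4x4 grid (rows are time frames, columns are frequency bins).
toy = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.3, 0.1],
    [0.1, 0.3, 0.2, 0.8],
    [0.0, 0.1, 0.2, 0.3],
]
peaks = find_peaks(toy, min_amplitude=0.5)
print(peaks)
```

Running the same kind of peak picking on a speech spectrogram and on a music spectrogram is one way to compare how many stable peaks each one yields.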

Simon-chai commented 8 months ago

Thank you, maybe I should read the README more carefully.