Closed: Simon-chai closed this issue 8 months ago
I fingerprinted an audio file (call it file A), then extracted a random one-second segment from A and saved it as audio file B. After that I used the recognize function to match file B and got the result below. I don't understand why: file B comes entirely from file A, yet the input_confidence is only 0.21. Is that normal?
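For context, the extract-and-save step can be reproduced with a stdlib-only sketch like this (the synthesized tone is only a stand-in for the real file A, and the filenames are made up):

```python
import math
import random
import struct
import wave

SR = 44100  # sample rate

# Synthesize a 3-second 440 Hz tone as a stand-in for "file A"
# (in the real scenario this would be an existing recording).
samples = [int(20000 * math.sin(2 * math.pi * 440 * t / SR))
           for t in range(3 * SR)]
with wave.open("file_a.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(SR)
    w.writeframes(struct.pack("<%dh" % len(samples), *samples))

# Extract a random one-second segment of A and save it as "file B".
with wave.open("file_a.wav", "rb") as src:
    n = src.getnframes()
    start = random.randrange(0, n - SR)  # random 1 s window
    src.setpos(start)
    segment = src.readframes(SR)

with wave.open("file_b.wav", "wb") as dst:
    dst.setnchannels(1)
    dst.setsampwidth(2)
    dst.setframerate(SR)
    dst.writeframes(segment)
```

The resulting file_b.wav is what then gets passed to the recognizer.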
No, human speech is far too complex for this fingerprinting method. It is designed to detect audio signals such as music/melodies.
Then again, it depends what you mean. Music follows patterns in both rhythm and frequency; human speech is considerably more complex.
With speech you start with phonetic sounds (vowels/consonants), which, after some research, you'll realize don't work well with this "peaks" model. And that's just the start. I'm still not sure what you're making, but accents also affect people's usage of phonetic sounds in words. Just because your accent pronounces a word with a certain vowel doesn't mean everyone else uses the same vowel.
Most of the files I build fingerprints from (set A) are human speech, but each file I want to recognize (call it B) is extracted from one of the files in set A. (Sorry, I'm not a native English speaker; I don't want to cause any confusion.) In that scenario, does the vowel issue still matter much? Each file B is literally a subsequence of a file in A. In my limited tests it actually works well, but since my test data is small, I don't know whether it will hold up with more data. And another question: the picture I posted shows a waveform from a music file (upper) and a subsequence of the same music file (lower). It's music, yet the input_confidence is only 0.21. Does that mean the fingerprints built from the music file and from its segment are not the same?
It's not a "vowel" thing, it's a "phonetics" thing. Look at your audio samples as a spectrogram (this can be done in Audacity) and compare them with music data. With music you can see a rhythmic pattern of frequency peaks.
If you read the README carefully, it does explain the intended usage.
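To see what that spectrogram comparison looks like in code, here is a small numpy-only sketch (not dejavu's implementation): it computes a crude STFT and picks the dominant frequency bin per frame. A tonal/melodic signal concentrates its peaks in a few stable bins; speech or noise scatters them across many bins.

```python
import numpy as np

SR = 8000
t = np.arange(SR * 2) / SR
# A "melodic" signal: two alternating pure tones (440 Hz and 660 Hz).
tone = np.where((t % 0.5) < 0.25,
                np.sin(2 * np.pi * 440 * t),
                np.sin(2 * np.pi * 660 * t))

def stft_peaks(x, win=256, hop=128):
    """Magnitude spectrogram via a simple STFT, then the peak bin per frame."""
    frames = [x[i:i + win] * np.hanning(win)
              for i in range(0, len(x) - win, hop)]
    mag = np.abs(np.fft.rfft(np.array(frames), axis=1))
    return mag.argmax(axis=1)  # dominant frequency bin in each frame

peaks = stft_peaks(tone)
# With bin width SR/win = 31.25 Hz, the two tones land near bins 14 and 21,
# so only a handful of distinct peak bins appear:
print(sorted(set(peaks)))
```

Running the same peak picking on a speech clip would show the dominant bin jumping around far more, which is why the "peaks" model degrades there.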
Thank you, maybe I should read the README more carefully.
I have many audio files (e.g. more than 100 thousand per day). Each is a randomly extracted segment (no longer than 10 seconds) of one file from an audio set (let's call it set A; its size is under 100 files). My requirement is to quickly decide whether a given 10-second audio file belongs to set A, and to identify the exact file in set A it came from. In the 'how it works' article I see that 'peaks' in the audio signal are an important part of building the fingerprint, and my set A is all pre-recorded audio with little noise. So I wonder: is the dejavu project suited to my requirement?
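The peak-pairing idea from the 'how it works' article can be sketched in miniature for exactly this use case (toy code, not dejavu's actual implementation; the chirp "songs" and all names here are made up). The key point is that when the query is an exact sub-segment of a file in set A, the time-offset votes line up for the true source file:

```python
import numpy as np

SR = 8000

def landmarks(x, win=256, hop=128):
    """Per-frame dominant frequency bin: a crude stand-in for dejavu's
    spectrogram peak picking."""
    frames = [x[i:i + win] * np.hanning(win)
              for i in range(0, len(x) - win, hop)]
    mag = np.abs(np.fft.rfft(np.array(frames), axis=1))
    return mag.argmax(axis=1)

def hashes(x):
    """Pair each peak with the next one, keyed by (f1, f2, dt), in the
    spirit of combinatorial hashing; yields (hash, frame_offset)."""
    peaks = landmarks(x)
    return [((peaks[i], peaks[i + 1], 1), i) for i in range(len(peaks) - 1)]

# Set A: two synthetic "recordings" (chirps with different slopes).
t = np.arange(SR * 5) / SR
set_a = {
    "song1": np.sin(2 * np.pi * (200 + 40 * t) * t),
    "song2": np.sin(2 * np.pi * (900 - 60 * t) * t),
}

# Index: hash -> list of (song_id, source_offset)
index = {}
for name, audio in set_a.items():
    for h, off in hashes(audio):
        index.setdefault(h, []).append((name, off))

def recognize(segment):
    """Vote on (song, time offset) pairs; the best-aligned song wins."""
    votes = {}
    for h, off in hashes(segment):
        for name, src_off in index.get(h, []):
            key = (name, src_off - off)
            votes[key] = votes.get(key, 0) + 1
    (name, _), _ = max(votes.items(), key=lambda kv: kv[1])
    return name

# A one-second sub-segment of song1 votes its way back to song1.
query = set_a["song1"][2 * SR: 3 * SR]
print(recognize(query))
```

Since the segments here are clean sub-sequences of a small, low-noise set A, the aligned offsets dominate the vote, which is the situation this kind of fingerprinting handles well.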