worldveil / dejavu

Audio fingerprinting and recognition in Python
MIT License

Keep getting results when testing with songs that aren't stored in the database. #102

Closed fainneth closed 7 years ago

fainneth commented 7 years ago

I tried to recognize songs that aren't stored in the database, but I keep getting results for them instead of an error message. My tuning:

FINGERPRINT_REDUCTION = 20
PEAK_SORT = True
DEFAULT_OVERLAP_RATIO = 0.5
DEFAULT_FAN_VALUE = 15
DEFAULT_AMP_MIN = 10
PEAK_NEIGHBORHOOD_SIZE = 20

Where did I go wrong? Also, I couldn't find any songs when I changed PEAK_SORT to False.

worldveil commented 7 years ago

Can you be more specific? You say you are getting false positives or false negatives?

fainneth commented 7 years ago

False positives. I was expecting an error message (something like "sorry, Dejavu can't recognize this song") when playing a random song, but instead I got a false result because Dejavu reported that it recognized the song.

worldveil commented 7 years ago

That's completely normal. Dejavu is going to do its best to find the closest matching option. In fact, it's quite likely that any two given audio files share at least one fingerprint.

You should just look at the confidence value in the output. It tells you the degree to which the fingerprints aligned, and it is a relative measure of confidence in the match.

fainneth commented 7 years ago

Ah, I see, thank you very much. About the confidence: how does Dejavu compute it? Is it the total number of identical fingerprints found in both the database and the tested audio, or something else?

worldveil commented 7 years ago

It's the maximum number of fingerprints that could possibly be aligned in time from the sample audio against the reference database audio (fingerprints).

For each query, Dejavu simply computes this maximum count per database audio sample and then selects the database entry with the highest alignment score. In a sense, Dejavu will always give an answer; it's up to you to pick a threshold that is meaningful and accurate for you.
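As an illustration of the alignment count described above, here is a minimal sketch. It is not Dejavu's actual code. It assumes every hash found in both the query and the database has been reduced to a pair `(song_id, offset_difference)`, where the difference is the database offset minus the query offset. The function name `best_alignment` and the sample data are hypothetical.

```python
from collections import Counter


def best_alignment(matches):
    """Return (song_id, offset_difference, count) for the most common alignment.

    `matches` is an iterable of (song_id, offset_difference) pairs, one per
    fingerprint hash found in both the query and the database (an assumed
    representation, chosen for illustration).
    """
    counts = Counter(matches)
    if not counts:
        return None  # no hash matched at all
    # The pair that occurs most often is the alignment supported by the
    # largest number of fingerprints; its count plays the role of "confidence".
    (song_id, offset_diff), count = counts.most_common(1)[0]
    return song_id, offset_diff, count


# Made-up matches: song 1 has three hashes agreeing on the same time shift.
matches = [(1, 150), (1, 150), (2, 7), (1, 150), (2, 90), (1, 12)]
print(best_alignment(matches))  # (1, 150, 3)
```

Note that scattered matches, such as the two for song 2 here, do not add up: only hashes that agree on the same time shift count toward the score.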

The number of fingerprints of course varies widely (especially by type/genre of audio), so I can't and shouldn't give any hard rules, but you'll probably find that unless there are a few thousand aligning fingerprints, the match may be a false positive.
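A minimal sketch of applying such a threshold in your own code. It assumes the recognition result is a dict containing a `"confidence"` key. The helper name `accept_match`, the sample result, and the threshold of 1000 are all made up for illustration; the right threshold depends on your corpus.

```python
def accept_match(result, min_confidence):
    """Return `result` if its confidence reaches the threshold, else None."""
    if result is None or result.get("confidence", 0) < min_confidence:
        return None
    return result


# Hypothetical result for a song that is not in the database:
# a best match still comes back, but with a low confidence.
result = {"song_name": "example", "confidence": 37}

match = accept_match(result, min_confidence=1000)  # threshold is an assumption
print(match["song_name"] if match else "No confident match found")
```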

fainneth commented 7 years ago

So, if I match a song against its original file (recognizing from a file rather than from the microphone) and get a confidence of 2000, does that mean there are 2000 fingerprints in that song?

(Sent by email, replying to Will Drevo's comment of Mon, Sep 26, 2016 at 6:17 AM.)

worldveil commented 7 years ago

The algorithm is identical whether you recognize from a file or a mic.

A confidence of 2000 means that 2000 SHA-1 fingerprint hashes were aligned in the audio you submitted. A song/recording can have hundreds of thousands of fingerprints (technical details here). 2k could actually be fairly low; it depends on your corpus.

You'll have to experiment with your use case and do some testing to see what a good threshold is. It depends on length (longer tracks have more fingerprints) and content (sonically sparser things like dialog have fewer fingerprints).

If you are only dealing with voice, though, you could alter DEFAULT_WINDOW_SIZE to get greater granularity in the important vocal ranges of, say, 1-2 kHz, or adjust DEFAULT_AMP_MIN to be more sensitive there. These and more settings are located in this file.

Keep in mind that these settings are things you should experiment with, but once you have fingerprinted a database with them, changing any of them means re-fingerprinting the database. The settings affect fingerprint generation, which must be deterministic for recognition to work properly and accurately.

fainneth commented 7 years ago

What about offset and offset seconds? When I ran recognition using the microphone, I got values for both (for example offset = 150, offset seconds = 6.235), but when I ran it on the exact same file, I got offset = 0 and offset seconds = 0.0 for every one of the 20 files I tested. Is that normal? And why is there more than one hash per offset in the database, as in this picture of the database?

worldveil commented 7 years ago

Completely normal.

To your first question: if you match an identical file, the offset (both the time bin and its corresponding conversion to seconds) will naturally be zero, because there is no offset. The reference and query audio are already aligned in time, so no offset needs to be applied to "match" them up.

For a related reason, Dejavu lets you specify a fingerprint_limit, which controls how many seconds from the start of each audio file you actually fingerprint.
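To make the relation between the time bin and its value in seconds concrete, here is a small sketch of the general arithmetic, not Dejavu's own conversion code. The sample rate of 44100 Hz and the window size of 4096 samples are assumed defaults, so check them against your own settings. The overlap ratio of 0.5 is the value quoted earlier in this thread. `offset_to_seconds` is a hypothetical helper name.

```python
# Assumed values; verify them against your own configuration.
SAMPLE_RATE = 44100      # samples per second (assumption)
WINDOW_SIZE = 4096       # samples per FFT window (assumption)
OVERLAP_RATIO = 0.5      # overlap between consecutive windows (from the thread)


def offset_to_seconds(offset_bins, fs=SAMPLE_RATE, window_size=WINDOW_SIZE,
                      overlap_ratio=OVERLAP_RATIO):
    """Convert an offset in spectrogram time bins to seconds."""
    # Each new time bin starts one hop after the previous one; with 50%
    # overlap the hop is half a window.
    hop_samples = window_size * (1 - overlap_ratio)
    return round(offset_bins * hop_samples / fs, 5)


print(offset_to_seconds(0))  # 0.0: identical, already aligned files
print(offset_to_seconds(100))
```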

>>> from dejavu import Dejavu
>>> config = {
...     "database": {
...         "host": "127.0.0.1",
...         "user": "root",
...         "passwd": "Password123", 
...         "db": "dejavu_db",
...     },
...     "database_type" : "mysql",
...     "fingerprint_limit" : 10
... }
>>> djv = Dejavu(config)

Supposing that you are always comparing audio files that start from the beginning of the track, you don't need to fingerprint more than a short period at the start, since the offset will always be close to zero. This saves a LOT of disk space.

To your second question: yes, you will likely see many hashes per time bin (offset). Recall that Dejavu creates a spectrogram with time on the X axis and frequency on the Y axis, both binned. The setting PEAK_NEIGHBORHOOD_SIZE (default=20) controls how far apart (in bins) neighboring fingerprint peaks (cells in the XY grid) need to be. An offset of n corresponds to the nth time bin. The neighborhood size covers the whole field of view around a peak, so if there are more than PEAK_NEIGHBORHOOD_SIZE / 2 frequency bins (that is, if DEFAULT_WINDOW_SIZE / 2 + 1 > PEAK_NEIGHBORHOOD_SIZE / 2), you will most likely have several fingerprints with the same offset.
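The condition above can be checked with a few lines of arithmetic. This sketch assumes DEFAULT_WINDOW_SIZE is 4096 (not stated in this thread, so check your own settings) and uses the PEAK_NEIGHBORHOOD_SIZE of 20 quoted above. The function names are hypothetical.

```python
def frequency_bins(window_size):
    """Number of frequency bins a real FFT over `window_size` samples yields."""
    return window_size // 2 + 1


def peaks_can_share_time_bin(window_size, peak_neighborhood_size):
    """Apply the rule of thumb from the comment above.

    True when there are more frequency bins than half the neighborhood size,
    so more than one peak can fit in a single time bin.
    """
    return frequency_bins(window_size) > peak_neighborhood_size / 2


WINDOW_SIZE = 4096             # assumed value of DEFAULT_WINDOW_SIZE
PEAK_NEIGHBORHOOD_SIZE = 20    # default quoted in the thread

print(frequency_bins(WINDOW_SIZE))                                    # 2049
print(peaks_can_share_time_bin(WINDOW_SIZE, PEAK_NEIGHBORHOOD_SIZE))  # True
```

With roughly two thousand frequency bins and a neighborhood of 20, there is plenty of room for several peaks in one time bin, which is why many database rows share an offset.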

fainneth commented 7 years ago

Oh, I get it. Thank you very much, you are so kind to answer all of my questions. I really appreciate that :)

(Sent by email, replying to Will Drevo's comment of Sep 30, 2016 at 4:10 AM.)

worldveil commented 7 years ago

No problem. Happy fingerprinting :)