worldveil / dejavu

Audio fingerprinting and recognition in Python
MIT License

Possible bug in align_matches function #242

Open nykim11 opened 4 years ago

nykim11 commented 4 years ago

Hi.

I found a possible bug in the align_matches function of the Dejavu class (dejavu/dejavu/__init__.py).

On line 208, the function assigns dedup_hashes[song_id] to hashes_matched.

dedup_hashes[song_id] is the number of matched hashes for a song, and it does not take the offset difference into account.

As I remember, the Python 2 version of this project considered both the song ID and the offset difference, so hashes_matched counted only hashes with the same song ID and the same offset difference.

I thought this could be an intended change, but if you don't consider the offset difference, hashes_matched can exceed queried_hashes, and therefore INPUT_CONFIDENCE can exceed 1.
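To make the difference concrete, here is a minimal sketch of the two counting strategies. It is not the Dejavu code: the data is made up, and the names per_song, per_song_and_diff, song_only_confidence and aligned_confidence are hypothetical. Each match is represented as a (song_id, offset_difference) tuple.

```python
from collections import Counter

# Hypothetical data: the query produced 3 hashes, but the database
# returned 5 rows for song 1 because some hashes occur at several offsets.
queried_hashes = 3
matches = [(1, 100), (1, 100), (1, 100), (1, 340), (1, 512)]

# Counting per song only (ignores the offset difference).
per_song = Counter(song_id for song_id, _ in matches)
song_only_confidence = per_song[1] / queried_hashes

# Counting per (song_id, offset_difference) pair and keeping the best pair.
per_song_and_diff = Counter(matches)
(best_song, best_diff), aligned = per_song_and_diff.most_common(1)[0]
aligned_confidence = aligned / queried_hashes

print(song_only_confidence)  # greater than 1 for this data
print(aligned_confidence)    # 1.0 for this data
```

In this made-up data, counting per song gives 5 matches for 3 queried hashes, while counting only the matches that share the most common offset difference gives 3.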

Since rows in the database (I only checked MySQL) are only required to be unique on the triple (hash, song_id, offset), a single queried hash can match multiple rows in the database.

For example, consider the case when there are (hash1, song_id1, offset1) and (hash1, song_id1, offset2) in the database and you query (hash1).
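This case can be reproduced with a small stand-in table. The sketch below uses an in-memory SQLite database instead of MySQL, and the table name fingerprints and column name song_offset are hypothetical, not the actual Dejavu schema. The only property it assumes is the uniqueness constraint on (hash, song_id, offset) described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: uniqueness is enforced on the full triple only.
conn.execute(
    """
    CREATE TABLE fingerprints (
        hash TEXT NOT NULL,
        song_id INTEGER NOT NULL,
        song_offset INTEGER NOT NULL,
        UNIQUE (hash, song_id, song_offset)
    )
    """
)

# The same hash for the same song at two different offsets is allowed.
conn.executemany(
    "INSERT INTO fingerprints VALUES (?, ?, ?)",
    [("hash1", 1, 10), ("hash1", 1, 250)],
)

# Querying a single hash returns both rows.
rows = conn.execute(
    "SELECT song_id, song_offset FROM fingerprints "
    "WHERE hash = ? ORDER BY song_offset",
    ("hash1",),
).fetchall()

print(rows)  # [(1, 10), (1, 250)]
```

One queried hash yields two matches for the same song, so a per-song count can grow past the number of queried hashes.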

Identical hashes at different offsets do exist: when I changed the default overlap_ratio to 0.9, I saw hashes_matched exceed queried_hashes.

raedatoui commented 4 years ago

I can confirm that I have seen this bug, with input confidence greater than 1.