I found a possible bug in the align_matches function of the Dejavu class (dejavu/dejavu/__init__.py).
On line 208, the function assigns dedup_hashes[song_id] to hashes_matched.
dedup_hashes[song_id] is the number of matched hashes for a song, and it does not take the offset difference into account.
As I remember, the Python 2 version of this project considered both the song ID and the offset difference, so that hashes_matched counted only hashes with the same song ID and the same offset difference.
I thought this could be an intended change, but if you don't consider the offset difference, hashes_matched can exceed queried_hashes, and therefore INPUT_CONFIDENCE can exceed 1.
Since rows in the database (I only checked MySQL) are only constrained to be unique on the triple (hash, song_id, offset), a single queried hash can match multiple rows in the database.
For example, suppose the database contains (hash1, song_id1, offset1) and (hash1, song_id1, offset2), and you query (hash1): the single queried hash then matches two rows for the same song.
Identical hashes at different offsets do exist in practice: when I changed the default overlap_ratio to 0.9, I could see hashes_matched exceed queried_hashes.
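To make the overcount concrete, here is a minimal, self-contained sketch. It does not reproduce the project's code: the rows, the query, and the match helper are all hypothetical, and it only assumes that each matching database row contributes one count. It compares counting matches per song ID with counting per (song ID, offset difference).

```python
from collections import Counter

# Hypothetical fingerprint rows: (hash, song_id, offset within the stored song).
db_rows = [
    ("hash1", 1, 10),
    ("hash1", 1, 50),  # same hash and song, different offset
    ("hash2", 1, 11),
]

# Hypothetical query fingerprints: (hash, offset within the sample).
query = [("hash1", 0), ("hash2", 1)]


def match(query, db_rows):
    """Return one (song_id, offset_difference) pair per matching row."""
    return [
        (song_id, db_offset - q_offset)
        for q_hash, q_offset in query
        for db_hash, song_id, db_offset in db_rows
        if db_hash == q_hash
    ]


matches = match(query, db_rows)
queried_hashes = len(query)

# Counting by song ID only: every matching row counts, whatever its offset.
per_song = Counter(song_id for song_id, _ in matches)

# Counting by (song ID, offset difference): only aligned matches agree.
per_song_and_diff = Counter(matches)
best_aligned = max(
    count for (song_id, _), count in per_song_and_diff.items() if song_id == 1
)

ratio_song_only = per_song[1] / queried_hashes  # 3 / 2 = 1.5
ratio_aligned = best_aligned / queried_hashes   # 2 / 2 = 1.0

print(ratio_song_only, ratio_aligned)
```

With two queried hashes and three matching rows for the same song, the song-only ratio is 1.5, while the count restricted to the best offset difference stays at 1.0.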