Let's say I have the following original tracks:
Track A: 5 hours
Track B: 3 minutes
Now I try to match 10 seconds of audio taken from Track B. The current ranking algorithm favours Track A:
{'song_id': 144, 'song_name': 'TrackA.wav', 'input_total_hashes': 406, 'fingerprinted_hashes_in_db': 1, 'hashes_matched_in_input': 1621, 'input_confidence': 3.99, 'fingerprinted_confidence': 1621.0, 'offset': 719479, 'offset_seconds': 33412.5395, 'file_sha1': 'A64696103620CAD306B320F64CED8749033B84F9', 'length': 11543}
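The two confidence values in this result can be checked by hand. The snippet below is only arithmetic on the reported fields, not code taken from dejavu; it shows that the numbers are consistent with simple ratios of matched hashes to the input hash count and to the stored hash count.

```python
# Fields copied from the result above.
result = {
    "input_total_hashes": 406,
    "fingerprinted_hashes_in_db": 1,
    "hashes_matched_in_input": 1621,
}

# Assumption: each confidence is a plain ratio rounded to two decimals.
input_confidence = round(
    result["hashes_matched_in_input"] / result["input_total_hashes"], 2
)
fingerprinted_confidence = round(
    result["hashes_matched_in_input"] / result["fingerprinted_hashes_in_db"], 2
)

print(input_confidence)          # 3.99
print(fingerprinted_confidence)  # 1621.0
```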
As you can see, the input confidence is about 4, which means that on average each input hash matched 4 times in Track A. Because the file is huge, it is very likely that any given fingerprint matches somewhere in it at least once, which distorts the ranking.
Suggestion:
https://github.com/worldveil/dejavu/blob/e56a4a221ad204654a191d217f92aebf3f058b62/dejavu/__init__.py#L197
In this line there is an argument that is completely ignored: aligned_matches. I think aligned_matches should play a major role in the ranking.
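To illustrate the idea, here is a minimal sketch of a ranking that counts only matches agreeing on the same time offset. It is not dejavu's implementation: the function name rank_by_alignment, the (song_id, offset_difference) input format, and the toy data are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def rank_by_alignment(matches):
    """Rank songs by the size of their largest group of aligned matches.

    `matches` is a hypothetical list of (song_id, offset_difference) pairs,
    one pair per hash hit. Hits that belong to the same real occurrence of
    the query share the same offset difference.
    """
    per_song = defaultdict(Counter)
    for song_id, offset_diff in matches:
        per_song[song_id][offset_diff] += 1

    ranking = []
    for song_id, diffs in per_song.items():
        best_offset, aligned = diffs.most_common(1)[0]
        ranking.append({
            "song_id": song_id,
            "offset": best_offset,
            "aligned_matches": aligned,
            "total_matches": sum(diffs.values()),
        })

    # Sort by aligned matches, not by the raw number of hash hits.
    ranking.sort(key=lambda r: r["aligned_matches"], reverse=True)
    return ranking


# Toy data: the long track collects many scattered hits, the short track
# collects fewer hits, but most of them agree on a single offset.
long_track = [("A", d) for d in range(1000)]
short_track = [("B", 42)] * 50 + [("B", d) for d in range(10)]

ranking = rank_by_alignment(long_track + short_track)
for row in ranking:
    print(row)
```

In this toy case Track A has 1000 hits but only 1 at any single offset, while Track B has 60 hits with 50 aligned, so Track B ranks first. Normalising the aligned count by the number of input hashes would be a natural next step, but that choice is left open here.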