Closed pk97 closed 4 years ago
Hello @pk97 Your understanding is quite right. The offset difference is calculated for each matching fingerprint. Each fingerprint has its own offset, which is the time at which it was found. A fingerprint from the song segment is matched to one in the database that has exactly the same value, and their times are then subtracted. So the offset difference is the difference between the time at which a fingerprint was found in the song segment and the time at which its matching fingerprint was found in the actual song.
@jean72human Thanks for the reply
From mic with 5 seconds we recognized: {'song_id': 1, 'song_name': 'Brad-Sucks--Total-Breakdown', 'file_sha1': '02A83F248EFDA76A46C8B2AC97798D2CE9BC1FBE', 'confidence': 32, 'offset_seconds': 36.50177, 'offset': 786}
Above is a sample output of Dejavu.
To my understanding, 'offset_seconds' means the time corresponding to a particular fingerprint of the song which got matched to the fingerprint of sample input given to Dejavu.
'Offset': It is the relative fingerprint number
Is this correct?
Note: I tried to verify this by playing the sample input from my phone and noting the time. The offset_seconds output varies by as much as 7 seconds.
The offset_seconds value is actually an offset difference: the difference between the time of a particular fingerprint in the song and the time of its matching fingerprint in the sample input given to Dejavu.
Let's take an example. You have a song A in the database, and you are recognizing a recording and comparing it to song A. In song A, between the 8th and the 15th millisecond, 5 fingerprints are found at times 8, 10, 11, 14, 15 (in milliseconds). In the recording, those same fingerprints are found, but at times 0, 2, 3, 6, 7. The fingerprint values are the same, so they match; their times differ because the recording does not start at the beginning of song A. But since it is the same song, the sequence is the same, so the offset difference is the same for every matching peak: first peak 8-0=8, second peak 10-2=8, and it is still 8 for all the following peaks.
That 8 is the time in song A at which the system started picking up the recording, meaning that the recording starts at the 8th millisecond of song A. This value, the offset difference, is then converted to seconds and displayed as offset_seconds. Hope this example helps.
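The example above can be sketched in a few lines of Python. This is a minimal illustration and not Dejavu's actual code: the function name `most_common_offset_difference` and the fingerprint values `h1` to `h5` are made up, and only the times come from the example.

```python
from collections import Counter


def most_common_offset_difference(song_prints, record_prints):
    """Return (difference, votes) for the most frequent offset difference.

    Both arguments are lists of (fingerprint_value, time) pairs.
    Returns None when no fingerprint values match.
    """
    # Index the song's fingerprints by value so matches can be looked up.
    song_times = {}
    for value, t_song in song_prints:
        song_times.setdefault(value, []).append(t_song)

    # For every matching value, count (time in song - time in recording).
    diffs = Counter()
    for value, t_record in record_prints:
        for t_song in song_times.get(value, []):
            diffs[t_song - t_record] += 1

    if not diffs:
        return None
    return diffs.most_common(1)[0]


# Times in milliseconds, taken from the example; the values are placeholders.
song_a = [("h1", 8), ("h2", 10), ("h3", 11), ("h4", 14), ("h5", 15)]
recording = [("h1", 0), ("h2", 2), ("h3", 3), ("h4", 6), ("h5", 7)]

print(most_common_offset_difference(song_a, recording))  # (8, 5)
```

All five matching fingerprints vote for the same difference of 8, which is why a consistent difference indicates where in the song the recording starts.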
@jean72human Thanks for your explanation. You seem to understand the way this works very well. So I was wondering whether you might be able to help me understand a few things better as well:
offset variable, correct? And since it's represented in milliseconds, we just need to divide it by 1000 to get the seconds. For example, in the sample below, the match happened 83.306 seconds into the song (or 1m 23.306s). Is that correct?
{'song_id': 211, 'song_name': 'Aerosmith - Walk This Way', 'confidence': 9, 'offset': 83306, 'offset_seconds': 1740.92539, 'file_sha1': '3BA7DAF426916E8FDB7C1CF37F64F0EBEF5F1530', 'match_time': 0.2773129940032959}
Hello @alexanderkladov
# Sampling rate; bounds the range of frequencies that can be detected (Nyquist).
DEFAULT_FS = 44100
# Size of the FFT window; affects frequency granularity.
DEFAULT_WINDOW_SIZE = 4096
# Ratio by which each window overlaps the next; higher gives finer offset granularity but potentially more fingerprints.
DEFAULT_OVERLAP_RATIO = 0.5
# Degree to which a peak can be paired with its neighbors; higher gives more fingerprints.
DEFAULT_FAN_VALUE = 15
# Minimum amplitude in the spectrogram for a point to be considered a peak.
DEFAULT_AMP_MIN = 10
# Number of cells around an amplitude peak that are checked when deciding whether it is a spectral peak.
PEAK_NEIGHBORHOOD_SIZE = 20
# Bounds on how close or far apart in time two peaks can be to be paired into a fingerprint.
MIN_HASH_TIME_DELTA = 0
MAX_HASH_TIME_DELTA = 200
# If True, peaks are sorted temporally before fingerprinting.
PEAK_SORT = True
# How much of the front of the SHA1 hash is kept for each fingerprint.
FINGERPRINT_REDUCTION = 20
Thanks for the write-up. Unfortunately, it's not 100% clear from the fingerprint.py comments what exactly I need to do to achieve the best results. I have experimented with various configs and received widely different results, which is quite strange. So far, the best results were with:
DEFAULT_FS = 44100
DEFAULT_WINDOW_SIZE = 1024
DEFAULT_OVERLAP_RATIO = 0.9
DEFAULT_FAN_VALUE = 30
DEFAULT_AMP_MIN = 10
PEAK_NEIGHBORHOOD_SIZE = 15
MIN_HASH_TIME_DELTA = 0
MAX_HASH_TIME_DELTA = 200
PEAK_SORT = True
FINGERPRINT_REDUCTION = 20
- So offset and offset_seconds are the same thing, except that offset_seconds is in seconds.
How can that be? My fingerprinted tracks are anywhere from 2.5 to 8 minutes long, but offset_seconds results are sometimes in the thousands, meaning 30 minutes or more. How is that being calculated? Based on my configs above, the unit size should be 0.02089795918 s, or ~21 ms (1 s / 44100 × 1024 × 0.9).
Here is an example of a result I get:
{'song_id': 91, 'song_name': 'Depeche Mode - Enjoy The Silence - Single Mix', 'confidence': 28, 'offset': 61373, 'offset_seconds': 1282.57045, 'file_sha1': 'BAC931E2C1404B8C6C8A665B84746932BABA513E', 'match_time': 2.8039259910583496}
Could you please help me decode where on earth the match happened in that song? The snippet I am feeding it is less than a second long.
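The unit-size arithmetic in this post can be checked against the outputs quoted in the thread. The sketch below only applies the formula as written above (offset / sampling rate × window size × overlap ratio, rounded to 5 places); the helper name `offset_to_seconds` is hypothetical, and the default arguments are the initial config values posted earlier.

```python
def offset_to_seconds(offset, fs=44100, window_size=4096, overlap_ratio=0.5):
    """Convert a raw offset to seconds using the formula quoted in the thread."""
    return round(float(offset) / fs * window_size * overlap_ratio, 5)


# Results obtained with window size 1024 and overlap ratio 0.9.
print(offset_to_seconds(61373, window_size=1024, overlap_ratio=0.9))  # 1282.57045
print(offset_to_seconds(83306, window_size=1024, overlap_ratio=0.9))  # 1740.92539

# First sample output in the thread, consistent with the initial config values.
print(offset_to_seconds(786))  # 36.50177
```

The three values agree with the offset_seconds fields reported above, so offset is not in milliseconds: each unit is about 21 ms with the modified config and about 46 ms with the initial one.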
Yes, so it is 21 ms, and when multiplied by the offset you get the same value as offset_seconds. As for why offset_seconds is so big, I don't really have a clear idea. I can only suggest that you reduce the overlap ratio. The last time I tried to tweak these values, I would sometimes get results that did not make sense. You can also try the initial values.
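One possible explanation, offered as an assumption rather than a confirmed diagnosis: if consecutive spectrogram frames start (window size minus overlapped samples) apart, which is the usual way an overlapping STFT is stepped, then each offset unit spans window × (1 − overlap) samples rather than window × overlap. The two agree at an overlap ratio of 0.5 and diverge at 0.9. The helper `hop_based_seconds` below is hypothetical and is not part of Dejavu.

```python
def hop_based_seconds(offset, fs=44100, window_size=1024, overlap_ratio=0.9):
    """Convert an offset to seconds, assuming one unit equals one frame hop."""
    # Assumption: frames advance by the window size minus the overlapped samples.
    hop = window_size - int(window_size * overlap_ratio)
    return offset * hop / fs


# The Depeche Mode result from the thread (offset 61373, overlap ratio 0.9).
print(round(hop_based_seconds(61373), 2))  # 143.34

# With the initial values (4096, 0.5) both formulas give the same answer.
print(round(hop_based_seconds(786, window_size=4096, overlap_ratio=0.5), 5))  # 36.50177
```

Under this assumption the match would sit about 143 seconds into the track, which fits a 2.5 to 8 minute song, whereas the reported 1282 seconds does not.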
I am new to this project and I am having the same problem as @alexanderkladov: offset_seconds is very large even though my tracks are only 2 to 3 minutes long. @worldveil, can you clarify this for me?
Hi, Dejavu is a great project, but I am unable to understand how it calculates the relative offset for a segment of the sound. According to the documentation, the following formula is used:
To my understanding, the offset is the difference in time between the actual start of the song and the start of the segment of the song. I know something is wrong.
How is the relative offset calculated?