Question: best configuration for small sound files

hristo-vrigazov commented 5 years ago

Let's say I want to recognize home sounds, e.g.

Unlocking / Locking a door
Opening / Closing a window

and so on. These sounds last roughly 0.5 to 1 second. What is the best configuration in fingerprint.py for this kind of recognition? The defaults seem to work well for full-length songs.

alexanderkladov commented 5 years ago

I'd be interested to find out as well. So far I've played with configs and achieved slightly better results, but it's still hit or miss. Sometimes it's spot on and sometimes it's so far off that it seems like it's just picking matches at random.

Here is what I set in fingerprint.py:

DEFAULT_FS = 44100
DEFAULT_WINDOW_SIZE = 1024
DEFAULT_OVERLAP_RATIO = 0.9
DEFAULT_FAN_VALUE = 15
DEFAULT_AMP_MIN = 5
PEAK_NEIGHBORHOOD_SIZE = 15
MIN_HASH_TIME_DELTA = 0
MAX_HASH_TIME_DELTA = 150
FINGERPRINT_REDUCTION = 0

Based on this comment, if I understand correctly, the snippet length is controlled by DEFAULT_FS, DEFAULT_WINDOW_SIZE and DEFAULT_OVERLAP_RATIO. So the settings above would check the sources every 0.02089795918 s, or ~21 ms (1 s / 44100 * 1024 * 0.9). That means a 500 ms section of a song has lots of slots that could be matched. That should be sufficient, but for some reason it's not, at least not in all cases.
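
For anyone who wants to check this arithmetic, here is a minimal sketch. It is not dejavu code. It assumes a standard STFT-style spectrogram where the overlap in samples is int(window_size * overlap_ratio) and the step between frames is the window size minus that overlap. The function names are made up for illustration. Under that assumption, the ~21 ms figure above corresponds to the overlap between neighbouring windows, and the step between frames comes out closer to 2.3 ms.

```python
def overlap_seconds(fs, window_size, overlap_ratio):
    """Duration shared by two neighbouring windows (the figure computed above)."""
    return window_size * overlap_ratio / fs


def hop_seconds(fs, window_size, overlap_ratio):
    """Time between the starts of two neighbouring windows.

    Assumption: overlap in samples is int(window_size * overlap_ratio),
    and each new window starts (window_size - overlap) samples later.
    """
    noverlap = int(window_size * overlap_ratio)
    return (window_size - noverlap) / fs


fs, window_size, overlap_ratio = 44100, 1024, 0.9
print(f"overlap: {overlap_seconds(fs, window_size, overlap_ratio) * 1000:.2f} ms")
print(f"hop:     {hop_seconds(fs, window_size, overlap_ratio) * 1000:.2f} ms")
```

If that assumption holds, a 500 ms sound is covered by a couple of hundred frames, not about 24, so the frame rate is probably not the limiting factor.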

Also, be prepared for the database to be huge and for processing times to be very long. An average-length song takes roughly 100 MB of disk space, and to process 400 songs I had to leave my machine on overnight. Matching is still very fast, though.

Does anyone else know how to improve frequency/accuracy here?

Anwarvic commented 5 years ago

In my case, when I set FINGERPRINT_REDUCTION = 0, dejavu throws this error:

MySQLdb._exceptions.ProgrammingError: 
(1064, "You have an error in your SQL syntax;
check the manual that corresponds to your MySQL server version for the right syntax to use near ')' at line 1")

So, if you hit the same error, set FINGERPRINT_REDUCTION back to a non-zero value.
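
A possible explanation, as a sketch only. It assumes FINGERPRINT_REDUCTION is the number of leading hex characters kept from each SHA-1 fingerprint hash, which I have not checked against the current source. The helper name and the "freq1|freq2|t_delta" input format are hypothetical. With a value of 0, every hash is truncated to an empty string, so all fingerprints look identical and the value passed to SQL may end up empty.

```python
import hashlib


def truncated_hash(freq1, freq2, t_delta, reduction):
    """Hypothetical helper: hash a peak pair and keep `reduction` hex chars."""
    digest = hashlib.sha1(f"{freq1}|{freq2}|{t_delta}".encode("utf-8")).hexdigest()
    return digest[:reduction]


# Two different peak pairs, truncated to 20 characters and to 0 characters.
a20 = truncated_hash(100, 200, 5, 20)
b20 = truncated_hash(300, 400, 7, 20)
a0 = truncated_hash(100, 200, 5, 0)
b0 = truncated_hash(300, 400, 7, 0)

print(len(a20), a20 != b20)   # 20-character hashes stay distinct
print(repr(a0), a0 == b0)     # 0-character hashes are all the empty string
```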

denis-stepanov commented 1 year ago

For information, my interest was in tracks about 3 seconds long; I came up with this:

CONNECTIVITY_MASK = 2
DEFAULT_FS = 44100
DEFAULT_WINDOW_SIZE = 1024      # was 4096
DEFAULT_OVERLAP_RATIO = 0.75    # was 0.5
DEFAULT_FAN_VALUE = 15          # was 5
DEFAULT_AMP_MIN = 10
PEAK_NEIGHBORHOOD_SIZE = 10
MIN_HASH_TIME_DELTA = 0
MAX_HASH_TIME_DELTA = 200
PEAK_SORT = True
FINGERPRINT_REDUCTION = 20

This results in roughly five times more fingerprints. At "fingerprinted confidence" of 50% the difference between old and new settings is 45% success rate vs 85% success rate. No significant difference in recognition time observed.
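
To give a feel for why the smaller window and larger overlap help on short clips, here is a rough sketch that counts spectrogram frames in a 3-second clip under the old and new settings. It assumes frames are taken without padding, with a step of window_size - int(window_size * overlap_ratio) samples. The function name is made up for illustration. It counts frames, not fingerprints: the number of fingerprints also depends on how many peaks are found and on the fan value, so the ratio here is not expected to match the "five times" figure.

```python
def frame_count(n_samples, window_size, overlap_ratio):
    """Number of full windows that fit in n_samples (no padding assumed)."""
    noverlap = int(window_size * overlap_ratio)
    step = window_size - noverlap
    if n_samples < window_size:
        return 0
    return (n_samples - window_size) // step + 1


n_samples = 3 * 44100  # 3-second clip at 44.1 kHz

old_frames = frame_count(n_samples, 4096, 0.5)    # previous window/overlap
new_frames = frame_count(n_samples, 1024, 0.75)   # settings listed above

print(old_frames, new_frames)  # 63 513
```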

worldveil / dejavu

Question: best configuration for small sound files #173