worldveil / dejavu

Audio fingerprinting and recognition in Python
MIT License

Questions regarding the standardization of Audio Fingerprinting #199

Open DonaldTsang opened 5 years ago

DonaldTsang commented 5 years ago
  1. Is it possible to make a coordinated effort to standardize audio fingerprinting with other repos e.g.
  2. According to https://en.wikipedia.org/wiki/Just-noticeable_difference humans can only notice pitch differences of 10 cents (1/10 of a semitone) or more, so is it possible to store pitch as an integer rounded to the nearest 10 cents instead of a float/decimal (-535 to +661, or a 16-bit integer after shifting by +535)? In an even more extreme case we could reduce the whole key-space to just piano keys tuned to 440Hz, reducing the human auditory range of 20Hz-20kHz to -54 to +67 (which fits in an 8-bit signed integer), BUT in that case microtonal music and music tuned to 432Hz might lose information compared to chromatic 440Hz (which is why the former proposal, -535 to +661, is better and more fine-grained).
  3. According to https://www.reddit.com/r/askscience/comments/5dpu0z/what_is_the_fastest_beats_per_minute_we_can_hear/ humans can only perceive distinct beats at 1500-1800BPM (25-30 beats per second) or slower, so is it possible to store time differences as an integer rounded to the nearest 40ms (or 50ms, or 20ms/25ms for higher accuracy) instead of a float? For this I am asking what the "anchor points" (Shazam paper reference) of the software are, and whether it is possible to reduce the time distance between two notes to a single integer with a coarser resolution than the expected <1 millisecond (possibly 20~50 millisecond intervals) to save space and database search time (credit to Adam Neely's "Fastest Music" video).
  4. According to https://en.wikipedia.org/wiki/List_of_chords most chords have fewer than 8 notes, so would allowing only 1~6 notes at a time reduce audio fingerprint complexity? For this I am asking what the "anchor chords" of the software are, and whether it is necessary to add extra anchor points based on chords that have more than 3 notes.
  5. What are the minimum and maximum interval of the "Target Zone" (Shazam paper reference): one octave? two octaves? (On Wikipedia, people have considered quadruple octaves as an "interval", so that is not really useful in real life.) What is the maximum time difference of the "Target Zone" (Shazam paper reference): one second? two seconds? (Any music with more than two seconds between notes is "too slow to be useful", a reference to Adam Neely.)
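
To make the quantization ideas in points 2 and 3 concrete, here is a minimal Python sketch. It is not taken from dejavu's code; the function names `freq_to_step` and `time_to_step`, and the choice of 440Hz as the zero point, are assumptions made only for illustration. It maps a frequency to an integer number of 10-cent (or 100-cent, i.e. semitone) steps away from the reference, and a time difference to an integer number of fixed-size millisecond steps.

```python
import math

A4_HZ = 440.0  # assumed reference pitch (step 0), as in the post


def freq_to_step(freq_hz, cents_per_step=10, ref_hz=A4_HZ):
    """Quantize a frequency to an integer number of steps from ref_hz.

    cents_per_step=10 gives the 10-cent grid; 100 gives piano-key semitones.
    (Hypothetical helper, not part of dejavu.)
    """
    cents = 1200.0 * math.log2(freq_hz / ref_hz)
    return round(cents / cents_per_step)


def time_to_step(seconds, ms_per_step=40):
    """Quantize a time difference to an integer number of ms_per_step steps.

    (Hypothetical helper, not part of dejavu.)
    """
    return round(seconds * 1000.0 / ms_per_step)


if __name__ == "__main__":
    # Edges of the 20Hz-20kHz range on the 10-cent grid.
    print(freq_to_step(20.0), freq_to_step(20000.0))  # -535 661
    # 432Hz tuning is visible on the 10-cent grid but collapses
    # onto the same piano key as 440Hz on the semitone grid.
    print(freq_to_step(432.0), freq_to_step(432.0, cents_per_step=100))  # -3 0
    # One second at 40ms resolution.
    print(time_to_step(1.0))  # 25
```

The 432Hz case shows the information loss mentioned in point 2: on the semitone grid it becomes indistinguishable from 440Hz, while the 10-cent grid still separates them.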