protyposis / Aurio

Audio Fingerprinting & Retrieval for .NET
GNU Affero General Public License v3.0
140 stars 28 forks

General capabilities / detection #9

Closed alejandrojapkin closed 3 years ago

alejandrojapkin commented 6 years ago
  1. The library seems heavily focused on methods for music recognition. Is it suitable for fingerprinting other, more complex sound sources?
  2. What is the process of fingerprinting? It seems you do a simple FFT and then find peaks. Is that the case?
protyposis commented 6 years ago

Hi,

alejandrojapkin commented 6 years ago
  1. Music is usually composed of instruments with distinctive timbres but, most importantly, a specific rhythm pattern which can easily be decomposed into peak distances over a spectrogram. The same doesn't apply to something like distinguishing a blender from a beater in a noisy kitchen.
  2. Looking for peaks with FFT is not the same as STFT or CWT. I know that some of the algorithms in your list might work if implemented correctly; I'm just wondering if you have tested something like this before. I'm choosing between three libraries and it'd be nice to have some validation before diving in.
protyposis commented 6 years ago
  1. As long as the audio is changing, it should be possible to find peaks, and the algorithms expose many parameters that can be adjusted to various use cases. I have never tried it with such sounds myself, though. I recommend building AudioAlign, throwing two test files in there, and seeing what happens. Generally, fingerprinting is designed to find recordings of the same source, not of similar sources: two different recordings of the same blender made at the same time should match, while two recordings of the same blender made at different times might not, since the blender isn't deterministic and doesn't always produce exactly the same sound.
  2. When talking about FFT in fingerprinting, it's basically always STFT, because we want to analyze how the audio changes over time and use these clues to generate patterns to compare. Just taking a simple FFT of a whole input file wouldn't really work as we only get the average spectrum of the input.
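
To illustrate the difference between a whole-file FFT and an STFT, here is a minimal Python sketch. It is not Aurio code and does not reflect Aurio's API. The helper names (`dft_magnitudes`, `stft_peak_bins`), the frame size of 64, the hop of 32, and the two-tone toy signal are all assumptions made for the example, and a naive DFT stands in for a real FFT so that only the standard library is needed.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Naive DFT (stand-in for an FFT); magnitudes of bins 0..n/2."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def stft_peak_bins(signal, frame_size=64, hop=32):
    """Strongest frequency bin of each windowed, overlapping frame."""
    window = hann(frame_size)
    peaks = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        chunk = signal[start:start + frame_size]
        mags = dft_magnitudes([s * w for s, w in zip(chunk, window)])
        peaks.append(max(range(len(mags)), key=mags.__getitem__))
    return peaks


# Hypothetical toy signal: 256 samples of one tone, then 256 of another.
# The tones are chosen to fall exactly on bins 8 and 20 of a 64-point frame.
FRAME = 64
low = [math.sin(2 * math.pi * 8 * t / FRAME) for t in range(256)]
high = [math.sin(2 * math.pi * 20 * t / FRAME) for t in range(256)]
signal = low + high

# Whole-signal spectrum: the two strongest bins, with no time information.
whole = dft_magnitudes(signal)
top_two = sorted(sorted(range(len(whole)), key=whole.__getitem__)[-2:])

# STFT: one peak bin per frame, so the change over time becomes visible.
peaks = stft_peak_bins(signal, FRAME, 32)

print("whole-signal top bins:", top_two)
print("per-frame peak bins:  ", peaks)
```

The whole-signal spectrum reports both tones but says nothing about their order, whereas the per-frame peaks move from bin 8 to bin 20 partway through the signal. That time-localized sequence of peaks is the kind of pattern a fingerprinting algorithm can compare.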