Hi, I'm interested in finding near-duplicate audio files. My dataset is about 3,000 short audio files, each between 0.5 and 5 seconds long. Unlike Shazam, both the "target" audio (i.e., the songs in Shazam's case) and the user input are short, and both might contain noise.

Can this library help? If so, are there any recommendations for tuning parameters?

N.B. - if a file is matched to multiple other files, it's fine - I have a less efficient algorithm that can verify which match is correct. In other words, I can handle some amount of false positives, but I don't want false negatives.
Yes, if both your dataset and the user input contain audio signals from the same "root source" (in the Shazam example, e.g., the same version of the same song by the same artist), then it can work even when both contain noise. This library can help, but given the short dataset items and queries, you'll need a custom configuration tweaked to generate short fingerprints (below 0.5 seconds); I can't provide a specific recommendation. It also makes sense to evaluate your dataset with all 4 fingerprint algorithms to find the one which works best for your use case.
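If you want to script that comparison, it could look roughly like the sketch below. The `IFingerprintAlgorithm` interface is hypothetical glue you would implement once per algorithm on top of this library; it is not a type the library provides:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical glue, not a library type: implement once per algorithm
// (Haitsma&Kalker, Wang/Shazam, Echoprint, Chromaprint), wrapping the
// respective fingerprint generator and store.
interface IFingerprintAlgorithm
{
    string Name { get; }
    // Returns the dataset files the query file was matched against.
    ISet<string> Match(string queryFile, IReadOnlyList<string> datasetFiles);
}

static class AlgorithmEvaluation
{
    // False negatives are the costly case here, so rank algorithms by
    // recall on a labeled sample: query file -> expected dataset file.
    public static void Evaluate(
        IEnumerable<IFingerprintAlgorithm> algorithms,
        IReadOnlyDictionary<string, string> groundTruth,
        IReadOnlyList<string> datasetFiles)
    {
        foreach (var algorithm in algorithms)
        {
            int found = 0;
            foreach (var pair in groundTruth)
            {
                if (algorithm.Match(pair.Key, datasetFiles).Contains(pair.Value))
                {
                    found++;
                }
            }
            Console.WriteLine(
                $"{algorithm.Name}: recall {(double)found / groundTruth.Count:P1}");
        }
    }
}
```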
@protyposis Thanks very much for the quick response :) If it'll work, it'll solve a big problem for me.
Couple of questions:
I've started playing around with the library, and noticed that it doesn't output matches between files. How can I transform Aurio's outputs into file matches?
> It also makes sense to evaluate your dataset with all 4 fingerprint algorithms to find the one which works best for your use case.
Correct me if I'm wrong, but AcoustID doesn't seem appropriate for my use case. My use case is similar to Shazam's, in which the audio can be recorded on a phone's microphone with background noise (e.g., in a bar).
I'm suspicious of AcoustID because of the following reasons:
Load `AudioTrack`s that reference the audio files (`audioTrack.FileInfo`), open `Match & Align`, select the desired fingerprint, and click `Find Matches`. This uses the default settings though, and as mentioned in my previous comment, you will probably have to tweak settings for your use case, e.g., by changing parameters in the `DefaultProfile` of each fingerprint.
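Programmatically, the flow looks roughly like this, sketched after the Haitsma&Kalker fingerprinting example; exact namespaces, type and method names may differ between versions, so treat it as a sketch rather than exact API:

```csharp
// Namespaces follow the library's fingerprinting example, but may
// differ between versions.
using System;
using System.IO;
using Aurio.Matching.HaitsmaKalker2002;
using Aurio.Project;

class FingerprintMatchingSketch
{
    static void Main()
    {
        // An FFT implementation has to be configured once at startup,
        // e.g. FFTFactory.Factory = new Aurio.PFFFT.FFTFactory();

        // AudioTracks reference the audio files on disk (placeholder names)
        var track1 = new AudioTrack(new FileInfo("query.wav"));
        var track2 = new AudioTrack(new FileInfo("candidate.wav"));

        // Generator and store share a profile; this is the DefaultProfile
        // you would tweak to get fingerprints below 0.5 seconds
        var profile = FingerprintGenerator.GetProfiles()[0];
        var generator = new FingerprintGenerator(profile);
        var store = new FingerprintStore(profile);

        // Collect sub-fingerprints into the store as they are generated
        generator.SubFingerprintsGenerated += (sender, e) => store.Add(e);
        generator.Generate(track1);
        generator.Generate(track2);

        // Matches relate positions within the tracks, not whole files
        var matches = store.FindAllMatches();
        Console.WriteLine($"{matches.Count} matches found");
    }
}
```

Turning these position matches into whole-file decisions is then an aggregation step on top of this output.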
@protyposis Thanks, Mario.
I've played with that example. From what I understand, it returns matches between subsections of the audio files. However, I'm interested in finding similarity between whole files (where one file may be only a subset of the other, like in Shazam). How do I turn the subsection matches into whole-file matches? I have some naive ideas (e.g., averaging the similarity scores of the subsection matches, weighted by their lengths), but I'd like to know if there's a better recommendation.
Thanks, very informative.
If you really want to make sure that a track matches across its entire runtime, then you need to assert that there are matching fingerprints across the whole duration, e.g., for music tracks, make sure there is a match at least every 15 seconds from start to end, but permit a few gaps too (for robustness against silent or overly noisy sections). The similarity scores are mainly meant for ranking results, so better not to average the raw results, as bad matches will spoil your average. Rather, add a filtering step that first picks a sequence of the best ones, e.g., by cutting a music track into sections of 15 seconds and picking the best match in each, and then calculate the average if needed, like you suggested. Also, all returned matches are basically considered true positives, as false positives are not returned (this can be tweaked though). Keep in mind that this is a special use case and really only needed if your data contains remixed/concatenated signals; normally a few seconds are enough to reliably identify a piece.
In your case you'll have to do that in 0.5 or 1 second intervals, and, as mentioned, tweak the profiles for shorter fingerprints. The default fingerprint length of the Shazam fingerprint is 0.5 to 22 seconds (might work in your case out of the box); that of the Philips fingerprint is 8 seconds (won't work in your case out of the box).
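Sketched in code, that filtering step could look like the following; the `Match` record here is a stand-in holding just the fields that matter (this library's own match type differs in detail), and the window length and coverage threshold are the knobs to tune:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for a fingerprint match: position in the query track plus a
// similarity score. The library's actual match type differs in detail.
record Match(TimeSpan QueryPosition, double Similarity);

static class WholeFileMatching
{
    // Cut the query into fixed windows, keep only the best match per
    // window, and require most windows to be covered before declaring
    // a whole-file match; gaps up to (1 - minCoverage) are tolerated.
    public static (bool IsMatch, double AverageSimilarity) Aggregate(
        IReadOnlyCollection<Match> matches,
        TimeSpan queryDuration,
        TimeSpan window,            // e.g. 0.5-1 s for short clips
        double minCoverage = 0.8)   // tolerate a few silent/noisy windows
    {
        int windowCount = (int)Math.Ceiling(
            (double)queryDuration.Ticks / window.Ticks);

        // Best similarity per covered window; uncovered windows drop out
        var bestPerWindow = matches
            .GroupBy(m => m.QueryPosition.Ticks / window.Ticks)
            .Select(g => g.Max(m => m.Similarity))
            .ToList();

        double coverage = (double)bestPerWindow.Count / windowCount;
        double average = bestPerWindow.Count > 0 ? bestPerWindow.Average() : 0.0;

        return (coverage >= minCoverage, average);
    }
}
```

With 0.5 to 5 second files and a 0.5 or 1 second window you only get a handful of windows per file, so a high coverage requirement keeps false negatives rare, while your slower verification step can weed out the remaining false positives.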
Closing due to inactivity.