mimbres / neural-audio-fp

https://mimbres.github.io/neural-audio-fp
MIT License

Comparing short audio files #36

Closed galarlo closed 1 year ago

galarlo commented 1 year ago

Hi, I'm interested in finding near-duplicate audio files. My dataset is about 3000 thousands short audio files, each between 0.5 and 5 seconds long. Unlike Shazam, both the "target" audio (i.e. the songs in Shazam's case) and the user input are short, and both might contain noise.

Can this library help? If so, are there any recommendations for tuning parameters?

N.B.: if a file is matched to multiple other files, that's fine. I have a less efficient algorithm that can verify which match is correct. In other words, I can handle some amount of false positives, but I don't want false negatives.

mimbres commented 1 year ago

@galarlo Thanks for your interest in our work.

Yes, it may be possible. I would start with the default setup and train the model on the train-set of this repo. Reducing the segment length or increasing the fingerprint dimension can improve performance, but at the scale of your dataset it doesn't seem necessary. The default segment length of 1 second can be used for 0.5-second inputs by simply zero-padding them.
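The zero-padding mentioned above could look roughly like the sketch below. It is not code from this repo: `pad_to_segment` is a hypothetical helper, and the 8 kHz sample rate is an assumption that should be replaced with whatever rate your config actually uses.

```python
import numpy as np

def pad_to_segment(audio, sample_rate=8000, segment_sec=1.0):
    """Right-pad a mono clip with zeros up to one full segment.

    sample_rate and segment_sec are assumed values, not read from the repo config.
    """
    target = int(round(sample_rate * segment_sec))
    if len(audio) >= target:
        # Clip is already long enough: keep only the first segment.
        return audio[:target]
    # np.pad with the default 'constant' mode appends zeros.
    return np.pad(audio, (0, target - len(audio)))

# A 0.5-second clip at the assumed 8 kHz rate (random samples as a stand-in).
clip = np.random.default_rng(0).standard_normal(4000).astype(np.float32)
padded = pad_to_segment(clip)
```

The padded clip can then be fed to the fingerprinter like any other 1-second segment.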

This repo performs a segment-level search, whereas your scenario is a file-level search, so modifications are needed on the post-processing side. The current search method first outputs a Top@K list of matching segments for each input, and then produces a match-ranked list within the Top@C candidates. Since it does not store 'segment ID'-to-'file ID' pairs, you would need to build that mapping yourself to produce a file-level match ranking.
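One way to build the 'segment ID'-to-'file ID' mapping and turn segment hits into a file ranking is sketched below. Both functions are hypothetical helpers rather than part of this repo, and the sketch assumes that segment IDs are simply the row positions in which fingerprints were added to the index.

```python
from collections import Counter

def build_segment_to_file(segments_per_file):
    """Map each segment index to its file ID.

    segments_per_file: list of (file_id, n_segments) tuples, in the same order
    the fingerprints were added to the search index (an assumption).
    """
    seg_to_file = []
    for file_id, n_segments in segments_per_file:
        seg_to_file.extend([file_id] * n_segments)
    return seg_to_file

def rank_files(topk_segment_ids, seg_to_file):
    """Count how many of the retrieved segments belong to each file."""
    votes = Counter(seg_to_file[i] for i in topk_segment_ids)
    return votes.most_common()  # list of (file_id, vote_count), best first

seg_to_file = build_segment_to_file([("a.wav", 2), ("b.wav", 3), ("c.wav", 1)])
ranking = rank_files([2, 3, 0, 4, 5], seg_to_file)
```

Since false positives are acceptable here, every file that receives at least one vote can be passed on to the slower verification step instead of keeping only the top-ranked one.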

We don't have any threshold parameter that is directly related to FP and FN. However, the number of segment-search outputs K and the number of candidates C are somewhat related to FP and FN:
https://github.com/mimbres/neural-audio-fp/blob/058d812df3787a7e000c6f595e200fd2e15ee348/eval/eval_faiss.py#L88
https://github.com/mimbres/neural-audio-fp/blob/058d812df3787a7e000c6f595e200fd2e15ee348/eval/eval_faiss.py#L232
By default K=20 and C=10; increasing K and C will give fewer FNs. Another issue is that your scenario allows various input lengths, while the current method uses a fixed length for each search. You may need some way to summarize the results for inputs of various lengths into a final estimate.
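One possible way to handle variable input lengths, offered as a sketch and not as this repo's method, is to cut each input into fixed-length windows, search every window separately, and take the union of the per-window file candidates. The helper names, the 8 kHz sample rate, and the 0.5-second hop are all assumptions.

```python
import numpy as np

def split_into_windows(audio, sample_rate=8000, segment_sec=1.0, hop_sec=0.5):
    """Cut a clip into fixed-length windows, zero-padding the tail."""
    win = int(round(sample_rate * segment_sec))
    hop = int(round(sample_rate * hop_sec))
    if len(audio) <= win:
        n_windows = 1
    else:
        n_windows = 1 + int(np.ceil((len(audio) - win) / hop))
    total = win + (n_windows - 1) * hop
    padded = np.pad(audio, (0, total - len(audio)))
    return np.stack([padded[i * hop : i * hop + win] for i in range(n_windows)])

def union_candidates(per_window_candidates):
    """Merge per-window file candidates, most frequently seen first."""
    counts = {}
    for candidates in per_window_candidates:
        for file_id in candidates:
            counts[file_id] = counts.get(file_id, 0) + 1
    return sorted(counts, key=lambda f: -counts[f])

windows = split_into_windows(np.zeros(21000, dtype=np.float32))
merged = union_candidates([["a.wav", "b.wav"], ["b.wav", "c.wav"], ["b.wav"]])
```

Taking the union favors recall over precision, which fits a setup where a later verification step removes false positives.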