Comparing short audio files

@galarlo Thanks for your interest in our work.

Yes, I think it should be possible. I would start with the default setup and train the model on the train-set of this repo. Reducing the segment length or increasing the fingerprint dimension can improve performance, but at the scale of your dataset it doesn't seem necessary. The default segment length of 1 second can handle 0.5-second inputs by simply zero-padding them.
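As a rough illustration of the zero-padding idea, here is a minimal sketch. The helper name `pad_to_segment` and the 8 kHz sample rate are assumptions for this example, not something taken from the repo's code, so adjust them to your config.

```python
import numpy as np

def pad_to_segment(samples, sample_rate=8000, segment_sec=1.0):
    """Right-pad a mono waveform with zeros up to one full segment.

    sample_rate=8000 is an assumed value; use the rate from your own config.
    """
    target_len = int(round(sample_rate * segment_sec))
    if len(samples) >= target_len:
        # Already long enough: keep only the first full segment.
        return samples[:target_len]
    # Append zeros after the audio so the total length equals one segment.
    return np.pad(samples, (0, target_len - len(samples)), mode="constant")

# A 0.5-second clip at the assumed 8 kHz rate (4000 samples).
half_second = np.ones(4000, dtype=np.float32)
padded = pad_to_segment(half_second)
```

Padding on the right keeps the audio aligned with the start of the segment; whether that or centering works better for your data is something you would need to check.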

This repo performs a segment-level search, whereas your scenario is a file-level search, so modifications are needed on the post-processing side. The current search method first outputs a top-K list of matching segments for each input, and then, within the top-C candidates, produces a list ranked by match. Since it does not store 'segment ID'-to-'file ID' pairs, you may need to construct that mapping yourself to produce a file-level match ranking.
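One possible way to build the 'segment ID'-to-'file ID' mapping and turn segment hits into a file ranking is sketched below. It assumes database segments are indexed contiguously file by file, and it uses plain vote counting, which is a simplification and not the repo's match-ranking. The names `build_segment_to_file` and `rank_files` are hypothetical.

```python
import numpy as np
from collections import Counter

def build_segment_to_file(segments_per_file):
    """Return an array whose entry i is the file ID that owns database segment i.

    Assumes segments were added to the index file by file, in order.
    """
    return np.repeat(np.arange(len(segments_per_file)), segments_per_file)

def rank_files(topk_segment_ids, segment_to_file):
    """Rank files by how many retrieved segments they own (simple voting)."""
    file_ids = segment_to_file[np.asarray(topk_segment_ids).ravel()]
    votes = Counter(file_ids.tolist())
    # List of (file_id, vote_count), most votes first.
    return votes.most_common()

# Toy database: file 0 owns segments 0-2, file 1 owns 3-4, file 2 owns 5-8.
seg2file = build_segment_to_file([3, 2, 4])
# Top-3 segment hits for two query segments (made-up indices).
ranking = rank_files([[5, 6, 0], [7, 3, 8]], seg2file)
```

In this toy case file 2 collects four of the six hits and ranks first. A real version would probably also weight hits by similarity score rather than counting them equally.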

We don't have any threshold parameter that is directly related to FP and FN. However, the number of segment-search outputs K and the number of candidates C are somewhat related to FP and FN: https://github.com/mimbres/neural-audio-fp/blob/058d812df3787a7e000c6f595e200fd2e15ee348/eval/eval_faiss.py#L88 https://github.com/mimbres/neural-audio-fp/blob/058d812df3787a7e000c6f595e200fd2e15ee348/eval/eval_faiss.py#L232 K=20 and C=10 by default, and increasing K and C will give fewer FN. Another issue is that your scenario allows various input lengths, while the current method uses a fixed length for each search. You may need some way to summarize the results for inputs of various lengths into a final estimate.
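One idea for handling various input lengths is to cut each query into fixed-length windows, search each window separately, and normalize the votes by the number of windows. The sketch below is only an assumption about how this could be done: the 8000-sample window, the 4000-sample hop, and the function names are illustrative, and the per-window best-match file IDs are made up.

```python
import numpy as np

def split_into_segments(samples, seg_len=8000, hop=4000):
    """Cut a waveform into fixed-length windows, zero-padding the tail.

    seg_len and hop are assumed values (1 s and 0.5 s at 8 kHz).
    """
    n = max(1, int(np.ceil((len(samples) - seg_len) / hop)) + 1)
    padded_len = (n - 1) * hop + seg_len
    padded = np.pad(samples, (0, padded_len - len(samples)))
    return np.stack([padded[i * hop : i * hop + seg_len] for i in range(n)])

def summarize_votes(per_segment_file_ids, n_files):
    """Fraction of query windows whose best match belongs to each file."""
    counts = np.bincount(per_segment_file_ids, minlength=n_files)
    # Dividing by the window count keeps scores comparable across input lengths.
    return counts / len(per_segment_file_ids)

short_query = split_into_segments(np.ones(4000, dtype=np.float32))   # 0.5 s input
long_query = split_into_segments(np.ones(10000, dtype=np.float32))   # 1.25 s input
# Hypothetical best-match file ID for each of three query windows.
scores = summarize_votes(np.array([2, 2, 0]), n_files=3)
best_file = int(np.argmax(scores))
```

Because the score is a fraction, a minimum fraction could then serve as a threshold for trading FP against FN, although that would need tuning on your own data.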

