mimbres / neural-audio-fp

https://mimbres.github.io/neural-audio-fp
MIT License
175 stars · 25 forks

Questions Regarding Custom Data Testing in Audio Fragment Identification #41

Closed devkya closed 7 months ago

devkya commented 8 months ago

Hello, I found this paper quite interesting and encountered some issues during the testing process with my custom data. I apologize if my questions seem naive, as I lack some knowledge in this area.

In my case, I don't need to identify a specific song from a large collection using audio fragments. I have a single long audio file (2-3 hours) and need to find where in that file a given fragment (3-5 seconds) begins. (I understand from checking past issues that I need to customize the process to obtain the start timestamp.)

  1. Is this code suitable for such a scenario?

  2. I trained on the provided mini dataset, then ran python run.py generate --source CUSTOM_SOURCE_ROOT_DIR --output FP_OUTPUT_DIR --skip_dummy to generate fingerprints for my custom data, which is a single audio file. Afterward, I wanted to evaluate a short audio fragment (a 3-5 second wav file) but wasn't sure how to proceed. Also, is this a meaningful process?

  3. Should my custom audio data also be included in the training?

Thank you.

mimbres commented 8 months ago

@devkya Sorry for the late reply😅

  1. Yes, if:

    • your 3-hour-long audio has moderately unique segments,
    • or you want to search for all segments similar to a query.
  2. As described in Fingerprint Generation, prepare a source directory containing subfolders named test_query and test_db. Put the 3-hour audio in test_db and some query slices in test_query. In fact, you won't need to slice them yourself; the audio is cut into segments during fingerprinting anyway. Then follow the description in Search & Evaluation.

    • EDIT: If you don't assume any specific noise in the queries, don't slice the audio; just reuse the 3-hour file as the query. This way, the frame indices of the queries and the db will match exactly.
  3. No, unless your data is of a very different type from music. If you expect speech-only data, for example, I recommend re-training on some speech-only data. If you expect a specific type of acoustic noise in your queries, you should use a similar type of noise for augmentation during training. The point is that neural-FP mainly learns which kinds of sound sources to discard in similarity search.
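To make step 2 concrete, here is a minimal sketch of the source layout. The test_db / test_query subfolder names come from the Fingerprint Generation docs; long_recording.wav is a placeholder filename, and the touch commands only stand in for copying your real audio:

```shell
# Hypothetical layout for `python run.py generate`.
SRC=./CUSTOM_SOURCE_ROOT_DIR
mkdir -p "$SRC/test_db" "$SRC/test_query"

# The 3-hour recording goes into test_db...
touch "$SRC/test_db/long_recording.wav"     # stand-in for the real file
# ...and can be reused as-is in test_query (no manual slicing needed).
touch "$SRC/test_query/long_recording.wav"  # stand-in for the real file

# Then generate fingerprints as in the README:
# python run.py generate --source "$SRC" --output FP_OUTPUT_DIR --skip_dummy
```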
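On the augmentation point in 3, a common recipe (a generic NumPy sketch, not this repo's exact pipeline) is to mix your expected noise into clean query audio at a target signal-to-noise ratio:

```python
import numpy as np

def mix_at_snr(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix `noise` into `clean` at a target SNR in dB.

    Generic augmentation sketch; the repo's own training pipeline may differ.
    """
    # Tile/crop the noise to match the clean signal's length.
    reps = int(np.ceil(len(clean) / len(noise)))
    noise = np.tile(noise, reps)[: len(clean)]
    # Scale the noise so the power ratio matches the requested SNR.
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + scale * noise
```

Sweeping snr_db over a range (e.g. 0-10 dB) during training is a typical way to make the fingerprints robust to that noise type.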

Hope this helps, good luck!
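On the start-timestamp point from the original question: once the search returns the index of the best-matching db segment, converting it to a time offset is just index times hop length. The 0.5 s hop below is an assumed default, not a confirmed value; use whatever your fingerprinting config specifies:

```python
def segment_index_to_seconds(seg_idx: int, hop_sec: float = 0.5) -> float:
    """Start time (in seconds) of db segment `seg_idx`, assuming a fixed hop.

    hop_sec=0.5 is an assumption; check your config for the real value.
    """
    return seg_idx * hop_sec

# e.g. with a 0.5 s hop, a match at db segment 21600 starts 3 hours in:
# segment_index_to_seconds(21600) == 10800.0
```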