Closed by kasireddygariDineshKumarReddy 2 years ago
@kasireddygariDineshKumarReddy
As you know, the goal of fingerprinting is to identify the source audio file, not the source that generated the sound. Even the same dog's bark differs in its audio signal every time, so separate barks cannot be a positive pair in this project. In fact, your scenario is the same as standard sound event detection.
Perhaps your question is about Figure 2 in our paper. Every sample in the training batch gets a chance to be the anchor once. For example, the red original circle is the anchor that is compared with the others in the first row. You may find the red circle in the second row as well, but the anchor in that row is the pink circle. We compute softmax cross-entropy on each row.
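To make the row-wise computation concrete, here is a minimal NumPy sketch, not the project's actual implementation. It assumes a batch of `n` original embeddings and their `n` augmented replicas, cosine similarity, and a temperature `tau`; the function name `rowwise_contrastive_loss` and the value `tau=0.05` are illustrative assumptions. Each row of the similarity matrix uses one sample as the anchor, and the only positive in that row is the anchor's own original or replica.

```python
import numpy as np

def rowwise_contrastive_loss(orig, aug, tau=0.05):
    """Softmax cross-entropy per row of a pairwise similarity matrix.

    orig, aug: arrays of shape (n, d); aug[i] is the augmented replica of orig[i].
    tau: temperature (hypothetical default, chosen for illustration).
    """
    n = orig.shape[0]
    # Stack originals and replicas, then L2-normalize so dot product = cosine.
    z = np.concatenate([orig, aug], axis=0).astype(float)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)

    # Row i holds the similarities between anchor i and every sample in the batch.
    sim = z @ z.T / tau
    # An anchor is never compared with itself.
    np.fill_diagonal(sim, -np.inf)

    # The positive for anchor i is its replica (or its original), n rows away.
    pos = (np.arange(2 * n) + n) % (2 * n)

    # Numerically stable log-softmax over each row.
    sim = sim - sim.max(axis=1, keepdims=True)
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))

    # Cross-entropy against the single positive in each row, averaged over anchors.
    return -log_prob[np.arange(2 * n), pos].mean()

# Toy batch: four mutually orthogonal "originals".
originals = np.eye(4)
loss_matched = rowwise_contrastive_loss(originals, originals.copy())
loss_mismatched = rowwise_contrastive_loss(originals, np.roll(originals, 1, axis=0))
```

In this sketch every other sample in a row, including another segment from the same file, counts as a negative, which matches the point above that separate barks are not a positive pair.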
Let's say a dog barks more than 4 times within two seconds, so that each bark sounds similar to the remaining barks in the same file. We have defined positive pairs as the ORIGINAL and its AUGMENTED REPLICAS. But here the bark sounds from one dog are almost the same, so how can they be differentiated from the augmented replicas? Or will they also become positive pairs (REAL SAMPLES)? In the pairwise similarity, a real circle (not dashed) would then simply appear more than once in a row. What happens in this case?