mayhemsloth / Drum-Tabber

Automatic drum tab project using self-created data set and TensorFlow NN

Implement a form of "negative" data augmentation using the subtraction of drum stems from full song #23

Open mayhemsloth opened 2 years ago

mayhemsloth commented 2 years ago

I tend to think of data augmentation in the following way: teaching a model about the invariances of your dataset. You typically do this by transforming, or augmenting, any given example by some small (or large) amount, but not enough to change the label associated with that example. In other words, you are systematically and randomly changing the data while leaving the labels untouched. The most effective data augmentations would push the transformations right to the edge of the space of "still the same class" without going over that boundary. Such an augmentation would teach the model the "boundaries" of the class in some high-dimensional feature space. In normal, epoch-level random data augmentations, I tend to set my random transformations so that I can be relatively confident they do not cross that class "boundary". Perhaps I should, though.

However, there is a different way to approach this. If, instead, you could somehow remove everything that indicates a certain class and change NOTHING else, then you could flip the label with a minimal change to the data. This is effectively the inverse pathway to the "boundary" described above: you push the example right up to the edge of "this class" while it remains "not this class". This method thus constructs "difficult" negative examples. To a human they might be the "hardest" examples to classify, but a human would still classify them correctly. These are exactly the best type of negative examples to learn from.

Using the Spleeter-provided drum stem of a song, this type of negative example can be constructed for the training data! For example, theoretically (because I haven't checked it in code yet), if you have a full song's waveform and the drum stem separated from it, then by simply subtracting the drum stem's waveform from the full song's waveform you can derive a drumless song, which keeps the full information of the rest of the instruments but without any of the drums (and thus without any labels as well!). Of course, this project already has plenty of negative examples (because drum event onsets are not constantly happening in songs), but this method would provide some of the most "difficult" negative examples possible.
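As a rough illustration of the subtraction idea, here is a minimal sketch in Python with NumPy. It assumes both waveforms are already loaded as float arrays at the same sample rate and are sample-aligned. The function name `make_drumless` is hypothetical and not part of the existing code base. Since a separated stem is only an estimate of the drums, some drum residue may remain in the result, so this is something to listen to and verify rather than trust blindly.

```python
import numpy as np


def make_drumless(full_mix, drum_stem):
    """Subtract a drum stem from the full mix, sample by sample.

    Assumptions (not verified against the project): both inputs are float
    waveforms at the same sample rate, sample-aligned, shaped either
    (samples,) or (channels, samples).
    """
    full_mix = np.asarray(full_mix, dtype=np.float32)
    drum_stem = np.asarray(drum_stem, dtype=np.float32)

    # The two arrays may differ by a few samples, so trim both to the
    # shorter length along the time (last) axis before subtracting.
    n = min(full_mix.shape[-1], drum_stem.shape[-1])
    return full_mix[..., :n] - drum_stem[..., :n]
```

Every frame of the resulting waveform would be labeled as "no drum event", which is what makes it a hard negative example.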

This issue is resolved when: a version of this data augmentation is fully implemented in the training pipeline, either as an additional example ("channel") or as a random replacement.
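The "random replacement" option could look something like the sketch below. It is only an illustration under stated assumptions: `labels` is taken to be a per-frame array of drum-event targets (for example frames × classes), `drumless` has already been computed and aligned with `full_mix`, and the function name `maybe_swap_in_negative` and the probability `p` are hypothetical, not existing parts of the training pipeline.

```python
import numpy as np


def maybe_swap_in_negative(full_mix, drumless, labels, p=0.25, rng=None):
    """With probability p, replace an example by its drumless version.

    When the swap happens, every label is set to zero, because the
    drumless audio should contain no drum events at all.
    """
    rng = np.random.default_rng() if rng is None else rng
    if rng.random() < p:
        # Hard negative: same song without drums, all labels cleared.
        return drumless, np.zeros_like(labels)
    # Otherwise keep the original example and its labels unchanged.
    return full_mix, labels
```

The "additional channel" alternative would instead keep both versions in every batch, at the cost of roughly doubling the amount of audio processed per song.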