
sampleChop

This Python project uses a trained neural network to divide audio samples into chops (https://en.wikipedia.org/wiki/Chopping_(sampling_technique)) as an aid for hip hop production. Dependencies: Python 2.7, librosa 0.5.1, scikit-learn 0.19.1, numpy 1.13.3, scipy 1.0.0

Files:

Process:

I framed the problem as binary classification: for each frame of an audio file, the frame is labeled positive if it is a good break point for a chop (a chop is a point in the song at which to split the original audio file), and negative otherwise. I spent a large amount of time chopping samples by hand and recording where in each wav file I chopped. Then, I created a script to turn the raw data (an audio file plus a text file of chop times) into feature vectors and labels. This was done by first taking the constant-Q transform, or CQT (https://en.wikipedia.org/wiki/Constant-Q_transform), and generating the tonnetz data (https://en.wikipedia.org/wiki/Tonnetz, for more harmonic components), and then having each frame be accompanied by the previous thirty frames and the following thirty frames (for context); a rough sketch of this feature-extraction step is shown below. Since the classes are extremely unbalanced (many more negatives than positives), I also had to use a couple of upsampling techniques:

At this point, I had about 400,000 data points from about 100 audio files. Since this took a large amount of time and I hope to build a website using this data, I have left the data off of GitHub.
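For reference, the feature-extraction step described above might look roughly like the sketch below. The hop length, CQT bin count, and windowing details are assumptions, not the project's exact parameters (48 CQT bins + 6 tonnetz dimensions is only a guess to match the 54 features per frame mentioned later).

```python
import numpy as np
import librosa

CONTEXT = 30          # frames of context on each side, per the description above
HOP_LENGTH = 512      # assumed hop length; must match the frame/time conversion below

def extract_examples(audio_path, chop_times):
    """Turn one (audio file, list of chop times in seconds) pair into (X, y)."""
    y, sr = librosa.load(audio_path)
    cqt = np.abs(librosa.cqt(y, sr=sr, hop_length=HOP_LENGTH, n_bins=48))   # (48, frames)
    ton = librosa.feature.tonnetz(y=librosa.effects.harmonic(y), sr=sr)     # (6, frames)
    n = min(cqt.shape[1], ton.shape[1])
    feats = np.vstack([cqt[:, :n], ton[:, :n]])                             # per-frame features

    chop_frames = set(librosa.time_to_frames(chop_times, sr=sr, hop_length=HOP_LENGTH))
    X, labels = [], []
    for f in range(CONTEXT, n - CONTEXT):
        window = feats[:, f - CONTEXT:f + CONTEXT + 1]   # 61-frame context window
        X.append(window.T)                               # shape (61, n_features)
        labels.append(1 if f in chop_frames else 0)
    return np.array(X), np.array(labels)
```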

Next, I trained a neural network on this data. After experimenting with many architectures, I decided to simplify the data. I first tried a convolutional neural network on the CQT with the tonnetz data appended. This did not work well and was computationally expensive, so I dropped the tonnetz data and summed the CQT bins within each frame (including the 30 before and 30 after) for each data point. This reduced each data point from (61 x 54) to (61 x 1). I then trained a deep feed-forward neural network on this data, and it worked better. I experimented with a few different architectures and found one that performed well on validation audio samples.
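A minimal sketch of what such a feed-forward classifier could look like with scikit-learn 0.19 (the listed dependency); the layer sizes and training settings here are assumptions, not the architecture actually used:

```python
from sklearn.neural_network import MLPClassifier

# Each data point is the 61-frame context window with the CQT bins summed
# per frame, flattened to a 61-dimensional vector (X has shape n_samples x 61).
clf = MLPClassifier(hidden_layer_sizes=(128, 64, 32),   # assumed architecture
                    activation='relu',
                    solver='adam',
                    early_stopping=True,
                    max_iter=300)
clf.fit(X_train, y_train)                # y: 1 = good chop point, 0 = otherwise

# Per-frame probabilities for a new audio file's feature matrix
chop_probs = clf.predict_proba(X_new)[:, 1]
```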

Finally, I added some post-filtering on the results of the neural network:

Greedy Diversification Algorithm: Maximal Average Diversification

Once the neural network results have been cleaned, we end up with N candidates for breaking the wav file into smaller chops. Since many songs have repetitive parts (chords, progressions, notes, etc.), many of the final chops can be very similar if judged only on probability. This can lead to the system generating 5 chops that sound basically the same and offer little variation for the artist to play with. Thus, we want diverse chops. I achieve this with a greedy algorithm based on an objective I call Maximal Average Diversification, a small variation on Max-Sum Diversification (https://arxiv.org/pdf/1203.6397.pdf), an objective used for adding diversity to recommendations.

What Max-Sum Diversification and Maximal Average Diversification (M.A.D.) aim to achieve is a balance between the relevance of an item and the diversity of the list of items as a whole. In the domain of sampleChop, this is a balance between the probability of a chop and how different it is from the other chops.

The algorithm takes the cleaned frames from the output of the neural network (we will call them the candidate set, or C) and greedily selects frames that maximize the M.A.D. objective, adding them to the final set of chops, which we will call F. The objective has two parts: a relevance expression and a diversity expression, with a hyperparameter lambda that controls the trade-off between the two. The relevance expression is simply the probability output by the network. The diversity expression is calculated as follows:
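Since the exact diversity expression is not reproduced here, the following is only a rough sketch of the greedy selection, assuming the diversity term is the average distance between a candidate and the chops already chosen for F; the function name, distance measure, and default lambda are hypothetical:

```python
import numpy as np

def mad_select(candidates, probs, features, k, lam=0.7):
    """Greedily build the final set F from the candidate frames C.

    candidates: candidate frame indices (the cleaned network output, the set C)
    probs:      {frame: network probability} -- the relevance term
    features:   {frame: feature vector} used for the distance measure (assumed)
    lam:        trade-off between relevance and diversity
    """
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def score(c):
            relevance = probs[c]
            if not selected:
                return relevance
            # assumed diversity term: average distance to already-selected chops
            diversity = np.mean([np.linalg.norm(features[c] - features[s])
                                 for s in selected])
            return lam * relevance + (1 - lam) * diversity
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return sorted(selected)
```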

Once the final set is obtained, the original audio file is split at each frame number, and the resulting chops are written to wav files, ready to program into an MPC.
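A minimal sketch of that final step, assuming the frame indices come from an analysis with a 512-sample hop and using librosa 0.5's write_wav (both assumptions):

```python
import librosa

def write_chops(audio_path, chop_frames, sr=22050, hop_length=512, prefix="chop"):
    # hop_length must match the hop used when computing the frames (assumed 512)
    y, sr = librosa.load(audio_path, sr=sr)
    bounds = list(librosa.frames_to_samples(sorted(chop_frames), hop_length=hop_length))
    bounds = [0] + bounds + [len(y)]
    for i in range(len(bounds) - 1):
        segment = y[bounds[i]:bounds[i + 1]]
        if len(segment):
            librosa.output.write_wav("{}_{:02d}.wav".format(prefix, i), segment, sr)
```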