redouane-dziri / deep-music-classification

Classify music into genres using GLCM on mel-maps and CNNs

(2) Pre-process Data #2

Closed redouane-dziri closed 4 years ago

redouane-dziri commented 4 years ago

From the reference paper, to do the above:

arnaudstiegler commented 4 years ago

Do you have any tools/packages in mind that we should use? I've seen that you can split audio files with the AudioSegment class (from pydub), and mel maps can be computed using librosa I think (melspectrogram)

redouane-dziri commented 4 years ago

Haven't looked into that at all yet; maybe check whether the paper mentions any packages, but nothing comes to mind. I've seen librosa used for spectrograms a while back, so that's probably the one we'll use for spectrograms + mel maps :))

arnaudstiegler commented 4 years ago

I couldn't find any in the paper, so we will have to choose!

redouane-dziri commented 4 years ago

Let's try librosa for the maps and scikit-image for GLCMs

arnaudstiegler commented 4 years ago

Above merge:

redouane-dziri commented 4 years ago

A couple of issues with the feature engineering that will probably hinder our ability to reproduce results from the paper:

arnaudstiegler commented 4 years ago

I'll try to find some literature about GLCMs for music, since most of our current issues come from our lack of understanding of those matrices!

@redouane-dziri, quick question about the TODO list above: what do you mean by "check all computed maps at each step"? Is it like a tensor shape check? And do you want to implement some functions to test the feature engineering, or should we just manually check them before moving on to the training phase?
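If it is a shape check, it could look something like this minimal sketch (the shapes and the helper name `check_maps` are assumptions for illustration, not the project's actual values):

```python
# Sketch of a per-step sanity check: fail fast if any intermediate
# map has an unexpected shape. Shapes here are illustrative only.
import numpy as np

def check_maps(mel_maps, glcms, n_mels=128, levels=16):
    """Assert expected tensor shapes at each pipeline stage."""
    for m in mel_maps:
        assert m.shape[0] == n_mels, f"bad mel map shape: {m.shape}"
    for g in glcms:
        assert g.shape[:2] == (levels, levels), f"bad GLCM shape: {g.shape}"

mel_maps = [np.zeros((128, 50)) for _ in range(14)]
glcms = [np.zeros((16, 16)) for _ in range(14)]
check_maps(mel_maps, glcms)  # passes silently
```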

arnaudstiegler commented 4 years ago

We might have to agree on how imports are done in the .py files. There are some inconsistencies between running them from a notebook and running the files directly, so we need to pick one running method and adapt the imports to it

arnaudstiegler commented 4 years ago

I've pushed some changes:

Also, I have assumed that the output we write to the bucket is one big JSON; correct me if I'm wrong

arnaudstiegler commented 4 years ago

Finished the Python script to preprocess the full data (feature_engineering/preprocess_full_data.py). I have tested it (with a break in the blobs for loop) and it should run fine. We need to set up an instance to run it because it is very memory-hungry. feature_engineering/preprocess_full_data.py:

Notes:

arnaudstiegler commented 4 years ago

Just finished with the preprocess_full_data.py script. It does end-to-end data preprocessing and saves the result locally and to Google Storage (in JSON format).

One file (e.g. the mel-map file for angle=45) is 2.76 GB, which is pretty heavy. The whole process takes around an hour.

Notes:

redouane-dziri commented 4 years ago

We forgot about the time-MFCC thing mentioned in the paper as well, yet another pre-processing pipeline to compare to. Added to the checklist.

redouane-dziri commented 4 years ago

The exploration confirms that it's the first bucket (very low dB) that we should drop when computing the GLCMs.

redouane-dziri commented 4 years ago

A little issue I took care of: three of the shorter tracks produced only 13 pieces instead of 14 when passed through the short-term-pieces part of the pipeline. They now produce 14, by zero-padding them up to the minimum length needed for 14 pieces (only a few hundred samples out of ~660,000, so it shouldn't be a problem, and each of the three tracks is from a different genre, so not too much leakage on that end).
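The padding fix can be sketched as follows (the exact piece length is an assumption derived from the thread's ~660,000-sample figure split into 14 pieces; the real pipeline may use different numbers):

```python
# Sketch of the zero-padding fix: ensure a track is long enough to
# yield exactly n_pieces equal pieces. Piece length is assumed.
import numpy as np

def pad_to_min_length(signal, n_pieces=14, piece_len=47250):
    """Zero-pad `signal` at the end so it yields `n_pieces` pieces."""
    min_len = n_pieces * piece_len
    if len(signal) < min_len:
        signal = np.pad(signal, (0, min_len - len(signal)))
    return signal

short = np.ones(13 * 47250 + 100)  # a track slightly too short
padded = pad_to_min_length(short)
pieces = np.split(padded[:14 * 47250], 14)
print(len(pieces))  # 14
```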

redouane-dziri commented 4 years ago

I wasn't sure about the quantization and the maps, but now I'm confident we're doing it right-ish. The paper mentions quantizing into 16 levels, which seemed arbitrary to us non-experts. After converting the maps to decibels, all values fall in the range [-80, 0] dB, which divides nicely into 16 buckets of 5 dB each for quantization, supporting the hypothesis that the maps need to be converted from the amplitude domain to dB first :) I rewrote the quantization accordingly, making sure edge cases were handled and all maps were mapped into buckets from 1 to 16.
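That quantization scheme can be sketched like this (the clamping of the 0 dB edge case into the top bucket is an assumption about how the edge cases were handled):

```python
# Sketch: quantize dB values in [-80, 0] into 16 buckets of 5 dB each,
# labelled 1..16. Values at exactly 0 dB are clamped into bucket 16.
import numpy as np

def quantize_db(mel_db, n_levels=16, db_min=-80.0, db_max=0.0):
    bucket_width = (db_max - db_min) / n_levels  # 5 dB per bucket
    buckets = np.floor((mel_db - db_min) / bucket_width).astype(int) + 1
    return np.clip(buckets, 1, n_levels)

print(quantize_db(np.array([-80.0, -77.5, -0.1, 0.0])))  # [ 1  1 16 16]
```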

arnaudstiegler commented 4 years ago

Just pushed a commit for implementing the MFCC, some notes:

redouane-dziri commented 4 years ago

I corrected generate_glcm: it was outputting (256x256) gray-level maps instead of (16x16). Also corrected generate_glcms_from_dict, as it was generating GLCMs with angle 0 only, due to some tricky variable-replacement sh*t - that took a while to figure out ^^. I also had to modify generate_MFCC_from_dict; somehow it wasn't outputting the 30x40x50 arrays we were expecting. Added drop_first_glcm_level_from_dict to the pipeline as well.

redouane-dziri commented 4 years ago

All there should be left to do is to run the pipeline on the full data and we'll be done with this feature extraction preliminary step.

arnaudstiegler commented 4 years ago

@redouane-dziri, I think there is one last preprocessing/piping step: a function that extracts the data and formats it so it can be fed to our model. It's pretty straightforward for the most part. The only thing that's gonna require more work is the i-GLCM, which combines all of the angles; that will need some data engineering because of the data format we chose. I think it might be best to have a JSON for this as well (so we don't have to go through the process every time). I'll work on that!
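The angle-combining step could look something like this sketch; averaging over the four angles is an assumption here, as the thread doesn't say how the i-GLCM combines them:

```python
# Sketch: combine the per-angle GLCMs into a single "i-GLCM" map by
# averaging over the angle axis. Averaging is an assumed combination rule.
import numpy as np

glcms = np.random.rand(16, 16, 1, 4)      # one 16x16 GLCM per angle
i_glcm = glcms.mean(axis=-1).squeeze(-1)  # average over the 4 angles
print(i_glcm.shape)  # (16, 16)
```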

arnaudstiegler commented 4 years ago

So I finished this:

A few notes: