redouane-dziri closed this issue 4 years ago
From the reference paper, to do the above:
Do you have any tools/packages that we should use? I have seen that you can split audio files with a package called AudioSegment, and mel maps can be done using librosa I think (melspectrogram).
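For the splitting part, something like this might work, assuming AudioSegment here means pydub's `AudioSegment` class (the file name and piece length below are just placeholders):

```python
from pydub import AudioSegment

# Load one track (pydub infers the format from the extension)
track = AudioSegment.from_file("blues.00000.wav")

# pydub slices by milliseconds, so this cuts the track into 3-second pieces
piece_ms = 3000
pieces = [track[start:start + piece_ms] for start in range(0, len(track), piece_ms)]

# Write each piece back out as its own wav file
for i, piece in enumerate(pieces):
    piece.export(f"blues.00000_piece{i}.wav", format="wav")
```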
Have not looked into that at all yet, maybe look in the paper to see if they mention any packages, but nothing comes to mind. I have seen usage of librosa for spectrograms a while back, so yes, probably the one we'll use for spectrograms + mel maps :))
I couldn't find any in the paper, so we will have to choose!
Let's try librosa for the maps and scikit-image for GLCMs.
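Something like this for the maps, then. A minimal sketch with placeholder parameters (not necessarily the paper's values):

```python
import numpy as np
import librosa

# Load a track at librosa's default 22,050 Hz sample rate
y, sr = librosa.load("blues.00000.wav", duration=30.0)

# Linear-frequency spectrogram: magnitude of the short-time Fourier transform
spectrogram = np.abs(librosa.stft(y, n_fft=2048, hop_length=512))

# Mel map: mel-scaled power spectrogram
mel_map = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
```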
Above merge:
A couple of issues with the feature engineering that will probably hinder our ability to reproduce results from the paper:
I'll try to find some literature about GLCMs for music, since most of our current issues come from our lack of understanding of those matrices!
@redouane-dziri, quick question about the TODO list above: what do you mean by "check all computed maps at each step"? Is it like a tensor shape check? And do you want to implement some functions to test the feature engineering, or should we just manually check them before we move on to the training phase?
We might have to agree on the way imports are done in the .py files. There are some inconsistencies between running them from a notebook and running the files directly, so we need to choose a running method and adapt the imports to it.
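One option (just a sketch, not a decision): guard the imports so the module works both when imported as part of the package and when run directly. The `utils` module name below is hypothetical.

```python
# feature_engineering/some_script.py (hypothetical module layout)
try:
    # Works when the repo root is on the path, e.g. when imported from a notebook
    from feature_engineering import utils
except ImportError:
    # Works when the file is run directly from inside feature_engineering/
    import utils
```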
I've pushed some changes:
Also, I have assumed that the output we will write to the bucket is one big JSON; correct me if I'm wrong.
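If we go with the big JSON, the upload could look roughly like this with google-cloud-storage (bucket and blob names are placeholders, and numpy arrays would need to be converted to lists before serializing):

```python
import json
from google.cloud import storage

def upload_json(data: dict, bucket_name: str, blob_name: str) -> None:
    """Serialize a dict to JSON and write it to a Google Cloud Storage bucket."""
    client = storage.Client()
    bucket = client.bucket(bucket_name)
    blob = bucket.blob(blob_name)
    # upload_from_string avoids keeping an extra local copy on disk
    blob.upload_from_string(json.dumps(data), content_type="application/json")

# upload_json(mel_maps, "our-project-bucket", "preprocessed/mel_maps.json")
```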
Finished the Python script to preprocess the full data (feature_engineering/preprocess_full_data.py). I have tested it (with a break in the blobs for-loop) and it should run fine. We need to set up an instance to do that because it is very greedy in terms of memory. feature_engineering/preprocess_full_data.py:
Notes:
Just finished with the preprocess_full_data.py script. It does end-to-end data preprocessing and saves the result locally and on Google Storage (in JSON format).
One file (like the file for the mel map with angle=45) is 2.76 GB, which is pretty heavy. The whole process takes around an hour.
Notes:
We forgot about the time-MFCC thing mentioned in the paper as well, yet another pre-processing pipeline to compare to. Added to the checklist.
The exploration confirms that it's the first bucket (very low dB) that we should drop in the GLCM.
A little issue I took care of: three of the shorter tracks produced only 13 pieces instead of 14 when passed through the short-term-pieces part of the pipeline. They now produce 14, by padding with zeros until they reach the minimum size needed for 14 pieces (only a few hundred points out of ~660,000, so it shouldn't be a problem, and each of the three tracks is from a different genre, so not too much leaking on that end).
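The padding itself is just a right-pad with zeros, roughly like this (the exact `min_length` depends on our piece size and hop, so it's a placeholder here):

```python
import numpy as np

def pad_to_min_length(signal: np.ndarray, min_length: int) -> np.ndarray:
    """Right-pad a short track with zeros so it yields the full 14 pieces."""
    if len(signal) >= min_length:
        return signal
    return np.pad(signal, (0, min_length - len(signal)), mode="constant")
```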
I wasn't sure about the quantization and the maps, but now I am confident we're doing it right-ish. The paper mentions quantizing into 16 levels, which seemed arbitrary to us, non-experts. By converting the maps to decibels, all values fall in the range [-80, 0] dB, which is nicely divisible into 16 buckets of 5 dB for quantization, supporting the hypothesis that the maps need to be pre-processed from the amplitude domain to dB :) Re-wrote the quantization as a consequence, making sure edge cases were taken care of and all maps were mapped into buckets from 1 to 16.
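For reference, a sketch of a quantization that matches the description above, assuming `librosa.power_to_db` with its default `top_db=80` (which clips values to [-80, 0] dB):

```python
import numpy as np
import librosa

def quantize_map(power_map: np.ndarray, n_levels: int = 16) -> np.ndarray:
    """Convert a power map to dB and bucket the values into levels 1..16."""
    # Values end up in [-80, 0] dB relative to the map's maximum
    db_map = librosa.power_to_db(power_map, ref=np.max, top_db=80.0)
    # 16 buckets of 5 dB each; the 0 dB edge case falls into the top bucket
    levels = np.floor((db_map + 80.0) / 5.0).astype(int) + 1
    return np.clip(levels, 1, n_levels)
```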
Just pushed a commit for implementing the MFCC, some notes:
I corrected `generate_glcm`: it was outputting (256x256) gray-level maps instead of (16x16). Also corrected `generate_glcms_from_dict`, as it was generating GLCMs only with angle 0 all the time, due to some tricky variable replacement sh*t - that took a while to figure out ^^. I also had to modify `generate_MFCC_from_dict`, somehow it wasn't outputting the 30x40x50 arrays we were expecting. Added `drop_first_glcm_level_from_dict` to the pipeline as well.
All that should be left to do is to run the pipeline on the full data, and we'll be done with this preliminary feature extraction step.
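A minimal sketch of a GLCM computation consistent with the fixes above (16 levels, the four angles, d=1) using scikit-image; note that newer scikit-image spells it `graycomatrix`, while older versions use `greycomatrix`:

```python
import numpy as np
from skimage.feature import graycomatrix  # `greycomatrix` in older scikit-image versions

def compute_glcms(quantized_map: np.ndarray) -> np.ndarray:
    """GLCMs for d=1 and angles (0, 45, 90, 135) over a 16-level map."""
    # Shift levels 1..16 down to 0..15, since graycomatrix expects values < `levels`
    image = (quantized_map - 1).astype(np.uint8)
    angles = [0, np.pi / 4, np.pi / 2, 3 * np.pi / 4]
    # Result has shape (16, 16, 1, 4): one 16x16 matrix per (distance, angle) pair
    return graycomatrix(image, distances=[1], angles=angles, levels=16)
```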
@redouane-dziri, I think there is one last step of preprocessing/piping to do, which would be to have a function that extracts the data and formats it so that it can be fed to our model. It is pretty straightforward for the most part. The only thing that is going to require more work is the i-GLCM, which combines all of the angles; this will require some data engineering because of the data format we chose. I think it might be best to have a JSON for this as well (so we don't have to go through the process every time). I'll work on that!
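I'm not sure yet how exactly the paper combines the angles for the i-GLCM, but as a placeholder, here is a sketch assuming a simple element-wise average of the four per-angle matrices:

```python
import numpy as np

def combine_angles(glcms: np.ndarray) -> np.ndarray:
    """Average the per-angle GLCMs into a single (16, 16) matrix.

    Assumes `glcms` has shape (16, 16, 1, 4), i.e. one distance and four angles.
    Averaging is just one plausible way to combine the angles; to be revisited
    once we pin down the paper's definition.
    """
    return glcms[:, :, 0, :].mean(axis=-1)
```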
So I finished this:
Few notes:
- the data comes in a dict with keys `train`/`test`, with values a list of triplets `(file_name, np.array, genre)`
- after splitting into short-term pieces, each entry becomes `(file_name, np.array, piece_id, genre)` with `piece_id` between 0 and 13. Let's call this dict `raw_segments` (see the sketch after this list)
- one function takes `raw_segments` and replaces the `np.array`s by the mel maps of the `np.array`s
- one function takes `raw_segments` and replaces the `np.array`s by the spectrograms of the `np.array`s
- one function takes `raw_segments` (will be the one of spectrograms or mel maps) and returns the GLCMs (again the only change is to the `np.array`s), with an `angle` argument and a `distance` argument
- the GLCMs are computed with angles `(0, 45, 90, 135)` and `d=1` (for both spectrograms and mel maps)
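A minimal sketch of the `raw_segments` layout and the mel-map replacement step described in the list above (helper names and values are placeholders):

```python
import numpy as np
import librosa

# raw_segments: {"train": [...], "test": [...]} where each entry is a quadruplet
# (file_name, np.array, piece_id, genre) and piece_id runs from 0 to 13
raw_segments = {
    "train": [("blues.00000.wav", np.zeros(48_000), 0, "blues")],
    "test": [],
}

def to_mel_maps(segments: dict, sr: int = 22_050) -> dict:
    """Replace each piece's np.array by its mel map, keeping the rest of the quadruplet."""
    return {
        split: [
            (name, librosa.feature.melspectrogram(y=signal, sr=sr), piece_id, genre)
            for name, signal, piece_id, genre in entries
        ]
        for split, entries in segments.items()
    }
```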