Closed JLenzy closed 1 year ago
Hi,
You can preprocess the audio into .npy format using the preprocessing script:
https://github.com/minzwon/sota-music-tagging-models/blob/master/preprocessing/mtat_read.py
In this experiment, I preprocessed the audio into .npy format in advance because downsampling is time-consuming. But if your audio is already at the target sampling rate, you can also consider loading it on the fly using the librosa or essentia libraries.
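As a rough sketch of that offline step (the real script is mtat_read.py linked above, which may differ in detail; the librosa call and paths here are assumptions), decoding each clip once and caching the raw waveform as .npy could look like:

```python
import os

import numpy as np

# Hypothetical sketch of the offline preprocessing step: decode each clip
# once at the target sampling rate and cache the raw waveform as .npy, so
# training never pays the decoding/downsampling cost again. The librosa
# call shown in the comment below is an assumption, not the repo's code.

TARGET_SR = 16000  # sampling rate mentioned in this thread

def cache_waveform(waveform, out_path):
    """Save a raw waveform as float32 .npy so training can skip decoding."""
    os.makedirs(os.path.dirname(out_path) or ".", exist_ok=True)
    np.save(out_path, np.asarray(waveform, dtype=np.float32))
    return out_path

def load_cached(path):
    """Load a previously cached waveform."""
    return np.load(path)

# The decode/downsample itself would be something like:
#   y, _ = librosa.load("clip.mp3", sr=TARGET_SR)
#   cache_waveform(y, "npy/clip.npy")
```

The only non-obvious choice is storing float32 rather than float64, which halves the disk footprint of the cache.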
Hi,
was a modified version of mtat_read used for MTG-Jamendo as well? AudioFolder seems to expect .npy files. Or should they be included in MTG-Jamendo? (I don't remember when I downloaded the dataset, but my copy does not contain any numpy files.)
EDIT: I just saw "python scripts/baseline/get_npy.py run 'your_path_to_spectrogram_npy'" in the MTG-Jamendo description; is this the correct preprocessing?
Best regards Verena
Hi Verena,
There are two ways of handling it:
1. Preprocess the audio into .npy files like I did for the MagnaTagATune dataset. This is the easiest way to use this repository, but you need extra space to store them.
2. Modify AudioFolder.get_npy to read audio files directly.
When I worked on this project, I needed downsampling to work with a 16kHz sampling rate, so I decided to store the audio in .npy format. But this format is inefficient, so I recommend using audio files instead of .npy.
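A minimal sketch of the second option, assuming get_npy is the one place the data loader touches disk (the class below is a stand-in for illustration, not the repo's actual AudioFolder):

```python
import numpy as np

# Stand-in illustrating option 2: keep the .npy fast path but fall back to
# decoding audio files on the fly. This is NOT the repo's AudioFolder class;
# it only mirrors the idea that get_npy is where the waveform comes from.

class OnTheFlyLoader:
    def __init__(self, sample_rate=16000):
        self.sample_rate = sample_rate

    def get_npy(self, path):
        """Return a raw waveform for `path`."""
        if path.endswith(".npy"):
            return np.load(path)  # cached waveform, original behavior
        # Otherwise decode on the fly, e.g. (assumed librosa API):
        #   y, _ = librosa.load(path, sr=self.sample_rate)
        #   return y
        raise NotImplementedError("plug in librosa/essentia decoding here")
```

Keeping both branches lets you mix cached and uncached datasets while migrating.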
The preprocessing script you mentioned is different from this repository's: this repo calculates mel spectrograms on the fly, so the .npy files contain raw audio, not mel spectrograms.
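To make that split concrete: the cached .npy holds only the waveform, and the spectrogram is computed per batch at training time. The repo uses a mel frontend for this; the plain-magnitude STFT below is just a numpy illustration of where that on-the-fly computation happens.

```python
import numpy as np

# Numpy illustration of the on-the-fly step: the .npy file stores raw audio,
# and a spectrogram is computed only when a batch is assembled. The actual
# repo applies a mel filterbank (e.g. via torchaudio); this plain-magnitude
# STFT just shows the shape of that computation.

def stft_magnitude(waveform, n_fft=512, hop=256):
    window = np.hanning(n_fft)
    frames = [
        np.abs(np.fft.rfft(waveform[start:start + n_fft] * window))
        for start in range(0, len(waveform) - n_fft + 1, hop)
    ]
    return np.stack(frames, axis=1)  # shape: (n_fft // 2 + 1, n_frames)

# Training-time usage with a cached clip would be roughly:
#   y = np.load("npy/clip.npy")      # raw audio, not a spectrogram
#   spec = stft_magnitude(y)
```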
I have two goals:
In both cases I am having trouble with the dataset formats; the scripts seem to require a very specific dataset layout that is not really detailed in the readme. If you could provide any clarification on how our datasets should be formatted, it would be greatly appreciated! Thanks in advance.