This documents the steps of a music classifier experiment.

In short, we test three different types of machine learning classifiers (traditional, simple neural networks, and pre-trained models) on the small GTZAN Dataset - Music Genre Classification.

Classifier Types

We want to see how much performance differs between simple models and larger, more complex ones. Ultimately, the experiment aims to show how much compute is needed for good classification.

Traditional

We train and briefly tune three traditional machine learning models: random forest, support vector machine, and k-nearest neighbors. We then measure their performance by macro F1 score. We also review how each genre is classified using a confusion matrix.
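To make the two evaluation tools concrete, here is a minimal pure-Python sketch of a confusion matrix and the macro F1 score. The function names and the toy labels are illustrative assumptions, not code from this PR. In practice, scikit-learn's `sklearn.metrics.f1_score(..., average="macro")` and `sklearn.metrics.confusion_matrix` compute the same quantities.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Count predictions: rows are true genres, columns are predicted genres."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1, so every genre counts equally."""
    cm = confusion_matrix(y_true, y_pred, labels)
    n = len(labels)
    scores = []
    for i in range(n):
        tp = cm[i][i]
        fp = sum(cm[r][i] for r in range(n)) - tp  # predicted i, actually another genre
        fn = sum(cm[i]) - tp                       # actually i, predicted another genre
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n


# Toy labels, made up for illustration only (not experiment results).
labels = ["blues", "jazz", "rock"]
y_true = ["blues", "blues", "jazz", "jazz", "rock", "rock"]
y_pred = ["blues", "jazz", "jazz", "jazz", "rock", "blues"]

cm = confusion_matrix(y_true, y_pred, labels)
score = macro_f1(y_true, y_pred, labels)
print(cm)
print(round(score, 4))
```

Reading the matrix row by row shows which genres get confused with which. Macro averaging keeps a weak genre from being hidden by strong ones.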

Neural networks

We then train three simple neural networks: a feed-forward network (MLP), a CNN, and an RNN. This is mainly to establish a lower bound on neural network performance. Hyperparameter tuning should be kept minimal here, both to save resources and to keep these scores a true baseline.

Pretrained models

There exist many open-source audio/music classifiers. We will select one that is publicly available on Hugging Face. When choosing a model here, it is important to have a reproducible preprocessing step. Once this preprocessing is fixed, we can re-run the previous experiments on data in the same format, for a fair comparison.
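One way to keep preprocessing reproducible is to put every setting in a single frozen config and route all three classifier types through the same function. The sketch below is only an illustration under stated assumptions: `PreprocessConfig`, `preprocess`, and the default values are hypothetical, and resampling is left out because it depends on the audio library and on the sample rate the chosen model expects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreprocessConfig:
    """Hypothetical settings shared by every experiment."""
    sample_rate: int = 16_000   # assumption: replace with the chosen model's rate
    clip_seconds: float = 5.0   # assumption: fixed clip length fed to every model

    @property
    def num_samples(self) -> int:
        return int(self.sample_rate * self.clip_seconds)


def preprocess(waveform, config):
    """Truncate or zero-pad to a fixed length, then peak-normalize to [-1, 1]."""
    n = config.num_samples
    clip = list(waveform[:n])
    clip += [0.0] * (n - len(clip))
    peak = max((abs(x) for x in clip), default=0.0)
    return [x / peak for x in clip] if peak > 0 else clip


# Tiny values so the effect is easy to inspect by hand.
config = PreprocessConfig(sample_rate=4, clip_seconds=1.0)
short_clip = preprocess([0.5, -0.25], config)                # padded with zeros
long_clip = preprocess([2.0, 1.0, 0.0, -1.0, 4.0], config)   # truncated to 4 samples
print(short_clip, long_clip)
```

Because the config is frozen, the same object can be passed to the traditional, neural, and pretrained pipelines without the settings drifting between runs.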

pmhalvor / public-data

[WIP] Add music genre data #3
