Deep Learning for Music Information Retrieval

This repository contains slides, code and further material for the "Deep Learning for MIR" tutorial held at the 19th International Society for Music Information Retrieval Conference in Paris, France, from September 23-27, 2018.

Tutorial website: http://ismir2018.ircam.fr/pages/events-tutorial-04.html

Authors / Lecturers

	Alexander Schindler has been a member of the Music Information Retrieval group at the Technical University since 2010, where he actively participates in research and various international projects. He holds a Ph.D. on audio-visual analysis of music videos. He participates in teaching MIR, machine learning and data science. Alexander is currently employed as a scientist at the AIT Austrian Institute of Technology, where he is responsible for establishing a deep learning group. In various projects he focuses on deep-learning-based audio classification, audio event detection and audio similarity retrieval tasks.	[Website], [Twitter], [LinkedIn]
	Thomas Lidy has been a researcher in music information retrieval in combination with machine learning at TU Wien since 2004. Since 2015, he has been focusing on how Deep Learning can further improve music & audio analysis, winning 3 international benchmarking contests. He is currently the Head of Machine Learning at Musimap, a company that uses Deep Learning to analyze styles, moods and emotions in the global music catalog, in order to create emotion-aware search & recommender engines that empower music supervisors to find the music for their needs and music streaming platforms to deliver the perfect playlists according to people's mood.	[Website], [Twitter]
	Sebastian Böck received his diploma degree in electrical engineering from the Technical University in Munich and his PhD in computer science from the Johannes Kepler University Linz in 2010 and 2016, respectively. He continued his research at the Austrian Research Institute for Artificial Intelligence (OFAI) and recently also joined the MIR team at the Technical University of Vienna. His main research topic is the analysis of time event series in music signals, with a strong focus on artificial neural networks.	[Google Scholar]

Also visit: https://www.meetup.com/Vienna-Deep-Learning-Meetup

Abstract

Deep Learning has become state of the art in visual computing and is steadily gaining ground in the Music Information Retrieval (MIR) and audio retrieval domains. To bring attention to this topic, we provide an introductory tutorial on deep learning for MIR. Besides a general introduction to neural networks, the tutorial covers a wide range of MIR-relevant deep learning approaches. Convolutional Neural Networks are currently a de facto standard for deep-learning-based audio retrieval. Recurrent Neural Networks have proven to be effective in onset detection tasks such as beat or audio-event detection. Siamese Networks have been shown to be effective in learning audio representations and distance functions specific to music similarity retrieval. We introduce these different neural network layer types and architectures on the basis of standard MIR tasks such as music classification, similarity estimation and onset detection. We will incorporate both academic and industrial points of view into the tutorial. The tutorial will be accompanied by a GitHub repository for the presented content as well as references to state-of-the-art work and literature for further reading. This repository will remain public after the conference.

Tutorial Outline

Part 0 - Audio Processing Basics

Audio Processing in Python (Jupyter Notebook)
Preparing data and meta-data for this tutorial:

Part 1 - Audio Classification / Tagging (with CNNs)

Introduction - Convolutional Neural Networks (Slides)
Instrumental vs. Vocal Detection (Jupyter Notebook)
Genre Classification
Mood Recognition

Part 2 - Music Similarity Retrieval (with Siamese Networks)

Distance-based search on handcrafted music features (Jupyter Notebook)
Representation learning with Siamese Neural Networks (Jupyter Notebook)
Optimizing representation learning
Learning music similarity from tags (Jupyter Notebook)

Part 3 - Onset and Beat Detection (with RNNs)

Recurrent Neural Networks (Slides)
Onset detection (Jupyter Notebook)
Onset detection with RNNs (Jupyter Notebook)

Tutorial Requirements

For the tutorials, we use IPython / Jupyter notebooks, which allow you to write and execute Python code interactively in the browser.

Viewing Only

If you do not want to install anything, you can simply view the tutorials' content in your browser by clicking on the tutorial filenames listed above in the Git file listing.

The tutorials will open in your browser for viewing.

Interactive Coding

If you want to follow the tutorials by executing the code on your own computer, please first install the prerequisites as described below.

After that, to run the tutorials, go into the ismir2018_tutorial folder and run the following from the command line:

jupyter notebook

Interactive Audio Listening examples within the Browser

The browser-based Jupyter notebooks contain HTML5 audio components for listening directly to predicted results, such as those of the music similarity retrieval task. Almost all recent web browsers prohibit direct access to local files for security reasons. The audio files therefore have to be served via HTTP.

To enable the audio samples within the browser, download and extract the audio files to a directory on your computer. Open a terminal and change to the mp3_full directory. Then start a simple web server with the following command:

python -m http.server 9999 --bind 127.0.0.1

This will serve the directory via HTTP. The host (127.0.0.1) and port number (9999) supplied here are the same ones used in the Jupyter notebooks.
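As a rough illustration of how a notebook cell could refer to a file served this way, the sketch below builds the HTTP URL and an HTML5 audio element for a given filename. The names HOST, PORT, audio_url and audio_tag are hypothetical and chosen for this example only; they are not taken from the tutorial notebooks. The host and port are assumed to match the command above.

```python
from html import escape
from urllib.parse import quote

# Assumption: these must match the values passed to `python -m http.server`.
HOST = "127.0.0.1"
PORT = 9999


def audio_url(filename, host=HOST, port=PORT):
    """Return the HTTP URL under which the local server exposes `filename`."""
    # quote() percent-encodes spaces and other unsafe characters in the path.
    return "http://{}:{}/{}".format(host, port, quote(filename))


def audio_tag(filename):
    """Return an HTML5 <audio> element that plays `filename` from the local server."""
    return '<audio controls src="{}"></audio>'.format(escape(audio_url(filename)))


# Hypothetical filename, used only to show the generated markup.
print(audio_tag("example track.mp3"))
```

Inside a Jupyter notebook, the returned string could be rendered by passing it to IPython.display.HTML.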

Download Prepared Datasets

Please download the following data sets for this tutorial:

MagnaTagAtune

Prepared Features and Metadata:

Part 1 & 2 Mel spectrograms (subset) (96MB)
Extracted Rhythm Patterns Features (23MB)
Audio-files (220MB)

These are prepared versions of the original datasets described below.

Installation of Prerequisites

Install Python 3.x

Note: On most Mac and Linux systems Python is already pre-installed. Check with python --version on the command line whether you have Python 3.x installed.

Otherwise install Python 3.5 from https://www.python.org/downloads/release/python-350/
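The version check can also be run from inside Python, which is useful for confirming which interpreter a notebook actually uses. This is a minimal sketch; the helper name is_python3 is made up for this example.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True if the given version tuple belongs to Python 3.x."""
    return version_info[0] == 3


# Print the running interpreter's version together with the check result.
print(sys.version.split()[0], "OK" if is_python3() else "Python 3.x required")
```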

We recommend installing the Anaconda Python Distribution, because it already includes most of the scientific Python libraries required in this tutorial: https://www.anaconda.com/download/

Install Python libraries:

Mac, Linux or Windows

(on Windows leave out sudo)

Important note: If you have Python 2.x and 3.x installed in parallel, replace pip with pip3 in the following commands:

sudo pip install --upgrade jupyter

Check whether you can start

jupyter notebook

on the command line.

Then download or clone the tutorials from this Git repository:

git clone https://github.com/slychief/ismir2018_tutorial.git

or download https://github.com/slychief/ismir2018_tutorial/archive/master.zip
unzip it and rename the folder to ismir2018_tutorial.

Install the remaining Python libraries needed:

Either by:

sudo pip install Keras tensorflow scikit-learn pandas numpy librosa matplotlib progressbar2 seaborn scipy

or, if you downloaded or cloned this repository, by:

cd ismir2018_tutorial
sudo pip install -r requirements.txt
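To verify the installation, a short script along the following lines can report which of the libraries named in the pip command above cannot be found. It is only a sketch: the mapping from pip package names to import names is an assumption of this example (for instance, scikit-learn is imported as sklearn and progressbar2 as progressbar), and missing_packages is a hypothetical helper.

```python
import importlib.util

# Assumed mapping: pip package name -> name used in `import` statements.
REQUIRED = {
    "Keras": "keras",
    "tensorflow": "tensorflow",
    "scikit-learn": "sklearn",
    "pandas": "pandas",
    "numpy": "numpy",
    "librosa": "librosa",
    "matplotlib": "matplotlib",
    "progressbar2": "progressbar",
    "seaborn": "seaborn",
    "scipy": "scipy",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose module cannot be found."""
    # find_spec() locates a top-level module without importing it.
    return [pip_name for pip_name, module in required.items()
            if importlib.util.find_spec(module) is None]


missing = missing_packages()
print("All libraries found." if not missing else "Missing: " + ", ".join(missing))
```

The check only confirms that each module can be located; it does not verify versions.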

Optional for GPU computation

If you want to train your neural networks on your GPU (which is faster, but not strictly necessary for this tutorial), you have to install the GPU version of TensorFlow:

sudo pip install tensorflow-gpu

and also install the following:

NVIDIA drivers
CUDA
cuDNN (requires registration with NVIDIA)

Install Audio Decoder

In order to decode MP3 files (used in the MagnaTagAtune data set) you will need to install FFmpeg on your system.

Linux: sudo apt-get install ffmpeg
Mac: download FFmpeg for Mac from http://ffmpegmac.net and make sure ffmpeg is on the PATH
Windows: download https://github.com/tuwien-musicir/rp_extract/blob/master/bin/external/win/ffmpeg.exe and make sure it is on the PATH
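Whether the decoder is actually reachable on the PATH can be checked from Python with the standard library. The sketch below assumes the executable is named ffmpeg; find_decoder is a hypothetical helper for this example.

```python
import shutil


def find_decoder(name="ffmpeg"):
    """Return the full path of `name` if it is on the PATH, else None."""
    return shutil.which(name)


path = find_decoder()
print("ffmpeg found at " + path if path else "ffmpeg not found on PATH")
```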

Credits

Some of the tutorial slides in "Part_1_Convolutional_Neural_Networks.pdf" were created by Jan Schlüter.

The following helper Python libraries are used in these tutorials:

The RP_extract feature extractor and content descriptors by Thomas Lidy and Alexander Schindler

The data sets we use in the tutorials are from the following sources:

MagnaTagAtune: http://mirg.city.ac.uk/codeapps/the-magnatagatune-dataset

(Do not download them from there; use the prepared datasets from the two ownCloud links above instead.)
