kahst / AcousticEventDetection

Source code complementing our paper for acoustic event classification using convolutional neural networks.
MIT License

Acoustic Event Classification Using Convolutional Neural Networks

By Stefan Kahl, Hussein Hussein, Etienne Fabian, Jan Schloßhauer, Danny Kowerko, and Maximilian Eibl

Introduction

This code repo complements our submission to the INFORMATIK 2017 Workshop WS34. This is a refined version of our original code described in the paper. We added comments, removed some of the boilerplate code and added testing functionality. If you have any questions or problems running the scripts, don't hesitate to contact us.

Contact: Stefan Kahl, Technische Universität Chemnitz, Media Informatics

E-Mail: stefan.kahl@informatik.tu-chemnitz.de

This project is licensed under the terms of the MIT license.

Please cite the paper in your publications if it helps your research.

You can download the submission here: 2017_INFORMATIK_AED_CNN.pdf (Unpublished draft version)

Installation

This is a Theano/Lasagne implementation in Python for the classification of acoustic events based on deep features. The code was tested on Ubuntu 14.04 LTS but should work with other distributions as well.

First, you need to install Python 2.7 and the CUDA Toolkit for GPU acceleration. After that, you can clone the project and use the Python package tool pip to install most of the relevant dependencies:

git clone https://github.com/kahst/AcousticEventDetection.git
cd AcousticEventDetection
sudo pip install -r requirements.txt

We use OpenCV for image processing; you can install the cv2 package for Python by running this command:

sudo apt-get install python-opencv

Finally, you need to install Theano and Lasagne:

sudo pip install -r https://raw.githubusercontent.com/Lasagne/Lasagne/master/requirements.txt
sudo pip install https://github.com/Lasagne/Lasagne/archive/master.zip

For more details, see the Lasagne installation instructions: http://lasagne.readthedocs.io/en/latest/user/installation.html

Training

To train a model on your own dataset or on any publicly available dataset (e.g. UrbanSound8K), you need to follow a few simple steps. First, organize your dataset with subfolders as class labels. Second, extract spectrograms from all audio files using the script AED_spec.py. After that, you are ready to train your model with AED_train.py. Finally, you can either evaluate a model using AED_eval.py or make predictions for any sound file using AED_test.py.

Dataset

The training script uses subfolders as class names, so you should provide the following directory structure:

dataset   
¦
+---event1
¦   ¦   file011.wav
¦   ¦   file012.wav
¦   ¦   ...
¦   
+---event2
¦   ¦   file021.wav
¦   ¦   file022.wav
¦   ¦   ...
¦    
+---...
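
As an illustration of this layout, the following minimal sketch collects (file, label) pairs by treating each subfolder name as the class label. The function name list_labeled_files is hypothetical and is not part of the repository; the scripts in this repo may traverse the dataset differently.

```python
import os


def list_labeled_files(dataset_dir, extension=".wav"):
    """Return sorted (path, label) pairs, using each subfolder name as the label.

    Hypothetical helper for illustration; not taken from the repository.
    """
    pairs = []
    for label in sorted(os.listdir(dataset_dir)):
        class_dir = os.path.join(dataset_dir, label)
        if not os.path.isdir(class_dir):
            continue  # skip stray files at the top level of the dataset
        for name in sorted(os.listdir(class_dir)):
            if name.lower().endswith(extension):
                pairs.append((os.path.join(class_dir, name), label))
    return pairs
```

Calling list_labeled_files('dataset') on the structure above would yield one pair per wav-file, labeled event1, event2, and so on.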

Extracting Spectrograms

We decided to use magnitude spectrograms with a resolution of 512x256 pixels, which represent three-second chunks of audio signal. You can generate spectrograms for your sorted dataset with the script AED_spec.py. You can switch to different settings for the spectrograms by editing the file.

Extracting spectrograms might take a while. Eventually, you should end up with a directory containing subfolders named after acoustic events, which we will use as class names during training.
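
To make the idea of chunked magnitude spectrograms concrete, here is a minimal NumPy sketch. It is not the code in AED_spec.py: the window length (512), hop size (256), Hann window, non-overlapping chunks and both function names are assumptions chosen for illustration. Rescaling the result to a 512x256 image is omitted.

```python
import numpy as np


def magnitude_spectrogram(signal, win_len=512, hop=256):
    """Short-time Fourier magnitudes with shape (frequency bins, time frames)."""
    window = np.hanning(win_len)
    n_frames = 1 + (len(signal) - win_len) // hop
    # Cut the signal into overlapping, windowed frames (one frame per row).
    frames = np.stack([signal[i * hop:i * hop + win_len] * window
                       for i in range(n_frames)])
    # The magnitude of the real FFT of each frame gives one spectrogram column.
    return np.abs(np.fft.rfft(frames, axis=1)).T


def chunk_spectrograms(signal, sample_rate, seconds=3.0):
    """Yield one magnitude spectrogram per non-overlapping chunk of audio."""
    chunk_len = int(seconds * sample_rate)
    # A trailing remainder shorter than one chunk is dropped.
    for start in range(0, len(signal) - chunk_len + 1, chunk_len):
        yield magnitude_spectrogram(signal[start:start + chunk_len])
```

With these assumed settings, every three-second chunk produces one 2D array that could then be saved as an image in the subfolder of its class.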

Training a Model

You can train your own model using either publicly available training data or your own sound recordings. All you need are spectrograms of the recordings. Before training, you should review the following settings, which you can find in the AED_train.py file:

There are a lot more options - most should be self-explanatory. If you have any questions regarding the settings or the training process, feel free to contact us.

Note: In order to keep results reproducible with fixed random seeds, you need to update your .theanorc file with the following lines:

[dnn.conv]
algo_bwd_filter=deterministic
algo_bwd_data=deterministic
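
If you want to verify this setting before a long training run, a small check along the following lines may help. This is a sketch under assumptions: the helper name missing_deterministic_flags is hypothetical, and it only inspects the file you point it at (by default ~/.theanorc), not flags set through environment variables.

```python
import os

try:
    from configparser import ConfigParser  # Python 3
except ImportError:
    from ConfigParser import ConfigParser  # Python 2.7

# Options expected in the [dnn.conv] section for deterministic backward passes.
REQUIRED = {"algo_bwd_filter": "deterministic",
            "algo_bwd_data": "deterministic"}


def missing_deterministic_flags(path=os.path.expanduser("~/.theanorc")):
    """Return the [dnn.conv] options that are absent or not 'deterministic'."""
    parser = ConfigParser()
    parser.read(path)  # a missing file is treated like an empty one
    missing = []
    for key, value in sorted(REQUIRED.items()):
        if (not parser.has_option("dnn.conv", key)
                or parser.get("dnn.conv", key) != value):
            missing.append(key)
    return missing
```

An empty list means both options are set; otherwise the returned names tell you which lines still need to be added.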

Depending on your GPU, training might take a while...

Evaluation

After training, you can test models and evaluate them on your local validation split. To do so, you need to adjust the settings in AED_eval.py to match your task. The most important settings are:

Note: Test data should be organized like the training data, with subfolders as class names. Feel free to use different ground truth annotations; all you need to do is edit the script accordingly.
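
Because the folder names act as ground truth, scoring a model comes down to comparing them with the predicted labels. The sketch below computes per-class accuracy from two label lists. It is not taken from AED_eval.py, the function name per_class_accuracy is hypothetical, and the metrics reported by the actual script may differ.

```python
from collections import Counter


def per_class_accuracy(true_labels, predicted_labels):
    """Map each class to the fraction of its files that were predicted correctly."""
    total = Counter(true_labels)
    correct = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    # Counter returns 0 for classes that never had a correct prediction.
    return {label: correct[label] / float(total[label]) for label in total}
```

For example, true labels ['event1', 'event1', 'event2'] with predictions ['event1', 'event2', 'event2'] give an accuracy of 0.5 for event1 and 1.0 for event2.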

Testing

If you want to make predictions for a single, unlabeled wav-file, you can call the script AED_test.py via the command shell. We provided some example files in the dataset folder. You can use this script as is; no training is required. Simply follow these steps:

1. Download pre-trained model:

sh model/fetch_model.sh

2. Execute script:

python AED_test.py --filenames 'dataset/schreien_scream.wav' --modelname 'AED_Example_Run_model.pkl' --overlap 4 --results 5 --confidence 0.01

If everything goes well, you should see an output just like this:

HANDLING IMPORTS... DONE!
IMPORTING MODEL... DONE!
COMPILING THEANO TEST FUNCTION... DONE! ( 2 s )
TESTING: dataset/schreien_scream.wav
SAMPLE RATE: 44100 TOP PREDICTION(S):
    schreien 99 %
PREDICTION FOR 4 SPECS TOOK 57 ms ( 14 ms/spec ) 

Note: The values for overlap, results and confidence are optional. You can also pass a list of wav-files for prediction. To do so, run the script using --filenames ['file1.wav', 'file2.wav', ...].
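
One plausible reading of the results and confidence options is sketched below: the per-spectrogram class probabilities are averaged, ranked, cut to the top entries and filtered by a minimum score. This interpretation is an assumption, as is the function name top_predictions; AED_test.py may pool and filter its predictions differently.

```python
def top_predictions(spec_probabilities, class_names, results=5, confidence=0.01):
    """Average per-spectrogram probabilities and keep the best-scoring classes.

    spec_probabilities: one list of class probabilities per spectrogram.
    Assumed semantics: 'results' caps the number of returned classes and
    'confidence' is the minimum averaged score a class needs to be listed.
    """
    n_specs = float(len(spec_probabilities))
    mean = [sum(p[i] for p in spec_probabilities) / n_specs
            for i in range(len(class_names))]
    ranked = sorted(zip(class_names, mean), key=lambda pair: pair[1], reverse=True)
    return [(name, score) for name, score in ranked[:results] if score >= confidence]
```

Pooling over several spectrograms of the same file makes the final prediction less sensitive to a single noisy chunk.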

This repo might not suffice for real-world applications, but you should be able to adapt the testing script to your specific needs.

We will keep this repo updated and will provide more testing functionality in the future.