This repository contains the code for the paper ECOGEN: Bird Sounds Generation using Deep Learning. The paper proposes a novel method for generating bird sounds by leveraging the VQ-VAE2 network architecture. The generated sounds are intended to increase the dataset size for bird sound classification tasks.
The dataset used in this paper is the Xeno-Canto dataset from Kaggle. The dataset can be downloaded in two parts: Part 1 and Part 2.
Model checkpoints can be found in the OSF Link folder.
The code is tested on Python 3.7.7 and PyTorch 1.13.1. The required packages can be installed with the following commands:
git clone https://github.com/ixobert/birds-generation
cd ./birds-generation/
# Use this line on Apple M1-series Macs
pip install -r mac-m1-requirements.txt
# Otherwise, use this line
pip install -r requirements.txt
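After installation, you can run a quick sanity check to confirm that PyTorch is importable and whether a GPU is visible (this check is illustrative and not part of the original setup instructions):

# minimal environment sanity check
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())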
The following steps describe how to prepare the data and train the ECOGEN VQ-VAE2 model:
We rely heavily on Hydra to manage the configuration files, which can be found in the src/configs folder. See the Hydra documentation for more details.
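As an illustration, a Hydra-managed training entry point typically looks like the sketch below, where any field defined in the config files can be overridden from the command line. The config name and fields shown here are assumptions for illustration, not taken from the repository:

# sketch of a standard Hydra entry point; "config" as the config name is an assumption
import hydra
from omegaconf import DictConfig, OmegaConf

@hydra.main(config_path="configs", config_name="config")
def main(cfg: DictConfig) -> None:
    # any field can be overridden on the command line,
    # e.g. `python train_vqvae.py lr=0.00002 dataset.batch_size=420`
    print(OmegaConf.to_yaml(cfg))

if __name__ == "__main__":
    main()

This is how overrides such as lr=... and dataset.batch_size=... in the training command further below map onto the configuration files.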
The ECOGEN training code is inspired by the VQ-VAE2 implementation.
The training code can be found in the src folder.
The code expects the dataset to be in the following format:
./birds-songs/dataset/train.txt|test.txt
The train, test, and validation text files contain the paths to the audio files. See below for an example train.txt file:
birds-song/1.wav
birds-song/2.wav
birds-song/3.wav
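If you need to create these file lists from a folder of .wav files, a minimal sketch is shown below. The folder path and the 90/10 train/test split are assumptions; adapt them to your own data layout:

# build train.txt / test.txt from a folder of .wav files (illustrative sketch)
import glob
import random

files = sorted(glob.glob("birds-song/*.wav"))
random.seed(0)
random.shuffle(files)

split = int(0.9 * len(files))  # assumed 90/10 train/test split
with open("train.txt", "w") as f:
    f.write("\n".join(files[:split]))
with open("test.txt", "w") as f:
    f.write("\n".join(files[split:]))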
To train the ECOGEN model, run the following command:
python ./src/train_vqvae.py dataset="xeno-canto" mode="train" lr=0.00002 nb_epochs=25000 log_frequency=1 dataset.batch_size=420 dataset.num_workers=8 run_name="ECOGEN Training on Xeno Canto" tags=[vq-vae2,xeno-canto] +gpus=[1] debug=false
You will need to update the contents of configs/dataset to point to your custom dataset folder.
The current version of ECOGEN supports two types of augmentation: interpolation and noise. The generate_samples script saves the generated spectrograms in the out_folder directory as NumPy (.npy) files.
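Conceptually, the interpolation augmentation mixes two samples in the VQ-VAE2 latent space rather than in the audio domain. The sketch below only illustrates that idea; the encode/decode method names and the mixing coefficient are placeholders, not the actual API of generate_samples.py:

# conceptual sketch of latent interpolation between two spectrograms
# `model.encode` / `model.decode` are placeholder names, not the real API
import torch

def interpolate(model, spec_a, spec_b, alpha=0.5):
    with torch.no_grad():
        z_a = model.encode(spec_a)               # latent code of the first sample
        z_b = model.encode(spec_b)               # latent code of the second sample
        z_mix = (1 - alpha) * z_a + alpha * z_b  # blend the two latent codes
        return model.decode(z_mix)               # decode back to a spectrogram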
To generate the bird songs spectrograms, run the following command:
python ./src/generate_samples.py --data_paths="/folder_path/*.wav" --out_folder="./generated_samples" --model_path="/path/to/model.ckpt" --augmentations=noise --num_samples=3
python ./src/generate_samples.py --data_paths="/folder_path/*.wav" --out_folder="./generated_samples" --model_path="/path/to/model.ckpt" --augmentations=interpolation --num_samples=3
You can view a generated spectrogram with the following Python snippet:
import numpy as np
import matplotlib.pyplot as plt

# load a generated spectrogram and display it
spec = np.load("generated_spectrogram.npy")
plt.imshow(spec, origin="lower", aspect="auto")
plt.show()
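If you want to listen to a generated sample rather than only inspect it, the spectrogram can be inverted back to audio. The sketch below assumes the .npy file holds a mel magnitude spectrogram and uses default parameters; the actual sample rate, FFT size, hop length, and scaling depend on the preprocessing used during training, so treat this only as a starting point:

# illustrative inversion of a generated (assumed mel) spectrogram back to audio
import numpy as np
import librosa
import soundfile as sf

spec = np.load("generated_spectrogram.npy")
# the sample rate and STFT parameters below are assumptions
audio = librosa.feature.inverse.mel_to_audio(spec, sr=22050, n_fft=2048, hop_length=512)
sf.write("generated_sample.wav", audio, 22050)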