This repository contains the code for reproducing the experiments in our paper entitled Phase recovery with the Bregman divergence for audio source separation, published at the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021.
After cloning or downloading this repository, you will need to get the speech and noise data to reproduce the results.
The speech data is obtained from the VoiceBank dataset available here. You should download the clean_testset_wav.zip file and unzip it in the data/VoiceBank/ folder.
Note that you can change the folder structure, as long as you change the path accordingly in the code.
The noise data is obtained from the DEMAND dataset available here. You should download the DLIVING_16k.zip, SPSQUARE_16k.zip and TBUS_16k.zip files and unzip them in the data/DEMAND/ folder.
Note that you can change the folder structures, as long as you change the speech and noise directory paths accordingly in the code.
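If you prefer to script this step, here is a minimal sketch that extracts the archives listed above into the expected folders. The archive names and target folders come from this README; the assumption that the zip files sit in the repository root is ours.

```python
import zipfile
from pathlib import Path

# Archive names and target folders follow the instructions above;
# the zip files are assumed to be in the repository root.
ARCHIVES = {
    "clean_testset_wav.zip": "data/VoiceBank",
    "DLIVING_16k.zip": "data/DEMAND",
    "SPSQUARE_16k.zip": "data/DEMAND",
    "TBUS_16k.zip": "data/DEMAND",
}

for archive, target in ARCHIVES.items():
    Path(target).mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)  # unzip into the expected folder
    print(f"Extracted {archive} -> {target}")
```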
Then, simply execute the prepare_data.py script to create the noisy mixtures.
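For reference, a noisy mixture is typically obtained by scaling the noise to a target signal-to-noise ratio (SNR) and adding it to the clean speech. The snippet below only illustrates this idea; the actual file pairing, SNR values and output paths are handled by prepare_data.py, and the helper name here is hypothetical.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Hypothetical illustration: scale `noise` so that the speech-to-noise
    power ratio equals `snr_db` (in dB), then add it to `speech`."""
    noise = noise[: len(speech)]              # match signal lengths
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12     # avoid division by zero
    gain = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + gain * noise
```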
To run the experiments, you first need to estimate the spectrograms of the sources, which is done using the PyTorch implementation of the Open-Unmix model trained for a speech enhancement task.
The pre-trained model for estimating the speech and noise spectrograms is available here.
You should place the .json and .pth files in the open_unmx/ folder. Note that you should also rename the .pth files simply as speech.pth and noise.pth.
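As a quick sanity check that the renamed checkpoints are in the right place, you can try loading them with PyTorch. The folder and file names come from the instructions above, and the snippet assumes each .pth file stores a state dictionary.

```python
import torch

# Folder and file names follow the instructions above (after renaming).
for target in ("speech", "noise"):
    state = torch.load(f"open_unmx/{target}.pth", map_location="cpu")
    print(f"{target}.pth loaded ({len(state)} entries)")
```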
Now that you're all set, simply run the following scripts:
- validation.py will perform a grid search over the gradient step size on the validation subset to determine its optimal value for every setting. It will also reproduce Fig. 1 from the paper.
- testing.py will run the algorithms (proposed gradient descent and MISI) on the test subset and plot the results corresponding to Fig. 2 in the paper (a sketch of the MISI update is given below for intuition).
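For intuition only, the sketch below shows one generic MISI iteration: each source is re-synthesized from its current complex STFT estimate, the mixing error is spread evenly over the sources, and the target magnitudes are re-imposed. This illustrates the classical MISI update using librosa, not the paper's proposed gradient-descent algorithm with Bregman divergences, which is implemented in the repository scripts.

```python
import numpy as np
import librosa

def misi_step(specs_hat, mags, mixture, n_fft=1024, hop=256):
    """One illustrative MISI iteration (multiple input spectrogram inversion).
    specs_hat: list of current complex STFTs, one per source
    mags: list of target magnitude spectrograms, one per source
    mixture: time-domain mixture signal
    """
    # Re-synthesize each source and compute the mixing error
    sources = [librosa.istft(s, hop_length=hop, length=len(mixture)) for s in specs_hat]
    error = mixture - np.sum(sources, axis=0)
    # Distribute the error evenly, then re-impose the target magnitudes
    new_specs = []
    for src, mag in zip(sources, mags):
        spec = librosa.stft(src + error / len(sources), n_fft=n_fft, hop_length=hop)
        new_specs.append(mag * np.exp(1j * np.angle(spec)))
    return new_specs
```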