These scripts accompany the paper "Components Loss for Neural Networks in Mask-Based Speech Enhancement". This repository provides the source code for training mask-based speech enhancement convolutional neural networks (CNNs) with our proposed components loss (CL), in both its 2-components (2CL) and 3-components (3CL) variants. The corresponding test code is also provided.
The code was written by Ziyi Xu with help from Ziyue Zhao and Samy Elshamy.
We propose a novel components loss (CL) for the training of neural networks for mask-based speech enhancement. During the training process, the proposed CL offers separate control over preservation of the speech component quality, suppression of the residual noise component power, and preservation of a naturally sounding residual noise component. We obtain a better and more balanced performance in almost all employed instrumental quality metrics over the baseline losses, the latter comprising the conventional mean squared error (MSE) loss function and also auditory-related loss functions, such as the perceptual evaluation of speech quality (PESQ) loss and the recently proposed perceptual weighting filter loss.
Note that in this project the clean speech signals are taken from the Grid corpus (downsampled to 16 kHz) and the noise signals are taken from the CHiME-3 database.
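To make the idea of separate control concrete, the following is a minimal NumPy sketch of a two-term loss in the spirit of the 2CL described above. It is an illustration under stated assumptions, not the exact formulation from the paper or the training scripts: the function name `two_component_loss`, the weight `alpha`, and the use of plain mean squared values are all hypothetical choices. The third term of the 3CL (natural-sounding residual noise) is not sketched here.

```python
import numpy as np

def two_component_loss(mask, speech_mag, noise_mag, alpha=0.5):
    """Illustrative two-term loss; alpha is a hypothetical weight in [0, 1]."""
    # Apply the same estimated mask to each component separately.
    s_tilde = mask * speech_mag   # filtered speech component
    n_tilde = mask * noise_mag    # filtered (residual) noise component
    # Term 1: keep the filtered speech close to the clean speech.
    speech_term = np.mean((s_tilde - speech_mag) ** 2)
    # Term 2: keep the power of the residual noise low.
    noise_term = np.mean(n_tilde ** 2)
    return alpha * speech_term + (1.0 - alpha) * noise_term
```

With an all-ones mask the speech term vanishes and only the noise term remains; with an all-zeros mask the opposite holds. The weight trades one goal against the other.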
The training scripts use the following .mat files:
training_input_noisy.mat (normalized noisy speech amplitude spectra, with zero mean and unit variance)
validation_input_noisy.mat (normalized noisy speech amplitude spectra, with zero mean and unit variance)
training_pure_noise.mat (amplitude spectra of the noise component)
validation_pure_noise.mat (amplitude spectra of the noise component)
training_clean_speech.mat (amplitude spectra of the speech component)
validation_clean_speech.mat (amplitude spectra of the speech component)
These .mat files must be stored in version 7.3, using the Matlab command save('filename.mat','variable','-v7.3'), so that very large data matrices can be saved.
Small example files are provided under the directory ./training_data/. You can try the training scripts with these examples. To start your own training, replace these .mat files with your own data. More details are in the Python scripts.
Run the Python script to train the CNN model with the proposed 2CL based on the prepared training/validation data:
python Mask-based_CNN_2CL_training.py
Run the Python script to train the CNN model with the proposed 3CL based on the prepared training/validation data:
python Mask-based_CNN_3CL_training.py
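The noisy input spectra are expected to be normalized to zero mean and unit variance, with the statistics collected on the training data and reused for other data sets. The data preparation in this project is done in Matlab; the NumPy sketch below only illustrates the principle. It assumes matrices of shape (frames, frequency bins) and per-bin statistics, and the function names are hypothetical.

```python
import numpy as np

def fit_normalizer(train_spectra):
    """Collect per-frequency-bin mean and standard deviation on training data."""
    mean = train_spectra.mean(axis=0, keepdims=True)
    std = train_spectra.std(axis=0, keepdims=True)
    # Guard against division by zero for constant bins.
    std = np.where(std == 0.0, 1.0, std)
    return mean, std

def apply_normalizer(spectra, mean, std):
    """Normalize any data set with the statistics from the training data."""
    return (spectra - mean) / std
```

Reusing the training statistics keeps validation and test inputs on the same scale the network saw during training.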
We also use Matlab to prepare the input magnitude spectra for the test data and to store the phase information needed to recover the time-domain signal.
The test scripts use the following .mat files:
test_input_noisy_speech.mat (normalized noisy speech amplitude spectra, with zero mean and unit variance using the statistics collected on the training data)
test_pure_noise.mat (amplitude spectra of the noise component, used to generate the filtered noise component, which can be used for white-box based performance measures)
test_clean_speech.mat (amplitude spectra of the speech component, used to generate the filtered speech component, which can be used for white-box based performance measures)
test_noisy_speech_unmorm.mat (unnormalized noisy speech amplitude spectra, used for predicting enhanced speech)
These .mat files are stored using the Matlab command save('filename.mat','variable'), which can save .mat files of at most 2 GB. If your test data is very large, you also need to store the .mat files with -v7.3 and modify the corresponding data loading part in the test script.
Small example files are provided under the directory ./test_data/. You can try the test scripts with these examples. To start your own test, replace these .mat files with your own data. More details are in the Python scripts.
The test scripts generate the following files:
test_n_tilde.mat (filtered noise amplitude spectra)
test_s_tilde.mat (filtered speech amplitude spectra)
test_s_hat.mat (enhanced speech amplitude spectra)
Run the Python script to test the trained CNN model with the proposed 2CL using the prepared test data:
python Mask-based_CNN_2CL_test.py
Run the Python script to test the trained CNN model with the proposed 3CL using the prepared test data:
python Mask-based_CNN_3CL_test.py
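Because files saved with -v7.3 are HDF5 containers while files saved with the default Matlab command are not, the two formats need different readers in Python. The sketch below shows one way to handle both. The helper names are hypothetical and this is not the loading code used in the provided scripts; it assumes the third-party packages h5py and SciPy are installed and that the stored variable is a plain numeric matrix.

```python
import numpy as np

def is_mat_v73(path):
    """Return True if the file header announces the MATLAB 7.3 (HDF5-based) format."""
    with open(path, "rb") as f:
        header = f.read(19)
    return header == b"MATLAB 7.3 MAT-file"

def load_mat_variable(path, variable):
    """Load one numeric matrix from a .mat file of either format."""
    if is_mat_v73(path):
        import h5py  # third-party; reads the HDF5 container
        with h5py.File(path, "r") as f:
            # h5py sees MATLAB arrays with reversed dimension order, so transpose.
            return np.array(f[variable]).T
    from scipy.io import loadmat  # third-party; reads pre-7.3 .mat files
    return loadmat(path)[variable]
```

The variable name passed to `load_mat_variable` must match the name used in the Matlab save command, which depends on how the data was prepared.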
The stored test data phase information is used to recover the time-domain signal by IFFT with overlap-add (OLA); see ./Audio_demo/.
We also offer the corresponding audio demos as .wav files under the directory ./Audio_demo_wav_file/.
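The recovery of a time-domain signal from amplitude spectra and stored phase by IFFT with overlap-add can be sketched in NumPy as follows. This is only an illustration: the function name, the frame length, and the hop size are assumptions, and it presumes that the analysis window already sums to one across overlapping frames (for example a periodic Hann window with 50% overlap), so no synthesis window is applied.

```python
import numpy as np

def overlap_add_synthesis(magnitude, phase, frame_len, hop):
    """Rebuild a signal from per-frame spectra.

    magnitude, phase: arrays of shape (num_frames, frame_len // 2 + 1).
    """
    # Combine amplitude and phase into complex one-sided spectra.
    spectrum = magnitude * np.exp(1j * phase)
    # Inverse FFT of every frame back to the time domain.
    frames = np.fft.irfft(spectrum, n=frame_len, axis=1)
    num_frames = frames.shape[0]
    signal = np.zeros(hop * (num_frames - 1) + frame_len)
    # Overlap-add: shift each frame by the hop size and sum.
    for i, frame in enumerate(frames):
        start = i * hop
        signal[start:start + frame_len] += frame
    return signal
```

For enhanced speech, `magnitude` would be the enhanced amplitude spectra and `phase` the stored phase of the noisy test signal.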
If you use the losses and/or scripts in your research, please cite:
@article{xu2019Comploss,
author = {Z. Xu and S. Elshamy and Z. Zhao and T. Fingscheidt},
title = {{Components Loss for Neural Networks in Mask-Based Speech Enhancement}},
journal = {arXiv preprint arXiv:1908.05087},
year = {2019},
month = {Aug.}
}