DCCRN (Deep Complex Convolutional Recurrent Network) is a deep neural network proposed in [1]. This repository applies DCCRN with various loss functions. Our original paper can be found here, and you can check test samples here. The test samples were chosen at random, and we uploaded samples for the SI-SNR and SI-SNR+LMS losses.
Source of the figure: paper
We use two base loss functions and two perceptual loss functions.
Base loss
- MSE: Mean Squared Error
Perceptual loss
- LMS: Log Mel Spectra
We combined the two types of base loss functions with the two types of perceptual loss functions. The ratio of the coupling constants was determined experimentally. For example, the base loss MSE has an initial magnitude of about 0.001 to 0.002, whereas LMS starts at about 0.1 to 0.2 and PMSQE at about 0.8 to 1.3. To bring the two terms to a similar magnitude, we therefore used a smaller coefficient on the perceptual loss term. The coupling constant ratio thus reflects the dynamic range of the two terms rather than their sensitivity. During the experiments we judged the base loss to be the more important term, so we varied the coefficients so that the dynamic range ratio of the base term to the perceptual term, including the coupling constant, ranged from 1:1 to 10:1.
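As a rough illustration of the scaling argument above, the sketch below derives a coupling coefficient from the approximate initial magnitudes quoted in the text. The function names `coupling_coefficient` and `combined_loss`, the midpoint values, and the rule itself are illustrative assumptions; they are not taken from this repository's code, and the coefficients actually used were tuned experimentally.

```python
def coupling_coefficient(base_init, perceptual_init, ratio=1.0):
    """Return a weight w such that base_init : (w * perceptual_init) equals ratio : 1.

    Hypothetical helper: it only balances the initial magnitudes of the two terms.
    """
    if base_init <= 0 or perceptual_init <= 0 or ratio <= 0:
        raise ValueError("all arguments must be positive")
    return base_init / (ratio * perceptual_init)


def combined_loss(base_loss, perceptual_loss, weight):
    """Weighted sum of a base loss value and a perceptual loss value."""
    return base_loss + weight * perceptual_loss


# Approximate initial magnitudes, taken as midpoints of the ranges in the text.
MSE_INIT = 0.0015   # MSE starts around 0.001 to 0.002
LMS_INIT = 0.15     # LMS starts around 0.1 to 0.2

# 1:1 dynamic range: both terms contribute about equally at the start.
w_equal = coupling_coefficient(MSE_INIT, LMS_INIT, ratio=1.0)

# 10:1 dynamic range: the base term dominates by a factor of ten.
w_base_heavy = coupling_coefficient(MSE_INIT, LMS_INIT, ratio=10.0)
```

With these midpoint values, the weight on LMS comes out near 0.01 for a 1:1 ratio and near 0.001 for a 10:1 ratio, which matches the statement that the perceptual term needs the smaller coefficient.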
This repository is tested on Ubuntu 20.04.
- Python 3.7+
- CUDA 10.1+
- cuDNN 7+
- PyTorch 1.7+
Library
- tqdm
- asteroid
- scipy
- matplotlib
- tensorboardX
- pesq
- pystoi
The training and validation data consist of the following three dimensions.
[Batch size, 2(input & target), wav length]
The test data consists of the following dimensions.
[noise type, dB classes, Batch size, 2(input & target), wav length]
We use two types of noise, seen and unseen, and seven dB classes from -10 dB to 20 dB.
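The test layout described above can be sketched as follows. The names `NOISE_TYPES`, `db_classes`, and `expected_test_shape` are hypothetical, and the even 5 dB spacing is an assumption: the text gives only the number of classes and the two endpoints.

```python
NOISE_TYPES = ("seen", "unseen")


def db_classes(low=-10, high=20, count=7):
    """Evenly spaced SNR classes between low and high (assumed spacing)."""
    step = (high - low) / (count - 1)
    return [low + i * step for i in range(count)]


def expected_test_shape(batch_size, wav_length):
    """Shape tuple: [noise type, dB classes, batch size, 2 (input & target), wav length]."""
    return (len(NOISE_TYPES), len(db_classes()), batch_size, 2, wav_length)


# Example with a hypothetical batch size of 4 and 48000-sample clips.
shape = expected_test_shape(batch_size=4, wav_length=48000)
```

Under these assumptions, indexing the first two axes selects one noise condition, for example index `[1][0]` would be unseen noise at the lowest dB class.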
Wav files longer than 3 seconds are cut to 3 seconds, and wav files shorter than 3 seconds are zero-padded to 3 seconds.
The sampling frequency is 16 kHz.
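A minimal sketch of the length normalization, assuming the crop keeps the beginning of the signal and the zeros are appended at the end (the text does not say which side is cropped or padded). The helper name `fix_length` is hypothetical, and plain Python lists are used only to keep the example dependency-free; the same logic applies to arrays or tensors.

```python
SAMPLE_RATE = 16_000                      # 16 kHz, as stated above
CLIP_SECONDS = 3
TARGET_LEN = SAMPLE_RATE * CLIP_SECONDS   # 48000 samples per clip


def fix_length(samples, target_len=TARGET_LEN):
    """Crop or zero-pad a 1-D sequence of samples to exactly target_len."""
    samples = list(samples)
    if len(samples) >= target_len:
        # Assumed behavior: keep the first target_len samples.
        return samples[:target_len]
    # Assumed behavior: append zeros at the end.
    return samples + [0.0] * (target_len - len(samples))


long_clip = fix_length([0.5] * 50_000)    # cropped to 48000 samples
short_clip = fix_length([0.5] * 10_000)   # padded to 48000 samples
```

After this step every clip has the same length, so input and target pairs can be stacked into the `[Batch size, 2, wav length]` layout.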