Improving GANs for Speech Enhancement


This repository contains DSEGAN, ISEGAN, and the baseline SEGAN described in our paper:

H. Phan, I. V. McLoughlin, L. Pham, O. Y. Chén, P. Koch, M. De Vos, and A. Mertins, "Improving GANs for Speech Enhancement," IEEE Signal Processing Letters, 2020. (accepted)

ISEGAN (Iterated SEGAN) and DSEGAN (Deep SEGAN) build upon the SEGAN proposed by Pascual et al. and the SEGAN repository from santi-pdp. Unlike SEGAN, which has a single generator, ISEGAN and DSEGAN have multiple generators chained together to perform multi-stage enhancement mapping.


The enhancement result of one generator is further enhanced or corrected by the next generator in the chain. DSEGAN's generators are independent, while ISEGAN's generators share parameters. As in SEGAN, the generators are based on a fully convolutional architecture and operate directly on raw speech waveforms.
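The chaining described above can be illustrated with a minimal sketch. It is not the repository's implementation: `make_toy_generator` and `enhance_chain` are hypothetical names, and each "generator" is a trivial per-sample scaling standing in for a trained convolutional network (the latent input used by SEGAN-style generators is omitted). The only point is the difference in parameter sharing between the two variants.

```python
from typing import Callable, List

Signal = List[float]
Generator = Callable[[Signal], Signal]


def make_toy_generator(gain: float) -> Generator:
    """Hypothetical stand-in for a trained generator: scales every sample."""
    def generator(x: Signal) -> Signal:
        return [gain * sample for sample in x]
    return generator


def enhance_chain(noisy: Signal, generators: List[Generator]) -> List[Signal]:
    """Pass the signal through each generator in turn.

    Returns the output of every stage; the last entry is the final result.
    """
    outputs = []
    current = list(noisy)
    for generator in generators:
        current = generator(current)  # each stage refines the previous output
        outputs.append(current)
    return outputs


# ISEGAN-style chain: one generator object reused, so parameters are shared.
shared = make_toy_generator(0.5)
isegan_chain = [shared, shared]

# DSEGAN-style chain: separate generators with independent parameters.
dsegan_chain = [make_toy_generator(0.5), make_toy_generator(0.9)]

if __name__ == "__main__":
    noisy = [1.0, 2.0]
    print(enhance_chain(noisy, isegan_chain))
    print(enhance_chain(noisy, dsegan_chain))
```

In this sketch the depth of the chain is simply the length of the generator list, so a depth-2 model corresponds to a list of two entries.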


The project is developed with TensorFlow.



The speech enhancement dataset used in this work can be found on Edinburgh DataShare. The following script downloads the data and prepares it in TensorFlow format:


Alternatively, download the dataset, convert the wav files to a 16 kHz sampling rate, and set the paths of the noisy and clean training files in the config file e2e_maker.cfg in cfg/. Then run the script:

python make_tfrecords.py --force-gen --cfg cfg/e2e_maker.cfg
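Before generating the TFRecords, it may be worth confirming that every wav file really is sampled at 16 kHz. The following is a small sketch using only the Python standard library; `find_wrong_rate` is a hypothetical helper that is not part of this repository, and it assumes the files are uncompressed PCM wav files, which is the only kind the `wave` module reads.

```python
import wave
from pathlib import Path
from typing import List, Tuple


def find_wrong_rate(wav_dir: str, expected_rate: int = 16000) -> List[Tuple[str, int]]:
    """Return (file name, sampling rate) for each wav file in wav_dir
    whose sampling rate differs from expected_rate."""
    mismatches = []
    for path in sorted(Path(wav_dir).glob("*.wav")):
        with wave.open(str(path), "rb") as wav_file:
            rate = wav_file.getframerate()
        if rate != expected_rate:
            mismatches.append((path.name, rate))
    return mismatches


if __name__ == "__main__":
    import sys

    # Usage (directory path supplied by the user): python check_rate.py <wav_dir>
    for name, rate in find_wrong_rate(sys.argv[1]):
        print(f"{name}: {rate} Hz (expected 16000 Hz)")
```

An empty result means all files in the directory already match the expected rate; any listed file still needs to be resampled.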


Once the TFRecords file has been created at data/segan.tfrecords, you can run one of the following scripts.

# ISEGAN: run inside isegan directory
# DSEGAN: run inside dsegan directory
# SEGAN baseline: run inside segan directory

Each script contains commands for training the model and for testing 5 different checkpoints of the trained model on the test audio files. You can modify the bash script to customize parameters (e.g. which GPUs to use) and to choose what you want to run.

Enhancement results on two different test files:


Enhanced Wav Files

For comparison purposes, enhanced wav files produced by DSEGAN with a depth of 2 are available here.


  title={Improving GANs for Speech Enhancement},
  author={Huy Phan and Ian V. McLoughlin and Lam Pham and Oliver Y. Ch{\'e}n and Philipp Koch and Maarten De Vos and Alfred Mertins},
  journal={arXiv preprint arXiv:2001.05532},


e-mail: h.phan@qmul.ac.uk

Further things to add
