ifnspaml / Enhancement-Coded-Speech

Enhancement-Coded-Speech

This repository contains the scripts accompanying the paper Convolutional Neural Networks to Enhance Coded Speech. It provides the cepstral domain approach with framework structure III.

The code was written by Ziyue Zhao and Huijun Liu.

LATEST

Some of the Python code has been updated for TensorFlow 2 (the original code was written for TensorFlow 1). See Prerequisites for details on how to get started.

Introduction

An approach based on a convolutional neural network (CNN) is proposed to enhance coded (i.e., encoded and decoded) speech by utilizing cepstral domain features. The quality of the coded speech is thereby improved without modifying the codec (i.e., encoder and decoder) itself.
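To make "cepstral domain features" concrete, the sketch below computes the real cepstrum of a single speech frame and maps it back to a log magnitude spectrum. It is only an illustration using NumPy: the function names, the Hann window, the frame length of 320 samples, and the FFT size of 512 are assumptions made here, and the feature extraction actually used in this repository and in the paper may differ.

```python
import numpy as np

def frame_cepstrum(frame, n_fft=512):
    """Real cepstrum of one frame: inverse FFT of the log magnitude spectrum.

    Hypothetical helper for illustration; not taken from the repository.
    """
    windowed = frame * np.hanning(len(frame))      # taper the frame edges
    spectrum = np.fft.rfft(windowed, n=n_fft)      # one-sided spectrum
    log_mag = np.log(np.abs(spectrum) + 1e-10)     # small offset avoids log(0)
    return np.fft.irfft(log_mag, n=n_fft)          # real, symmetric cepstrum

def cepstrum_to_log_magnitude(cepstrum):
    """Invert frame_cepstrum back to the one-sided log magnitude spectrum."""
    return np.fft.rfft(cepstrum).real

# Demo on a synthetic frame (assumed: 20 ms at 16 kHz = 320 samples).
rng = np.random.default_rng(0)
frame = rng.standard_normal(320)
ceps = frame_cepstrum(frame)
log_mag_restored = cepstrum_to_log_magnitude(ceps)
```

In an enhancement setting of this kind, a model would operate on cepstral vectors such as `ceps` before the spectrum is reconstructed; how the phase and the remaining coefficients are handled is described in the paper and is not reproduced here.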

Prerequisites and Installation

Getting Started

Testing with the provided CNN model

The results reported in the paper were obtained on the NTT wideband speech database, so to reproduce them exactly, the test needs to be run on the same speech data (see the paper for details).

Training with your own dataset

Codecs and processing functions

Citation

If you use the scripts in your research, please cite

@article{zhao2019convolutional,
  author = {Z. Zhao and H. Liu and T. Fingscheidt},
  title = {{Convolutional Neural Networks to Enhance Coded Speech}},
  journal = {IEEE/ACM Transactions on Audio, Speech, and Language Processing},
  year = {2019},
  month = apr,
  volume = {27},
  number = {4},
  pages = {663--678}
}
@misc{cnn2codedspeech,
  author =  {Z. Zhao and H. Liu and T. Fingscheidt},
  title =   {{Convolutional Neural Networks to Enhance Coded Speech}},
  howpublished = {\url{https://github.com/ifnspaml/Enhancement-Coded-Speech}},
  year =    {2018},
  month =   jun
}

Acknowledgements