haoheliu / voicefixer_main

General Speech Restoration
https://haoheliu.github.io/demopage-voicefixer/
MIT License


2021-11-06: I have just updated the code structure to make it easier to understand. It may contain bugs for now. I will run some test training later.

2021-11-01: I will update the code and make it easier to use later.

VoiceFixer

VoiceFixer is a framework for general speech restoration. It targets the restoration of severely degraded speech and historical speech.

Materials

Usage

Environment (do this first)

# Download dataset and prepare running environment
git clone https://github.com/haoheliu/voicefixer_main.git
cd voicefixer_main
source init.sh 

VoiceFixer for general speech restoration

Here we take VF_UNet (VoiceFixer with a UNet as the analysis module) as an example.

Automatically evaluate and generate .csv files on all test sets.

For example, to evaluate on all test sets (the default):

python3 eval_gsr_voicefixer.py  \
                    --config  <path-to-the-config-file> \
                    --ckpt  <path-to-the-checkpoint> 

To evaluate only on the GSR test set:

python3 eval_gsr_voicefixer.py  \
                    --config  <path-to-the-config-file> \
                    --ckpt  <path-to-the-checkpoint> \
                    --testset  general_speech_restoration \
                    --description  general_speech_restoration_eval

There are generally seven testsets you can pass to --testset:

To evaluate on a small portion of the data, e.g. 10 utterances, pass the number to the --limit_numbers argument.

python3 eval_gsr_voicefixer.py  \
                    --config  <path-to-the-config-file> \
                    --ckpt  <path-to-the-checkpoint> \
                    --limit_numbers 10 
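
The invocations above differ only in their optional flags, so they can also be assembled from a script. The following is a minimal sketch and not part of the repository: the helper name build_eval_command is hypothetical, and it uses only the flags shown above (--config, --ckpt, --testset, --description, --limit_numbers). The paths in the usage comment are placeholders.

```python
import subprocess


def build_eval_command(config, ckpt, testset=None, description=None, limit_numbers=None):
    """Assemble the argument list for eval_gsr_voicefixer.py (hypothetical helper)."""
    cmd = ["python3", "eval_gsr_voicefixer.py", "--config", str(config), "--ckpt", str(ckpt)]
    # Optional flags are appended only when a value is given, so that
    # omitting them falls back to the script's own defaults.
    if testset is not None:
        cmd += ["--testset", str(testset)]
    if description is not None:
        cmd += ["--description", str(description)]
    if limit_numbers is not None:
        cmd += ["--limit_numbers", str(limit_numbers)]
    return cmd


def run_eval(**kwargs):
    """Run the evaluation from the repository root; raises if the script exits non-zero."""
    return subprocess.run(build_eval_command(**kwargs), check=True)


# Example (placeholder paths), quick check on 10 utterances of the GSR test set:
# run_eval(config="<path-to-the-config-file>", ckpt="<path-to-the-checkpoint>",
#          testset="general_speech_restoration", limit_numbers=10)
```

Passing the arguments as a list avoids the shell line-continuation pitfalls of the multi-line form, such as a missing or trailing-space backslash.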

Evaluation results are written to the exp_results folder.
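
The layout of the generated .csv files is not described here, so the sketch below makes no assumption about column names. It only finds every .csv file under exp_results and reports its first row and the number of remaining rows. The function name summarize_csv_results is hypothetical, and treating the first row as a header is an assumption.

```python
import csv
from pathlib import Path


def summarize_csv_results(root="exp_results"):
    """Return {file path: (first row, number of remaining rows)} for each .csv under root."""
    summary = {}
    for path in sorted(Path(root).rglob("*.csv")):
        with path.open(newline="") as f:
            rows = list(csv.reader(f))
        # Assumption: the first row is a header; an empty file yields ([], 0).
        header = rows[0] if rows else []
        summary[str(path)] = (header, max(len(rows) - 1, 0))
    return summary


def print_summary(root="exp_results"):
    """Print one line per result file found under root."""
    for name, (header, n_rows) in summarize_csv_results(root).items():
        print(f"{name}: {n_rows} rows, columns = {header}")
```

Calling print_summary() from the repository root after an evaluation run gives a quick overview of which result files were produced.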

ResUNet for general speech restoration

ResUNet for single task speech restoration

Check the logs directory for checkpoints, logs, and validation results.

Citation

 @misc{liu2021voicefixer,   
     title={VoiceFixer: Toward General Speech Restoration With Neural Vocoder},   
     author={Haohe Liu and Qiuqiang Kong and Qiao Tian and Yan Zhao and DeLiang Wang and Chuanzeng Huang and Yuxuan Wang},  
     year={2021},  
     eprint={2109.13731},  
     archivePrefix={arXiv},  
     primaryClass={cs.SD}  
 }
