VoiceRestore is a speech restoration model that improves the quality of degraded voice recordings. Built on flow-matching transformers, it addresses a wide range of degradations common in speech recordings, including background noise, reverberation, distortion, and signal loss.
Demo of audio restorations: VoiceRestore
Credits: This repository is based on the E2-TTS implementation by lucidrains.
Degraded audio (reverberation, distortion, noise, random cut):
Note: Adjust your volume before playing the degraded audio sample, as it may contain distortions.
https://github.com/user-attachments/assets/0c030274-60b5-41a4-abe6-59a3f1bc934b
Restored audio - 16 steps, strength 0.5:
https://github.com/user-attachments/assets/fdbbb988-9bd2-4750-bddd-32bd5153d254
Clone the repository:
git clone --recurse-submodules https://github.com/skirdey/voicerestore.git
cd voicerestore
If you did not clone with --recurse-submodules, you can run:
git submodule update --init --recursive
Install dependencies:
pip install -r requirements.txt
Download the pre-trained model and place it in the checkpoints folder. (Updated 9/29/2024)
Run a test restoration:
python inference_short.py --checkpoint ./checkpoints/voice-restore-20d-16h-optim.pt --input test_input.wav --output test_output.wav --steps 32 --cfg_strength 0.5
This will process test_input.wav and save the result as test_output.wav.
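To run the same restoration over several files, the command above can be assembled programmatically. The following is a minimal sketch, not part of the repository: build_restore_command and plan_batch are hypothetical helpers, and the only flags used are the ones shown in the command above.

```python
import sys
from pathlib import Path


def build_restore_command(
    input_path,
    output_path,
    checkpoint="./checkpoints/voice-restore-20d-16h-optim.pt",
    steps=32,
    cfg_strength=0.5,
    script="inference_short.py",
):
    """Build the argument list for one restoration run (hypothetical helper)."""
    return [
        sys.executable, script,
        "--checkpoint", str(checkpoint),
        "--input", str(input_path),
        "--output", str(output_path),
        "--steps", str(steps),
        "--cfg_strength", str(cfg_strength),
    ]


def plan_batch(input_dir, output_dir):
    """Pair every .wav file in input_dir with an output path in output_dir."""
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    return [
        build_restore_command(wav, output_dir / f"{wav.stem}_restored.wav")
        for wav in sorted(input_dir.glob("*.wav"))
    ]


# Usage (run from the repository root; the output directory must already exist):
#   import subprocess
#   for cmd in plan_batch("degraded", "restored"):
#       subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.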
Run a long-form restoration, which uses window chunking:
python inference_long.py --checkpoint ./checkpoints/voice-restore-20d-16h-optim.pt --input test_input_long.wav --output test_output_long.wav --steps 32 --cfg_strength 0.5 --window_size_sec 10.0 --overlap 0.25
This will process test_input_long.wav (which you need to provide) and save the result as test_output_long.wav.
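To give a rough idea of what window chunking with --window_size_sec 10.0 and --overlap 0.25 could look like, here is a minimal sketch that computes overlapping window boundaries. It assumes --overlap is a fraction of the window length, which this README does not confirm. window_bounds is a hypothetical helper and is not taken from inference_long.py.

```python
def window_bounds(num_samples, sample_rate, window_size_sec=10.0, overlap=0.25):
    """Return (start, end) sample indices of overlapping windows.

    Assumption: `overlap` is the fraction of each window shared with the next.
    """
    window = int(window_size_sec * sample_rate)
    if window <= 0:
        raise ValueError("window must contain at least one sample")
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    if num_samples <= 0:
        return []

    # Distance between consecutive window starts.
    hop = max(1, int(window * (1.0 - overlap)))

    bounds = []
    start = 0
    while True:
        end = min(start + window, num_samples)
        bounds.append((start, end))
        if end >= num_samples:
            break
        start += hop
    return bounds


# Example: 25 seconds of audio at a made-up rate of 10 samples per second.
# window_bounds(250, 10) -> [(0, 100), (75, 175), (150, 250)]
```

Each window would be restored independently, and the overlapping regions blended (for example with a crossfade) when the pieces are joined. How the script actually merges windows is not described here.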
To restore your own audio files:
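from model import OptimizedAudioRestorationModel

model = OptimizedAudioRestorationModel()
# input_audio is a placeholder for your degraded audio, loaded beforehand (not defined in this snippet)
restored_audio = model.forward(input_audio, steps=32, cfg_strength=0.5)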
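Alternatively, load the model from Hugging Face with transformers (for example, in a Colab notebook):
!git lfs install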
!git clone https://huggingface.co/jadechoghari/VoiceRestore
%cd VoiceRestore
!pip install -r requirements.txt
from transformers import AutoModel
# path to the model folder (on colab it's as follows)
checkpoint_path = "/content/VoiceRestore"
model = AutoModel.from_pretrained(checkpoint_path, trust_remote_code=True)
model("test_input.wav", "test_output.wav")
If you use VoiceRestore in your research, please cite our paper:
@article{kirdey2024voicerestore,
title={VoiceRestore: Flow-Matching Transformers for Speech Recording Quality Restoration},
author={Kirdey, Stanislav},
journal={arXiv},
year={2024}
}
This project is licensed under the MIT License - see the LICENSE file for details.