sp-uhh / storm

StoRM: A Diffusion-based Stochastic Regeneration Model for Speech Enhancement and Dereverberation
MIT License
164 stars 22 forks source link

StoRM: A Stochastic Regeneration Model for Speech Enhancement and Dereverberation

StoRM inference process on a spectrogram. A predictive model is first used to get a estimate of the clean speech, with some possible distortions and resiudla noise. The diffusion generative model then uses this estimate as the initial point for a reverse process learns to generate clean speech in an iterative fashion starting from the corrupted signal xT.

This repository contains the official PyTorch implementation for the paper:

Audio examples and supplementary materials are available on our project page.

Installation

Pretrained checkpoints

Usage:

Training

Training is done by executing train.py. A minimal running example with default settings (as in our paper [2]) can be run with

python train.py --format <your_format>--base_dir <your_base_dir> --gpus 0,

where

To see all available training options, run python train.py --help. These include options for the backbone DNN, the SDE parameters, the PytorchLightning Trainer usual parameters such as max_epochs, limit_train_batches and so on.

Note:

Evaluation

To evaluate on a test set, run

python enhancement.py --test_dir <your_test_dir> --enhanced_dir <your_enhanced_dir> --ckpt <path_to_model_checkpoint>

to generate the enhanced .wav files. The --cpkt parameter of enhancement.py should be the path to a trained model checkpoint, as stored by the logger in logs/.

Data Creation

    cd preprocessing;
    python3 create_data.py --task <your_task> --speech <your_speech_format> --noise <your_noise_data>

Please check the script for other options

Citations / References

We kindly ask you to cite our papers in your publication when using any of our research or code:

@article{lemercier2023storm,
  author={Lemercier, Jean-Marie and Richter, Julius and Welker, Simon and Gerkmann, Timo},
  journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing}, 
  title={StoRM: A Diffusion-Based Stochastic Regeneration Model for Speech Enhancement and Dereverberation}, 
  year={2023},
  volume={31},
  number={},
  pages={2724-2737},
  doi={10.1109/TASLP.2023.3294692}}
@inproceedings{lemercier2023wind,
  author={Lemercier, Jean-Marie and Thiemannm, Joachim and Konig, Raphael and Gerkmann, Timo},
  booktitle={VDE 15th ITG conference on Speech Communication}, 
  title={Wind Noise Reduction with a Diffusion-based Stochastic Regeneration Model}, 
  year={2023}}

[1] Jean-Marie Lemercier, Julius Richter, Simon Welker, and Timo Gerkmann. "StoRM: A Stochastic Regeneration Model for Speech Enhancement And Dereverberation", IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 2724-2737, 2023.

[2] Julius Richter, Simon Welker, Jean-Marie Lemercier, Bunlong Lay and Timo Gerkmann. "Speech Enhancement and Dereverberation with Diffusion-Based Generative Models", IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 2351-2364, 2023.

[3] Jean-Marie Lemercier, Joachim Thiemann, Raphael Konig and Timo Gerkmann. "Wind Noise Reduction with a Diffusion-based Stochastic Regeneration Model", ITG Speech Communication, Aachen, Germany, 2023

[4] D. Mirabilii et al. "Simulating wind noise with airflow speed-dependent characteristics,” Int. Workshop on Acoustic Signal Enhancement, Aachen, Germany, 2022