sp-uhh / sgmse

Score-based Generative Models (Diffusion Models) for Speech Enhancement and Dereverberation
MIT License

train with higher sampling rate #26

Closed by taalua 5 months ago

taalua commented 1 year ago

Thank you for the work. Can I use the base code to train at a higher sampling rate directly? What modifications to the model are needed to train with a higher sampling rate?

Thanks.

julius-richter commented 5 months ago

Hi taalua,

We have recently released a model trained at 48 kHz on the EARS-WHAM dataset [1].

We trained the model with the following command:

    python train.py --backbone ncsnpp_48k --spec_factor 0.065 --spec_abs_exponent 0.667 --sigma-min 0.1 --sigma-max 1.0 --theta 2.0
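For readers wondering what the two `--spec_*` flags in that command do, here is a minimal NumPy sketch. It assumes that `--spec_abs_exponent` applies a power-law compression to the STFT magnitudes and that `--spec_factor` then scales the result, with the phase left untouched. The function names `compress_spec` and `expand_spec` are hypothetical and are not the repository's API, so please check the actual implementation before relying on this.

```python
import numpy as np

# Values taken from the training command above.
SPEC_FACTOR = 0.065
SPEC_ABS_EXPONENT = 0.667


def compress_spec(spec, factor=SPEC_FACTOR, exponent=SPEC_ABS_EXPONENT):
    """Assumed forward transform: power-law compress magnitudes, keep phase, scale."""
    magnitude = np.abs(spec) ** exponent
    phase = np.exp(1j * np.angle(spec))
    return factor * magnitude * phase


def expand_spec(spec, factor=SPEC_FACTOR, exponent=SPEC_ABS_EXPONENT):
    """Assumed inverse transform: undo the scaling, then undo the compression."""
    spec = spec / factor
    magnitude = np.abs(spec) ** (1.0 / exponent)
    phase = np.exp(1j * np.angle(spec))
    return magnitude * phase


# Toy complex "spectrogram" (frequency bins x frames); random data, not real audio.
rng = np.random.default_rng(0)
spec = rng.standard_normal((4, 8)) + 1j * rng.standard_normal((4, 8))

compressed = compress_spec(spec)
restored = expand_spec(compressed)
print("round trip ok:", np.allclose(restored, spec))
```

Under this assumption, an exponent below 1 reduces the dynamic range of the magnitudes, and the factor rescales them to a range suited to the diffusion process. That would explain why these values may need retuning when the sampling rate or dataset changes.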

You can also download the pre-trained checkpoints here.

[1] Julius Richter, Yi-Chiao Wu, Steven Krenn, Simon Welker, Bunlong Lay, Shinji Watanabe, Alexander Richard, Timo Gerkmann. "EARS: An Anechoic Fullband Speech Dataset Benchmarked for Speech Enhancement and Dereverberation", ISCA Interspeech, Kos, Greece, 2024.