sp-uhh / sgmse

Score-based Generative Models (Diffusion Models) for Speech Enhancement and Dereverberation
MIT License

Cannot reproduce enhanced audio for the same noisy audio file. Question: sampler()? #16

Closed (ndisci closed this issue 7 months ago)

ndisci commented 1 year ago

Hi,

Thank you for sharing your code. I want to denoise a couple of audio files using the sgmse-dereverberation model without any training involved. But when I checked the generated tensors, I realized that the model produces different values for the same audio file on every run because of the sampler step. I'm curious about sampler().

Also, the model is for a 16 kHz sampling rate, right?

Thanks.

julius-richter commented 1 year ago

Hi there,

  1. That is true: every time you run inference with SGMSE+, the method generates a slightly different clean speech estimate. This is due to the stochastic nature of the reverse diffusion process. To be precise, there are three sources of randomness. First, you start from an initial random sample $\mathbf x_T$. Second, there is the Brownian motion in the reverse process, which is handled in the predictor step. Third, there is the annealed Langevin dynamics in the corrector step. To always generate the same estimate, you would need to set a manual seed before sampling, e.g. torch.manual_seed(0).
  2. Yes, all pre-trained models were trained on audio files with a 16 kHz sampling rate.
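
To illustrate the seeding point, here is a minimal sketch. It is a toy stand-in, not the actual sgmse sampler: `toy_sampler`, its step sizes, and the input `y` are hypothetical and only mimic the three noise draws listed above (initial sample, predictor noise, corrector noise). The only real API it relies on is `torch.manual_seed`, `torch.randn_like`, `torch.linspace`, and `torch.equal`.

```python
import torch


def toy_sampler(y, n_steps=5, seed=None):
    """Toy predictor-corrector loop (hypothetical, NOT the sgmse sampler).

    It draws noise in the same three places described above, so it shows
    why fixing the seed makes the output repeatable.
    """
    if seed is not None:
        # Reset PyTorch's global random number generator before sampling.
        torch.manual_seed(seed)

    # 1) Random initial sample x_T around the noisy input y.
    x = y + 0.5 * torch.randn_like(y)

    for _ in range(n_steps):
        # 2) Predictor step: drift towards y plus a Brownian-motion-like term.
        x = x - 0.1 * (x - y) + 0.05 * torch.randn_like(x)
        # 3) Corrector step: Langevin-like update with its own noise draw.
        x = x - 0.05 * (x - y) + 0.01 * torch.randn_like(x)

    return x


y = torch.linspace(-1.0, 1.0, steps=8)  # stand-in for a noisy input

a = toy_sampler(y, seed=0)
b = toy_sampler(y, seed=0)  # same seed: same noise draws
c = toy_sampler(y)          # no reseed: the RNG stream simply continues

print("same seed gives identical output:", torch.equal(a, b))
print("unseeded run differs:", not torch.equal(a, c))
```

With the real model, the analogous step would be to call torch.manual_seed(0) once right before each inference run. On a GPU, some operations may still be non-deterministic even with a fixed seed, so small differences can remain there.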

I hope this answers your questions.

ndisci commented 1 year ago

Hi again,

Thank you !

julius-richter commented 7 months ago