Thank you for sharing your code. I want to denoise a couple of audio files using the sgmse-dereverberation model without any training involved. But when I checked the generated tensors, I realized that the model produces different values for the same audio each time because of the sampler step. I'm curious about sampler().
Also, the model is for a 16 kHz sampling rate, right?
That is true: every time you run inference with SGMSE+, the method will generate a slightly different clean-speech estimate. This is due to the stochastic nature of the reverse diffusion process. To be precise, there are three sources of randomness. First, you start from an initial random sample $\mathbf x_T$. Second, the Brownian motion in the reverse process, which is handled in the predictor step. And third, the annealed Langevin dynamics in the corrector step. To always generate the same estimate, you would need to set a manual seed, e.g. `torch.manual_seed(0)`.
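As a minimal sketch of why seeding makes the output reproducible (the `torch.randn` call here just stands in for the sampler's random draws; it is not the actual SGMSE+ sampler API):

```python
import torch

def sample_with_seed(seed: int, shape=(1, 4)) -> torch.Tensor:
    # Fixing the seed makes the initial sample x_T -- and every
    # subsequent noise draw for the predictor and corrector steps --
    # reproducible across runs.
    torch.manual_seed(seed)
    return torch.randn(shape)

a = sample_with_seed(0)
b = sample_with_seed(0)
print(torch.equal(a, b))  # True: same seed, identical draws
```

Without the `torch.manual_seed` call, each run would produce a different tensor, which is exactly the behavior you observed with the dereverberation model.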
Yes, all pre-trained models were trained on audio files with a 16 kHz sampling rate, so input at other rates should be resampled to 16 kHz before inference.