I can not reproduce enhanced audio for same noisy audio file. Question : sampler() ?

ndisci commented 1 year ago

Hi,

Thank you for sharing your code. I want to denoise a couple of audio files using the sgmse-dereverberation model without any training involved. But when I checked the generated tensors, I realized that the model produced different values for the same audio each time because of the sampler step. I'm curious about sampler().

Also, the model is for a 16 kHz sampling rate, right?

Thanks.

julius-richter commented 1 year ago

Hi there,

That is true: every time you run inference with SGMSE+, the method generates a slightly different clean speech estimate. This is due to the stochastic nature of the reverse diffusion process. To be precise, there are three sources of randomness. First, sampling starts from a random initial sample $\mathbf x_T$. Second, the predictor step injects noise from the Brownian motion of the reverse process. Third, the corrector step adds noise through annealed Langevin dynamics. To always generate the same estimate, you need to set a manual seed before sampling, e.g. torch.manual_seed(0).
Yes, all pre-trained models were trained on audio files with a 16 kHz sampling rate.
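
To illustrate why a single seed pins down all three noise sources, here is a minimal toy sketch in plain Python. It is not the SGMSE+ sampler: `toy_sampler`, its noise schedule, and the drift toward the noisy input (a stand-in for the learned score) are hypothetical, and `random.Random(seed)` plays the role that torch.manual_seed(0) plays in PyTorch.

```python
import random

def toy_sampler(y, num_steps=5, seed=None):
    """Toy predictor-corrector loop on a list of floats (illustration only)."""
    # One generator feeds all three noise sources; seed=None draws fresh entropy.
    rng = random.Random(seed)
    # Source 1: random initial sample x_T, centred on the noisy input y.
    x = [v + rng.gauss(0.0, 1.0) for v in y]
    for step in range(num_steps, 0, -1):
        sigma = 0.5 * step / num_steps  # hypothetical decreasing noise level
        # Source 2: predictor step, a drift toward y plus Brownian-motion noise.
        x = [xi + 0.2 * (yi - xi) + sigma * rng.gauss(0.0, 1.0)
             for xi, yi in zip(x, y)]
        # Source 3: corrector step, a Langevin-style update with its own noise.
        x = [xi + 0.1 * (yi - xi) + 0.1 * sigma * rng.gauss(0.0, 1.0)
             for xi, yi in zip(x, y)]
    return x

noisy = [0.3, -0.1, 0.7]
run_a = toy_sampler(noisy, seed=0)
run_b = toy_sampler(noisy, seed=0)
run_c = toy_sampler(noisy, seed=1)
print(run_a == run_b)  # same seed: compares two identically seeded runs
print(run_a == run_c)  # different seed: compares two differently seeded runs
```

Because every random draw comes from the same seeded generator, fixing the seed fixes the initial sample and every predictor and corrector noise term at once.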

I hope this answers your questions.

ndisci commented 1 year ago

Hi again,

I tried your suggestion (torch.manual_seed(0)) about 4 months ago and it worked, but what about different PCs?
How can I use DCUNet as the score model instead of NCSN++ to test model performance? Is there a checkpoint for this purpose?

Thank you !

julius-richter commented 9 months ago

torch.manual_seed(0) should also work when the code runs on different computers.
We only offer checkpoints for SGMSE+ (NCSN++ backbone) and not for SGMSE (DCUNet backbone).
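
Whether a seeded run matches across machines can be checked rather than assumed. PyTorch's reproducibility notes state that identical results are not guaranteed across platforms, releases, or between CPU and GPU, even with identical seeds, so a tolerance-based comparison is a safer check than bit-exact equality. A minimal sketch, assuming the enhanced outputs have been exported as flat lists of floats; `outputs_match` and its tolerances are hypothetical:

```python
import math

def outputs_match(a, b, rel_tol=1e-5, abs_tol=1e-8):
    """Return True if two sample sequences have equal length and agree within tolerance."""
    if len(a) != len(b):
        return False
    return all(math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)
               for x, y in zip(a, b))

# Stand-ins for the same seeded enhancement run on two different machines.
machine_a = [0.1, 0.2, -0.3]
machine_b = [0.1, 0.2 + 1e-9, -0.3]
print(outputs_match(machine_a, machine_b))
```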

sp-uhh / sgmse

I can not reproduce enhanced audio for same noisy audio file. Question : sampler() ? #16