zhibinQiu / SRTNet

Confusion about the residual modeling #2

Open zzwei1 opened 1 year ago

zzwei1 commented 1 year ago

Nice work!

In the paper, the residual clean speech x_0 and the residual noisy speech y_0 are adopted as the inputs of the stochastic model S_θ.

However, in the CVPR 2022 paper 'Deblurring via Stochastic Refinement', I find that the stochastic model takes a blurry image y and the clean residual x_0 - gθ(x_0) as input, where x_0 is the clean image and gθ(·) is the deterministic model.

Here is my confusion: you use the residual noisy speech y_0 as the condition of the diffusion model, while the CVPR paper directly uses the blurry image y as the condition. Since the diffusion process operates on the residual, I think your solution is more straightforward.
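To make the two conditioning choices concrete, here is a minimal sketch of how the inputs for one training step might be assembled. It is not taken from either codebase. It assumes the residuals are formed by subtracting the deterministic output from the clean and noisy signals, and it uses a standard DDPM-style forward noising step. `deterministic_model`, `build_inputs`, and `alpha_bar` are hypothetical names, and the moving-average "model" is only a stand-in for a trained network.

```python
import math
import random


def deterministic_model(y):
    # Hypothetical stand-in for the deterministic model g_theta:
    # a 3-tap moving average. The real model is a trained network.
    n = len(y)
    out = []
    for i in range(n):
        window = y[max(0, i - 1):min(n, i + 2)]
        out.append(sum(window) / len(window))
    return out


def build_inputs(x, y, alpha_bar, rng, residual_condition=True):
    """Return (x_t, condition, eps) for one step of the stochastic model."""
    init = deterministic_model(y)
    # Assumed residual target: what the deterministic model did not recover.
    x0 = [xi - gi for xi, gi in zip(x, init)]
    if residual_condition:
        # Variant discussed for SRTNet: condition on the residual noisy signal.
        cond = [yi - gi for yi, gi in zip(y, init)]
    else:
        # Variant discussed for the deblurring paper: condition on the raw input y.
        cond = list(y)
    # DDPM-style forward step: x_t = sqrt(a) * x0 + sqrt(1 - a) * eps.
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    x_t = [a * v + b * e for v, e in zip(x0, eps)]
    return x_t, cond, eps


# Toy signals: a clean sine wave and a noisy copy of it.
noise_rng = random.Random(0)
x = [math.sin(0.1 * i) for i in range(16)]
y = [xi + 0.3 * noise_rng.gauss(0.0, 1.0) for xi in x]

# Same seed for both variants, so only the condition differs.
x_t_res, cond_res, _ = build_inputs(x, y, 0.5, random.Random(1), True)
x_t_raw, cond_raw, _ = build_inputs(x, y, 0.5, random.Random(1), False)
```

In both variants the diffused quantity is the same residual; only the conditioning signal passed to the stochastic model changes.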

I'm not sure if my understanding is correct, and I would like to hear your insights.

zhibinQiu commented 1 year ago

Hi @zzwei1, your understanding is correct. One thing to add: adding the noisy speech to the diffusion process and merely using the noisy speech as a condition for the diffusion model give different results. If you are interested in speech enhancement, I will post comparison audio of the two generated results after a while. I am preparing a follow-up article. Thanks for your attention.