huawei-noah / Speech-Backbones

This is the main repository of open-sourced speech technology by Huawei Noah's Ark Lab.

Different Implementation of Diffusion Model #35

Open siyag12 opened 8 months ago

siyag12 commented 8 months ago

I'm a researcher building a TTS model using diffusion. While looking for an implementation, I found this repo.

According to my understanding of the paper, both processes in the decoder's diffusion model, forward and backward diffusion, are supposed to take place on the latent vector z (which is produced by the encoder part of the U-Net). However, the repo's implementation seems to differ from this. Could you explain the reasoning behind the difference?