keonlee9420 / DiffSinger

PyTorch implementation of DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (focused on DiffSpeech)
MIT License
230 stars, 30 forks

diffusion_projection in ResidualBlock #3

Closed (tebin closed this issue 3 years ago)

tebin commented 3 years ago

Your implementation applies a diffusion_projection in every residual block, similar to DiffWave. This is inconsistent with the paper, where the original architecture adds E_t (the output of the step embedding module) directly to the input before the first convolution layer. Is there a reason behind this change?

keonlee9420 commented 3 years ago

Hi @tebin, sorry for the late response. There is no special reason for that. I just borrowed the idea from DiffWave, and the model learned as expected. That said, I think the method in the paper might work better, so if you have time for it, it would be nice if you could try it and share the results with others.
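
For anyone who wants to try the comparison, here is a minimal sketch of the two variants discussed above. It is not the repository's actual code: the conditioner path is left out, plain `nn.Linear` and `nn.Conv1d` layers stand in for the project's own layer wrappers, and the `Denoiser` class, the `per_block_step` flag, and the channel sizes are hypothetical names picked for illustration. With `per_block_step=True`, each block projects the step embedding and adds it to its own input (the DiffWave-style approach). With `per_block_step=False`, E_t is added once to the input before the first convolution layer, which is how the issue describes the paper.

```python
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """Simplified gated residual block (conditioner path omitted)."""

    def __init__(self, channels: int, per_block_step: bool):
        super().__init__()
        # DiffWave-style: every block owns a projection of the step embedding.
        self.diffusion_projection = (
            nn.Linear(channels, channels) if per_block_step else None
        )
        self.conv = nn.Conv1d(channels, 2 * channels, kernel_size=3, padding=1)
        self.output_projection = nn.Conv1d(channels, 2 * channels, kernel_size=1)

    def forward(self, x: torch.Tensor, step_emb: torch.Tensor):
        # x: (batch, channels, time), step_emb: (batch, channels)
        h = x
        if self.diffusion_projection is not None:
            h = h + self.diffusion_projection(step_emb).unsqueeze(-1)
        gate, filt = torch.chunk(self.conv(h), 2, dim=1)
        h = torch.sigmoid(gate) * torch.tanh(filt)
        residual, skip = torch.chunk(self.output_projection(h), 2, dim=1)
        return (x + residual) / 2 ** 0.5, skip


class Denoiser(nn.Module):
    """Hypothetical wrapper that switches between the two variants."""

    def __init__(self, channels: int = 8, n_blocks: int = 3,
                 per_block_step: bool = True):
        super().__init__()
        self.per_block_step = per_block_step
        self.blocks = nn.ModuleList(
            [ResidualBlock(channels, per_block_step) for _ in range(n_blocks)]
        )

    def forward(self, x: torch.Tensor, step_emb: torch.Tensor) -> torch.Tensor:
        if not self.per_block_step:
            # Paper-style (as described in this issue): add E_t once,
            # before the first convolution layer.
            x = x + step_emb.unsqueeze(-1)
        skips = []
        for block in self.blocks:
            x, skip = block(x, step_emb)
            skips.append(skip)
        # Sum the skip connections, scaled by sqrt(number of blocks).
        return torch.stack(skips).sum(dim=0) / len(self.blocks) ** 0.5
```

Both variants take the same inputs, for example `Denoiser(per_block_step=False)(x, step_emb)` with `x` of shape `(batch, channels, time)` and `step_emb` of shape `(batch, channels)`, so they can be swapped in a training run without other changes. The per-block variant only adds one linear layer per block.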