Hi @tebin, sorry for the late response. There is no special reason for that. I just borrowed the idea from them and let the model learn in an expected way. I think the method in the paper might work better, though, so if you have room for it, it would be nice if you could try it and share the results with others.
Your implementation has a `diffusion_projection` in every residual block, similar to DiffWave, but this is inconsistent with the paper: the original architecture directly adds E_t (the output of the step embedding module) to the input before the first convolution layer. Is there a reason behind this change?
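For anyone comparing the two options, here is a minimal PyTorch sketch of the difference being discussed. The class names, layer shapes, and convolution settings are hypothetical and do not match the actual repository code; the sketch only contrasts a per-block projection of the step embedding (DiffWave-style) with a single injection of E_t before the first convolution (as described in the paper).

```python
import torch
import torch.nn as nn


class ResidualBlockWithProjection(nn.Module):
    """DiffWave-style: each residual block projects the step embedding itself."""

    def __init__(self, channels: int, emb_dim: int):
        super().__init__()
        # per-block projection of the diffusion-step embedding (hypothetical shapes)
        self.diffusion_projection = nn.Linear(emb_dim, channels)
        self.conv = nn.Conv1d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor, step_emb: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time), step_emb: (batch, emb_dim)
        # broadcast the projected embedding over the time axis, then add
        y = x + self.diffusion_projection(step_emb).unsqueeze(-1)
        return x + self.conv(y)


class EmbeddingAtInput(nn.Module):
    """Paper-style variant: E_t is added once, before the first convolution."""

    def __init__(self, channels: int, emb_dim: int, n_blocks: int = 4):
        super().__init__()
        self.input_projection = nn.Linear(emb_dim, channels)
        self.first_conv = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.blocks = nn.ModuleList(
            nn.Conv1d(channels, channels, kernel_size=3, padding=1)
            for _ in range(n_blocks)
        )

    def forward(self, x: torch.Tensor, step_emb: torch.Tensor) -> torch.Tensor:
        # single injection of the step embedding before the first convolution
        x = x + self.input_projection(step_emb).unsqueeze(-1)
        x = self.first_conv(x)
        for block in self.blocks:
            x = x + block(x)
        return x
```

In the first variant every block gets a fresh, learned view of the step embedding; in the second the timestep information has to survive the whole stack from a single addition at the input, which is presumably why the per-block projection was borrowed from DiffWave here.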