adelacvg opened this issue 2 years ago
that's interesting, I'm not sure
there is a subtle difference in the resnet blocks. I'm using the GLIDE-style architecture here: norm, activation, then project. The original DDPM, however, does project, norm, activation
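To make the ordering difference concrete, here is a minimal PyTorch sketch of the two block styles described above (illustrative only, not the actual code from either repo; block names and the GroupNorm/SiLU choices are assumptions):

```python
import torch
from torch import nn

class GlideStyleBlock(nn.Module):
    """GLIDE-style ordering: norm -> activation -> project."""
    def __init__(self, dim):
        super().__init__()
        self.norm = nn.GroupNorm(8, dim)
        self.act = nn.SiLU()
        self.proj = nn.Conv2d(dim, dim, 3, padding=1)

    def forward(self, x):
        # pre-activation residual branch
        return x + self.proj(self.act(self.norm(x)))

class DdpmStyleBlock(nn.Module):
    """Ordering described above for the original DDPM: project -> norm -> activation."""
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Conv2d(dim, dim, 3, padding=1)
        self.norm = nn.GroupNorm(8, dim)
        self.act = nn.SiLU()

    def forward(self, x):
        # post-activation residual branch
        return x + self.act(self.norm(self.proj(x)))
```

Both variants preserve the input shape, so they are drop-in replacements for one another; the difference is purely where the normalization and nonlinearity sit relative to the convolution.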
I've also added weight standardization in the denoising-diffusion-pytorch repo, since it reportedly works well with group norms, so it could be that too
@lucidrains @adelacvg Could this be because, in the DDPM implementation you pointed to, the UNet has attention layers at every level by default, while in this repo you have to specify via arguments which levels should have attention?
@lucidrains Have you evaluated how these two repos perform on unconditional generation tasks? I appreciate your contributions very much and hope to get more hints on this.
I am training the unconditional version of Imagen, which I assume is equivalent to continuous-time Gaussian diffusion. But I found that training unconditional Imagen is much slower than continuous_time_gaussian_diffusion: both are trained on the same dataset and devices, yet the Imagen version is 5-10 times slower, and after the same number of steps it gives worse results than continuous_time_gaussian_diffusion. I would like it to produce comparable results in the same amount of time. How should I configure the network correctly?