Train code with 256x256 images

manurare commented 1 year ago

Hi,

First of all, thanks for the code! I was wondering whether you were able to make it converge on 256x256 images. Specifically, I am using ImageNet (a smaller version of it with 9469 samples), and I cannot get it to generate plausible samples. I am using T=1000, a cosine scheduler, and LR=1e-4 with warm-up. I tried predicting both epsilon and xstart, but both give weird samples, as the attached images show. Do you have any tips or suggestions on what could be going wrong or how to improve sample quality? Maybe 1000 timesteps are not enough, but with more timesteps sampling would be much slower :S

Attached samples (images not preserved): epsilon prediction (left), xstart prediction (right)

Thanks!
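For readers following the setup described above, here is a minimal sketch of a cosine noise schedule with T=1000 and of how an epsilon prediction maps to an xstart prediction. It assumes that "cosine scheduler" refers to the cosine noise schedule from the Improved DDPM paper (not a learning-rate schedule). The function names, the offset `s=0.008`, and the clipping value `max_beta=0.999` follow that paper's convention but are not taken from this repository's code.

```python
import math
import numpy as np

def cosine_alpha_bar(t, s=0.008):
    """Cumulative signal level alpha_bar at continuous time t in [0, 1] (assumed form)."""
    return math.cos((t + s) / (1 + s) * math.pi / 2) ** 2

def cosine_betas(num_steps, max_beta=0.999):
    """Discretize alpha_bar into per-step betas, clipping to avoid beta = 1."""
    betas = []
    for i in range(num_steps):
        t1 = i / num_steps
        t2 = (i + 1) / num_steps
        betas.append(min(1 - cosine_alpha_bar(t2) / cosine_alpha_bar(t1), max_beta))
    return np.array(betas, dtype=np.float64)

T = 1000
betas = cosine_betas(T)
alphas_cumprod = np.cumprod(1.0 - betas)

def q_sample(x_start, t, noise):
    """Forward process: x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps."""
    return np.sqrt(alphas_cumprod[t]) * x_start + np.sqrt(1 - alphas_cumprod[t]) * noise

def predict_xstart_from_eps(x_t, t, eps):
    """Invert the forward process: recover x_0 from x_t and an epsilon estimate."""
    return (x_t - np.sqrt(1 - alphas_cumprod[t]) * eps) / np.sqrt(alphas_cumprod[t])

# Toy check on a random 3x256x256 "image" (hypothetical data, not ImageNet).
rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(3, 256, 256))
eps = rng.standard_normal(x0.shape)
x_t = q_sample(x0, 500, eps)
x0_recovered = predict_xstart_from_eps(x_t, 500, eps)
```

With a perfect epsilon estimate the two parameterizations are equivalent, so if both give implausible samples, the cause may lie elsewhere (model capacity, dataset size, or training length), though this sketch alone cannot confirm that.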

sndnyang commented 1 year ago

Oh, I don't have enough computational resources to make it converge even on CIFAR-10, and I have never tried iDDPM on ImageNet at any resolution (32, 64, 128, 224, 256).

sndnyang commented 1 year ago

For such high-resolution images, I think a better and faster approach is latent-space diffusion (as in Stable Diffusion) or latent score matching. Two options: VQ-VAE + latent-space diffusion (https://github.com/CompVis/latent-diffusion), or NVAE + latent score matching ("Score-based Generative Modeling in Latent Space", http://arxiv.org/abs/2106.05931).

Again, I can't train on ImageNet myself, not even a classifier.
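To illustrate why latent-space diffusion is cheaper at this resolution, here is a rough sketch of the size arithmetic. The downsampling factor of 8 and the 4 latent channels are assumptions chosen for illustration (the latent-diffusion repository offers autoencoders with several factors), and `latent_shape` is a hypothetical helper, not part of any library.

```python
def latent_shape(height, width, downsample_factor=8, latent_channels=4):
    """Shape (C, H, W) of the latent for an image, under the assumed autoencoder settings."""
    if height % downsample_factor or width % downsample_factor:
        raise ValueError("image size must be divisible by the downsampling factor")
    return (latent_channels, height // downsample_factor, width // downsample_factor)

# Number of values the diffusion model has to denoise per sample.
pixel_values = 3 * 256 * 256
c, h, w = latent_shape(256, 256)
latent_values = c * h * w
reduction = pixel_values // latent_values
```

Under these assumptions a 256x256 image maps to a 32x32 latent, which is the spatial resolution at which convergence was reported later in this thread.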

manurare commented 1 year ago

I see, thanks. I was able to make it converge at 32x32, but I have not been able to do so at 256x256.

sndnyang / iDDPM