lucidrains / denoising-diffusion-pytorch

Implementation of Denoising Diffusion Probabilistic Model in Pytorch
MIT License
8.43k stars · 1.04k forks

Sanity Check - Looking for a basic CIFAR10 hyperparameter set #331

Open samuelemarro opened 4 months ago

samuelemarro commented 4 months ago

I'm running the denoising_diffusion_pytorch.py script as-is on the CIFAR10 dataset; however, the FID quickly plateaus at ~90, a far cry from the scores reported in the DDPM/DDIM papers and even those in other open issues (e.g. #326). Here are my hyperparameters:

```python
model = Unet(
    dim = 64,
    dim_mults = (1, 2, 4, 8),
    flash_attn = True,
    dropout = 0.1
)

diffusion = GaussianDiffusion(
    model,
    image_size = 32,
    timesteps = 1000,
    sampling_timesteps = 250
)

trainer = Trainer(
    diffusion,
    './data/cifar10',
    train_batch_size = 32,
    train_lr = 8e-5,
    train_num_steps = 100000,
    gradient_accumulate_every = 2,
    ema_decay = 0.995,
    num_fid_samples = 500,
    save_and_sample_every = 1000,
    amp = False,
    calculate_fid = True
)
```

No matter how I tune it, I can't get below ~70. Am I going crazy? I feel like I'm missing something obvious, but I can't see what.

samuelemarro commented 4 months ago

Pinging in particular previous issue openers who reported FID scores/losses on CIFAR10 (@zzz313 @DavidXie03 @chengyiqiu1121) — I'd be really grateful if you could take a look and see if there's something obviously wrong. Thank you!

samuelemarro commented 4 months ago

Pinging @lucidrains as well — if there's something I obviously shouldn't be doing, you probably have the best shot at spotting it. Thanks!

chengyiqiu1121 commented 4 months ago

Hi, train_num_steps = 100000 is not enough. In my code I set train_num_steps = 700000 and get an FID of around 20. Another thing: the Unet in this package does not use the dropout = 0.1 mentioned in the original paper, Denoising Diffusion Probabilistic Models, Appendix B (experimental details).
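To put the step counts in perspective, here is a quick back-of-the-envelope on the training budget (a sketch only; the batch size of 32 and gradient_accumulate_every = 2 are taken from the opening post's config):

```python
# Rough training-budget comparison using numbers from this thread:
# the opening post trains 100k steps at batch 32 with grad-accum 2.
steps_short, steps_long = 100_000, 700_000
images_per_step = 32 * 2                        # effective batch per optimizer step

short_budget = steps_short * images_per_step    # images seen by the 100k-step run
long_budget  = steps_long * images_per_step     # same settings, but 700k steps

print(short_budget, long_budget, long_budget // short_budget)
```

So at the same effective batch size, 700k steps sees 7x more data than 100k steps, which alone could explain a large FID gap.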

chengyiqiu1121 commented 4 months ago

Here is the UNet config from my code; after training, using the DDIM sampler, the diffusion model reaches an FID of 10.88:

```yaml
dataset_name: cifar10
lr: 2e-4
device: cuda:0
batch: 128
epoch: 700000
unet:
  dim: 128
  dim_mults: (1, 2, 2, 2)
  dropout: 0.1
```

samuelemarro commented 4 months ago

Thank you, I'll test it!

Maryeon commented 3 months ago

@samuelemarro I encountered the same problem as you, but I have found that this codebase's implementation differs from the official implementation in several ways, such as the UNet structure (channel dim, multi-head vs. single-head attention) and the learning rate warmup. I am following this repo to reproduce the results on CIFAR10. Hope it helps.
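For reference, the warmup in question is just a linear ramp of the learning rate. A minimal sketch (the 5000-step warmup length is my recollection of the official codebase's default, so treat it as an assumption; `base_lr` matches the CIFAR10 config earlier in this thread):

```python
def warmup_lr(step, base_lr = 2e-4, warmup_steps = 5000):
    """Linear learning-rate warmup: ramp from ~0 up to base_lr, then hold.

    base_lr = 2e-4 matches the CIFAR10 config earlier in this thread;
    warmup_steps = 5000 is an assumption about the official codebase.
    """
    return base_lr * min((step + 1) / warmup_steps, 1.0)
```

In PyTorch this drops straight into `torch.optim.lr_scheduler.LambdaLR` as the multiplier `lambda step: min((step + 1) / warmup_steps, 1.0)`.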