Closed leopoldmaillard closed 1 year ago
Maybe cc @anton-l?
I guess this problem will be addressed further in the upcoming Unit 2 (Fine-Tuning and Guidance) of the HF Diffusion Course!
Also, @lewtun mentioned "When dealing with higher-resolution inputs you may want to use more down and up-blocks, and keep the attention layers only at the lowest resolution (bottom) layers to reduce memory usage." in the course's introductory notebook.
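Not an authoritative recommendation, but here is a rough sketch of what that advice could look like with diffusers' UNet2DModel, e.g. for 128x128 training images (the block layout and channel widths below are just illustrative guesses):

```python
from diffusers import UNet2DModel

# Illustrative layout for 128x128 inputs: five down/up blocks, with
# self-attention only in the two lowest-resolution stages (16x16 and 8x8)
# to keep memory usage manageable.
model = UNet2DModel(
    sample_size=128,
    in_channels=3,
    out_channels=3,
    layers_per_block=2,
    block_out_channels=(128, 128, 256, 256, 512),
    down_block_types=(
        "DownBlock2D",      # operates at 128x128
        "DownBlock2D",      # 64x64
        "DownBlock2D",      # 32x32
        "AttnDownBlock2D",  # attention at 16x16
        "AttnDownBlock2D",  # attention at 8x8 (bottleneck resolution)
    ),
    up_block_types=(        # mirror of the down path, lowest resolution first
        "AttnUpBlock2D",
        "AttnUpBlock2D",
        "UpBlock2D",
        "UpBlock2D",
        "UpBlock2D",
    ),
)
```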
cc @anton-l again here
Hi @leopoldmaillard! I haven't explored the DDPM hyperparameters extensively yet, so I can't recommend anything concrete for resolutions higher than 64x64. But as a first step I would adjust the number of up/down blocks in a way that would leave you with depth*16*16 or depth*8*8 features for the middle block of the UNet. The configs of some pretrained DDPM models at https://huggingface.co/google might give you some inspiration: https://huggingface.co/google/ddpm-church-256/blob/main/config.json
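To make that depth*16*16 / depth*8*8 rule of thumb concrete, here is a small sanity check (just a sketch that assumes the UNet2DModel convention where every down block except the last one halves the spatial resolution):

```python
def middle_block_resolution(sample_size: int, num_down_blocks: int) -> int:
    """Spatial size of the feature map that reaches the UNet's middle block.

    Assumes the UNet2DModel convention where each down block except the
    last one halves the spatial resolution, i.e. num_down_blocks blocks
    give (num_down_blocks - 1) downsampling steps.
    """
    return sample_size // 2 ** (num_down_blocks - 1)

# If I read the linked config right, the 256x256 church model uses
# six down blocks, which gives an 8x8 bottleneck.
print(middle_block_resolution(256, 6))  # 8

# For e.g. 128x128 inputs, four or five blocks keep the bottleneck
# at 16x16 or 8x8 respectively.
print(middle_block_resolution(128, 4))  # 16
print(middle_block_resolution(128, 5))  # 8
```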
Hello @anton-l, thank you for your insight!
I also found out that Dhariwal & Nichol discuss hyperparameter tuning of DDPMs in their paper Diffusion Models Beat GANs on Image Synthesis.
Will close this for now!
Hi there! I am currently training a DDPM model on a custom image dataset following the cool unconditional_image_generation example script.
Since I don't have the compute to perform comprehensive hyperparameter tuning of my architecture, I was wondering if there are any common intuitions when designing the UNet denoiser: width/length of the residual blocks, number and positions of the attention blocks, etc., with respect to the number of samples in the training set as well as their resolution. If anyone has wide experience in training DMs, it would be super cool to share insights here or in a dedicated blog post, such as the one discussing the choice of hyperparameters when training DreamBooth.
Thank you! 🤗