openai / guided-diffusion


How should I adjust the hyperparameters to train a strong DDPM model based on CelebA? #133


LinWeiJeff commented 7 months ago

I want to use this GitHub repository to train a DDPM on the CelebA dataset, which consists of 202,599 aligned-and-cropped human face images (each 218x178 pixels, height x width), for a human face image inpainting task.

Can anyone suggest how to adjust the hyperparameters used in this repository to train a DDPM as strong as the 256x256 diffusion (not class conditional) checkpoint provided here? I'm also curious why that checkpoint is so large (about 2.1 GB) and how it was trained. As for the dataset: do I need to resize my images to 256x256 first?
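If pre-resizing is the way to go, I would do it offline with something like the sketch below (the paths are placeholders; I also believe the repo's image_datasets.py can center-crop and resize on the fly, so this step may be optional):

```python
import os
from PIL import Image

SRC = "celeba/img_align_celeba"  # placeholder path to the 178x218 images
DST = "celeba_256"               # placeholder output directory
os.makedirs(DST, exist_ok=True)

for name in os.listdir(SRC):
    img = Image.open(os.path.join(SRC, name)).convert("RGB")
    # Center-crop the 178x218 image to a 178x178 square, then upsample to 256x256.
    w, h = img.size
    side = min(w, h)
    left, top = (w - side) // 2, (h - side) // 2
    img = img.crop((left, top, left + side, top + side))
    img = img.resize((256, 256), Image.BICUBIC)
    img.save(os.path.join(DST, name))
```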

I hope the trained DDPM learns human face features well enough to be used both for face image synthesis and for face image inpainting (i.e. recovering the masked parts of a masked face image).

I want to know how to adjust the hyperparameter values in diffusion_defaults() and model_and_diffusion_defaults() in script_util.py, and in create_argparser() in image_train.py, to train a strong DDPM on my CelebA dataset for the face inpainting task.
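Since I set the values directly in code, here is roughly what I am doing; the values below are my guess at the flags the guided-diffusion README lists for the 256x256 unconditional model, so please correct me if any are wrong:

```python
from guided_diffusion.script_util import (
    model_and_diffusion_defaults,
    create_model_and_diffusion,
)

# Start from the repo's defaults and override them in place
# (my guess at the 256x256 unconditional settings, not verified).
args = model_and_diffusion_defaults()
args.update(
    image_size=256,
    class_cond=False,
    learn_sigma=True,
    num_channels=256,
    num_res_blocks=2,
    num_head_channels=64,
    attention_resolutions="32,16,8",
    resblock_updown=True,
    use_scale_shift_norm=True,
    use_fp16=True,
    diffusion_steps=1000,
    noise_schedule="linear",
)

model, diffusion = create_model_and_diffusion(**args)
```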

I have tried several hyperparameter combinations with guided-diffusion, but the face inpainting results from the saved checkpoints (ema_0.9999_XXX.pt and modelXXX.pt) are both bad. I mainly used RePaint to perform the inpainting sampling; as described in its README, RePaint trains its models with guided-diffusion, and one of its pretrained models, celeba256_250000.pt (downloaded via download.sh), is also large (about 2.1 GB) and produces decent inpainting results. However, I don't know why celeba256_250000.pt is so big or how it was trained, either.
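To sanity-check the 2.1 GB figure, I counted the parameters stored in the checkpoint; assuming the file is a plain state_dict (which I believe is how guided-diffusion's TrainLoop saves models), roughly 550M float32 parameters at 4 bytes each come out to about 2.1 GB, so the size seems to just reflect a very wide UNet rather than anything unusual:

```python
import torch

# Load the checkpoint on CPU and add up the stored tensor sizes.
state = torch.load("celeba256_250000.pt", map_location="cpu")
n_params = sum(t.numel() for t in state.values() if torch.is_tensor(t))
print(f"{n_params / 1e6:.0f}M parameters "
      f"~= {n_params * 4 / 2**30:.2f} GiB at float32")
```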

In addition, because of my limited GPU memory, I set the hyperparameter num_channels to only 64. Does this hyperparameter affect the performance of the trained DDPM? Should I try to set it larger?
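If num_channels=64 is the main bottleneck, I am wondering whether options like these (which I believe the repo already exposes) would let me raise it without running out of memory:

```python
from guided_diffusion.script_util import model_and_diffusion_defaults

# Tentative memory-saving overrides; the values here are illustrative.
args = model_and_diffusion_defaults()
args.update(
    num_channels=128,     # wider than 64 but cheaper than the 256 reference
    use_fp16=True,        # half-precision training
    use_checkpoint=True,  # gradient checkpointing trades compute for memory
)
# image_train.py also has a `microbatch` flag: e.g. with batch_size=8 and
# microbatch=2, gradients are accumulated over four 2-image chunks.
```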

In conclusion, I would appreciate any suggestions on how to adjust the guided-diffusion hyperparameters to train my own DDPM that is as strong as the 256x256 diffusion (not class conditional) checkpoint or celeba256_250000.pt for the face image synthesis and inpainting tasks.

Thanks a lot for anyone's help! P.S. I set the hyperparameter values directly in the guided-diffusion code, not through command-line flags.