Open black0017 opened 1 week ago
Hi! Thank you for your interest in our work.
If you have further questions, feel free to let me know!
Thank you so much for the clarifications! I had missed that in the paper and appendix.
It takes a lot of time and resources to compute the ADM samples at 256 resolution. I estimate that it will take more than 48 hours on 4x Nvidia A100 GPUs. Is this to be expected?
For the reported results (Table 1), are you using DDIM-25, which I assume is specified via `--timestep_respacing ddim25`, or DDPM-250 (`--timestep_respacing 250`)?

I guess `--guide_scale 1` from the paper's formulation is the one I need to specify for the ADM samples.

Thanks again and have a nice day! I will let you know if I am able to get similar FID values!
Another thing: do you provide any instructions to reproduce your results on the ADM 128x128 conditional model?
Do you think I can still use the same attention maps (the UNet architecture is slightly different)? Please let me know if you have these experiments (PAG on the ADM 128x128 conditional model), as I am a bit constrained in the number of experiments I can run. Thanks!
Hi, I missed the comments. I'm sorry for the late reply.
> It takes a lot of time and resources to compute the ADM samples at 256 resolution. I estimate that it will take more than 48 hours on 4x Nvidia A100 GPUs. Is this to be expected?
Yes, we evaluated FID using 8x Nvidia 3090 GPUs for about 52 hours. It was a very tough time.
> For the reported results (Table 1), are you using DDIM-25, which I assume is specified via `--timestep_respacing ddim25`, or DDPM-250 (`--timestep_respacing 250`)?
We used DDPM-250 to match the setting used by SAG.
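For anyone comparing the two settings, here is a simplified sketch of what `--timestep_respacing 250` amounts to. This is an illustration only, not guided-diffusion's actual `space_timesteps` (which handles multiple sections and fractional strides); the function name is mine:

```python
def respace_timesteps(num_original_steps, section_count):
    """Pick an evenly spaced subset of diffusion timesteps.

    Simplified sketch of respacing: '250' keeps 250 of the original
    1000 DDPM steps, which is why DDPM-250 sampling is roughly 4x
    faster than the full 1000-step schedule.
    """
    stride = num_original_steps / section_count
    return sorted({round(i * stride) for i in range(section_count)})

steps = respace_timesteps(1000, 250)
print(len(steps), steps[:5])  # 250 steps: 0, 4, 8, 12, 16, ...
```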
> I guess `--guide_scale 1` from the paper's formulation is the one I need to specify for the ADM samples.
We used `--guidance_scale 2.0`, as our codebase uses a guidance scale that starts from 1.0 (0.0 = unconditional, 1.0 = conditional; please refer to `gaussian_diffusion/gaussian_diffusion.py`).
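For clarity, a minimal sketch of that convention (the standard classifier-free-guidance combination; the function name is mine, not the repo's):

```python
import numpy as np

def guided_eps(eps_uncond, eps_cond, guidance_scale):
    """Combine noise predictions under the convention above:
    0.0 -> purely unconditional, 1.0 -> purely conditional,
    and values above 1.0 (e.g. 2.0) extrapolate past conditional.
    """
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

# Toy tensors just to show the endpoints of the scale.
eps_u, eps_c = np.zeros(4), np.ones(4)
print(guided_eps(eps_u, eps_c, 0.0))  # -> unconditional prediction
print(guided_eps(eps_u, eps_c, 1.0))  # -> conditional prediction
```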
I hope you can achieve the same results! If you have more questions, feel free to ask.
> Another thing: do you provide any instructions to reproduce your results on the ADM 128x128 conditional model?
We tested it on ImageNet 128 in the earlier days of the project. It works quite well, but we didn't report it because ADM does not have an unconditional model.
It has a slightly different architecture from the 256 models, with fewer attention layers. But the overall architecture is similar, and it will work well if you perturb the layers near `m`, for example `i13, i14, m, o2, o5, o6`.
I attached results from ImageNet 128 for reference, although they use dropout on the self-attention map rather than replacing the self-attention map with an identity matrix. The identity matrix works even better.
So I suggest you try PAG on the 128x128 model, which can reduce the total evaluation time by a large margin.
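For reference, the perturbation described above (replacing the self-attention map with the identity) can be sketched as follows. This is a toy single-head NumPy version for illustration, not the repo's implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(q, k, v, perturb=False):
    """Single-head self-attention over (tokens, dim) arrays.

    With perturb=True the softmax attention map is replaced by the
    identity matrix, so each token attends only to itself and the
    layer output is simply v -- the perturbation applied to the
    selected layers (dropout on the map is the weaker alternative).
    """
    n, d = q.shape
    attn = np.eye(n) if perturb else softmax(q @ k.T / np.sqrt(d))
    return attn @ v
```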
Hey @sunovivid, great work, and congrats on the paper's acceptance at ECCV!
I would like to reproduce the results, and I have the following questions related to the hyperparameters:
1. How many samples did you use to report FID? The bash script shows 5K, is that right?
2. What is the guidance scale that gives the optimal FID, as shown in the paper's Table 1? The README shows the following (without classifier guidance here, for the conditional ADM case at 256x256):
3. Can I apply the same code to the ImageNet 128x128 models and resolution, or do I need to specify other attention maps?
It would be extremely helpful, as sampling is super slow with ADM. Thanks a lot!