First of all, thanks for open-sourcing your implementation. Wherever you look, the base implementation is OpenAI's guided diffusion, which is great!
I was going over the code for a personal project and noticed that the model config is preprocessed with the following code in script_util.py:
```python
for res in attention_resolutions.split(","):
    attention_ds.append(image_size // int(res))
```
However, in unet.py, attention_resolutions is documented as:

> a collection of downsample rates at which attention will take place. May be a set, list, or tuple. For example, if this contains 4, then at 4x downsampling, attention will be used.

In other words, the documented behavior is independent of the image resolution, which makes complete sense.
The only change needed to resolve this discrepancy is to update the snippet above to:
```python
for res in attention_resolutions.split(","):
    attention_ds.append(int(res))
```
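To make the discrepancy concrete, here is a minimal standalone sketch (the image_size and attention_resolutions values are hypothetical, chosen only for illustration): with the current preprocessing, the configured values are interpreted as feature-map resolutions and converted to downsample rates, while the unet.py docstring expects them to already be downsample rates.

```python
# Hypothetical example values; not taken from any specific config.
image_size = 64
attention_resolutions = "16,8"

# Current preprocessing in script_util.py: treats each value as a
# feature-map resolution and converts it to a downsample rate.
current = [image_size // int(res) for res in attention_resolutions.split(",")]

# Proposed change: pass the values through unchanged, treating them
# directly as downsample rates, as the unet.py docstring describes.
proposed = [int(res) for res in attention_resolutions.split(",")]

print(current)   # [4, 8]
print(proposed)  # [16, 8]
```

So a config written against the docstring's semantics ("attend at 16x and 8x downsampling") would instead attend at 4x and 8x downsampling under the current preprocessing.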
I would be more than happy to submit a PR, but first wanted to bring this to your attention and seek your opinion.