GLIGEN for layout-to-image synthesis does not seem to work.

joonjeon commented 6 months ago

Describe the bug

I ran a Python script for GLIGEN-based layout-to-image synthesis, and it fails with the following error:

AttributeError: 'UNet2DConditionModel' object has no attribute 'position_net'

Reproduction

The script to reproduce the bug is as follows:

from diffusers import StableDiffusionGLIGENPipeline
import torch
from PIL import Image

pipeline = StableDiffusionGLIGENPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, variant="fp16"
).to("cuda")

image = pipeline(
    "urban traffic with numerous vehicles",
    height=1080,
    width=1920,
    gligen_phrases=["vehicle", "vehicle"],
    gligen_boxes=[[0.0, 0.09, 0.33, 0.76], [0.55, 0.11, 1.0, 0.8]],
).images[0]
image.save("image_sample.png")

Logs

Traceback (most recent call last):
  File "/home/joonjeon/diffusers-workspace/main.py", line 13, in <module>
    image = pipeline(
  File "/home/joonjeon/miniconda3/envs/pytorch/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/joonjeon/miniconda3/envs/pytorch/lib/python3.9/site-packages/diffusers/pipelines/stable_diffusion_gligen/pipeline_stable_diffusion_gligen.py", line 803, in __call__
    noise_pred = self.unet(
  File "/home/joonjeon/miniconda3/envs/pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/joonjeon/miniconda3/envs/pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/joonjeon/miniconda3/envs/pytorch/lib/python3.9/site-packages/diffusers/models/unets/unet_2d_condition.py", line 1175, in forward
    cross_attention_kwargs["gligen"] = {"objs": self.position_net(**gligen_args)}
  File "/home/joonjeon/miniconda3/envs/pytorch/lib/python3.9/site-packages/diffusers/models/modeling_utils.py", line 218, in __getattr__
    return super().__getattr__(name)
  File "/home/joonjeon/miniconda3/envs/pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1695, in __getattr__
    raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'UNet2DConditionModel' object has no attribute 'position_net'

System Info

diffusers version: 0.27.2
Platform: Linux-6.5.0-18-generic-x86_64-with-glibc2.35
Python version: 3.9.18
PyTorch version (GPU?): 2.1.1+cu118 (True)
Huggingface_hub version: 0.22.2
Transformers version: 4.39.2
Accelerate version: 0.28.0
xFormers version: 0.0.23+cu118
Using GPU in script?:
Using distributed or parallel set-up in script?:

Who can help?

@DN6 @yiyixuxu @sayakpaul

tuanh123789 commented 6 months ago

You need to use this checkpoint instead of "runwayml/stable-diffusion-v1-5": https://huggingface.co/masterful/gligen-1-4-generation-text-box
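
For context, the failing line in the traceback calls `self.position_net`, a module that the UNet in "runwayml/stable-diffusion-v1-5" does not have. A minimal sketch of a pre-flight check follows; the helper names `supports_gligen` and `check_pipeline` are hypothetical and not part of diffusers, and the check only looks for the attribute named in the error message.

```python
def supports_gligen(unet) -> bool:
    """Return True if the UNet exposes the `position_net` module.

    Hypothetical helper: it only tests for the attribute that the
    traceback shows being called in UNet2DConditionModel.forward.
    """
    return hasattr(unet, "position_net")


def check_pipeline(pipeline) -> None:
    """Fail early with a readable message instead of an AttributeError mid-inference."""
    if not supports_gligen(pipeline.unet):
        raise ValueError(
            "The loaded UNet has no `position_net`. Load a GLIGEN checkpoint such as "
            "'masterful/gligen-1-4-generation-text-box' instead."
        )
```

Calling `check_pipeline(pipeline)` right after `from_pretrained(...)` would surface the mismatch before any denoising step runs.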

DN6 commented 6 months ago

Hi @joonjeon. As @tuanh123789 pointed out, you need to change the checkpoint. The GLIGEN docs page lists the checkpoints that are compatible with this pipeline: https://huggingface.co/docs/diffusers/v0.27.2/en/api/pipelines/stable_diffusion/gligen#gligen-grounded-language-to-image-generation
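
As a sketch of the suggested change, the reproduction script can be kept as it is with only the checkpoint swapped. The `validate_layout` helper and the `generate` wrapper are hypothetical additions for illustration; the box check assumes the normalized `[xmin, ymin, xmax, ymax]` format described for `gligen_boxes` in the docs. The heavy imports sit inside `generate` so the helper can be used without a GPU.

```python
def validate_layout(phrases, boxes):
    """Hypothetical sanity check: one normalized [xmin, ymin, xmax, ymax] box per phrase."""
    if len(phrases) != len(boxes):
        raise ValueError("gligen_phrases and gligen_boxes must have the same length")
    for box in boxes:
        if len(box) != 4:
            raise ValueError(f"box {box} must have exactly 4 values")
        xmin, ymin, xmax, ymax = box
        if not (0.0 <= xmin < xmax <= 1.0 and 0.0 <= ymin < ymax <= 1.0):
            raise ValueError(f"box {box} is not a normalized [xmin, ymin, xmax, ymax] box")


def generate(output_path="image_sample.png"):
    # Imported lazily so validate_layout works without diffusers or CUDA installed.
    import torch
    from diffusers import StableDiffusionGLIGENPipeline

    phrases = ["vehicle", "vehicle"]
    boxes = [[0.0, 0.09, 0.33, 0.76], [0.55, 0.11, 1.0, 0.8]]
    validate_layout(phrases, boxes)

    # The only change from the original script: a GLIGEN checkpoint
    # instead of "runwayml/stable-diffusion-v1-5".
    pipeline = StableDiffusionGLIGENPipeline.from_pretrained(
        "masterful/gligen-1-4-generation-text-box",
        torch_dtype=torch.float16,
        variant="fp16",
    ).to("cuda")

    image = pipeline(
        "urban traffic with numerous vehicles",
        height=1080,
        width=1920,
        gligen_phrases=phrases,
        gligen_boxes=boxes,
    ).images[0]
    image.save(output_path)
```

Running `generate()` on a CUDA machine downloads the checkpoint on first use and writes the image to `image_sample.png`.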

joonjeon commented 6 months ago

Ahh... I see.

The pipeline works now that I am using the alternative checkpoint :)

Thanks a lot!

huggingface / diffusers

GLIGEN for layout-to-image synthesis does not seem to work. #7589
